
Named Entity Recognition System for Hindi Language: A Hybrid Approach

Shilpi Srivastava, Mukund Sanglikar, D. C. Kothari

Pages: 10-23 | Revised: 01-07-2011 | Published: 05-08-2011

Published in International Journal of Computational Linguistics (IJCL)

Volume 2, Issue 1 | Publication Date: July/August 2011

KEYWORDS

Machine Learning, MaxEnt, CRF, Rule-based, Voting, Named Entity Recognition

ABSTRACT

Named Entity Recognition (NER) is a major early step in Natural Language Processing (NLP) tasks such as machine translation, text-to-speech synthesis and natural language understanding. It seeks to classify words that represent names in text into predefined categories such as location, person name, organization, date and time. This paper introduces a hybrid approach to NER that combines machine learning and rule-based methods to classify named entities. We experimented with statistical approaches, namely Conditional Random Fields (CRF) and Maximum Entropy (MaxEnt), and with a rule-based approach built on a set of linguistic rules. The linguistic approach plays a vital role in overcoming the limitations of statistical models for a morphologically rich language such as Hindi. The system also uses a voting method to improve the performance of the NER system.
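The abstract says the outputs of the CRF, MaxEnt and rule-based components are combined by voting, but it does not spell out the voting scheme. The following is a minimal sketch, assuming simple per-token majority voting over the three taggers' labels. The function name `vote`, the label set (PER, LOC, ORG, O) and the tie-breaking priority (rule-based, then CRF, then MaxEnt) are hypothetical illustrations, not the authors' implementation.

```python
from collections import Counter
from typing import Sequence

# Hypothetical tie-break order; the abstract does not say how ties are resolved.
PRIORITY = ("rule", "crf", "maxent")


def vote(predictions: dict[str, Sequence[str]]) -> list[str]:
    """Combine per-token NE labels from several taggers by majority vote.

    `predictions` maps a system name to its label sequence. All sequences
    must have the same length (one label per token).
    """
    lengths = {len(labels) for labels in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all systems must label the same, non-empty set of systems' tokens")
    n_tokens = lengths.pop()

    # Systems in tie-break order: known names first, any others afterwards.
    ordered = [name for name in PRIORITY if name in predictions]
    ordered += [name for name in predictions if name not in PRIORITY]

    combined = []
    for i in range(n_tokens):
        counts = Counter(labels[i] for labels in predictions.values())
        top = max(counts.values())
        winners = {label for label, count in counts.items() if count == top}
        # A unique majority wins outright; on a tie, take the label proposed
        # by the highest-priority system among the tied labels.
        for name in ordered:
            if predictions[name][i] in winners:
                combined.append(predictions[name][i])
                break
    return combined


if __name__ == "__main__":
    tokens = ["श्री", "राम", "दिल्ली", "गए"]  # "Shri Ram went to Delhi"
    result = vote({
        "crf": ["O", "PER", "LOC", "O"],
        "maxent": ["O", "PER", "ORG", "O"],
        "rule": ["PER", "PER", "LOC", "O"],
    })
    print(list(zip(tokens, result)))
```

For this made-up example the combined labels are `['O', 'PER', 'LOC', 'O']`: the rule-based tagger is outvoted on the first token, and the CRF and rule-based systems outvote MaxEnt on "दिल्ली". A fixed priority is only one possible tie-break; weighting each system by its development-set accuracy is a common alternative.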

CITED BY (13)

1	Hinkova, A., Bubnik, Z., & Kadlec, P. (2014). Chemical Composition of Sugar and Confectionery Products. In Handbook of Food Chemistry (pp. 1-34). Springer Berlin Heidelberg.

2	Seedah, D. P. K. (2014). Retrieving information from heterogeneous freight data sources to answer natural language queries (Doctoral dissertation).

3	Morwal, S., Chopra, D., & Purohit, G. N. Named entity recognition in natural languages using transliteration.

4	Morwal, S., & Chopra, D. (2013). nerhmm: A Tool For Named Entity Recognition based on Hidden Markov Model. International Journal on Natural Language Computing (IJNLC), 2, 43-49.

5	Chopra, D., & Morwal, S. (2013). Named Entity Recognition in English Using Hidden Markov Model. International Journal.

6	Jimmy, L., & Kaur, D. (2013). Named entity recognition in Manipuri: a hybrid approach. In Language Processing and Knowledge in the Web (pp. 104-110). Springer Berlin Heidelberg.

7	Chopra, D., Morwal, S., & Purohit, G. N. Hidden Markov model based named entity recognition tool.

8	Morwal, S., Jahan, N., & Chopra, D. (2012). Named entity recognition using hidden Markov model (HMM). International Journal on Natural Language Computing (IJNLC), 1(4).

9	Chopra, D., Jahan, N., & Morwal, S. (2012). Hindi named entity recognition by aggregating rule based heuristics and hidden Markov model. International Journal of Information Sciences and Techniques (IJIST), 2.

10	Jahan, N., Morwal, S., & Chopra, D. (2012). Named entity recognition in Indian languages using gazetteer method and hidden Markov model: A hybrid approach. IJCSET, March.

11	Jahangir, F., Anwar, W., Bajwa, U. I., & Wang, X. (2012, December). N-gram and gazetteer list based named entity recognition for Urdu: A scarce resourced language. In Proceedings of the 10th Workshop on Asian Language Resources (pp. 95-104).

12	Chopra, D., & Morwal, S. (2012). Named Entity Recognition in Punjabi Using Hidden Markov Model. International Journal of Computer Science & Engineering Technology (IJCSET), 3(12).

13	Sathyanarayana, S. A. S. A Hybrid approach for Named Entity Recognition, Classification and Extraction (NERCE) in Kannada Documents.

ABSTRACTING & INDEXING

1	Google Scholar

2	CiteSeerX

3	refSeek

4	Scribd

5	SlideShare

6	PdfSR

REFERENCES

A. Borthwick, "A Maximum Entropy Approach to Named Entity Recognition", PhD thesis, New York University, p. 1-4, 18-24, September 1999.

Akshar Bharti, Rajeev Sangal and Dipti M Sharma, "Shakti Analyzer: SSF Representation", IIIT Hyderabad, p. 3-5, 2006.

Anil Kumar Singh, "Named Entity Recognition for South and South East Asian Languages: Taking Stock", In IJCNLP 2008, p. 5-7.

Asif Ekbal et al., “Language Independent Named Entity Recognition in Indian Languages”, IJCNLP, 2008.

Charles L. Wayne. 1991. “A snapshot of two DARPA speech and Natural Language Programs”, In Proceedings of the Workshop on Speech and Natural Language, pages 103-404, Pacific Grove, California. Association for Computational Linguistics.

Cucerzan S. and Yarowsky D. 1999. Language independent named entity recognition combining morphological and contextual evidence. In: Proceedings of the Joint SIGDAT Conference on EMNLP and VLC 1999, pp. 90-99.

Daniel M. Bikel, Scott Miller, Richard Schwartz and Ralph Weischedel. 1997. “Nymble: a high performance learning name-finder”, In Proceedings of the Fifth Conference on Applied Natural Language Processing, pages 194-201, San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.

Gali, K., Sharma, H., Vaidya, A., Shishtla, P., Sharma, D.M.: Aggregating Machine Learning and Rule-based Heuristics for Named Entity Recognition. In: Proceedings of the IJCNLP-08 Workshop on NER for South and South East Asian Languages. (2008) 25–32

Hanna M. Wallach, “Conditional Random Fields: An Introduction”, Technical Report, University of Pennsylvania, p. 4-5, 2004.

Hideki Isozaki. 2001. “Japanese named entity recognition based on a simple rule generator and decision tree learning” in the proceedings of the Association for Computational Linguistics, pages 306-313. India.

Hindi Wordnet, Source: http://www.cfilt.iitb.ac.in/wordnet/webhwn/

IJCNLP-08 Workshop data set, Source: http://ltrc.iiit.net/ner-ssea-08/index.cgi?topic=5

Lafferty, J., McCallum, A., Pereira, F., "Conditional random fields: Probabilistic models for segmenting and labeling sequence data", In: Proc. 18th International Conf. on Machine Learning, Morgan Kaufmann, San Francisco, p. 1-5, 2001

Lawrence R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition", In Proceedings of the IEEE, 77 (2), p. 257-286, February 1989.

Li W. and McCallum A. 2003. Rapid Development of Hindi Named Entity Recognition using Conditional Random Fields and Feature Induction. In: ACM Transactions on Asian Language Information Processing (TALIP), 2(3): 290–294.

McCallum, Andrew Kachites. "MALLET: A Machine Learning for Language Toolkit." http://mallet.cs.umass.edu. 2002.

Mikheev A., Grover C. and Moens M. 1998. Description of the LTG system used for MUC-7. In Proceedings of the Seventh Message Understanding Conference.

Prasad Pingli et al., “A Hybrid Approach for Named Entity Recognition in Indian Languages”, IJCNLP, 2008.

R. Grishman, Beth Sundheim. 1996. “Message Understanding Conference-6: A Brief History” in the proceedings of the 16th International Conference on Computational Linguistics (COLING), pages 466-471, Center for Sprogteknologi, Copenhagen, Denmark.

R. Grishman. 1995. “The NYU system for MUC-6 or Where’s the Syntax”, In Proceedings of the Sixth Message Understanding Conference (MUC-6), pages 167-195, Fairfax, Virginia.

Shilpi Srivastava, Siby Abraham, Mukund Sanglikar, D C Kothari: “Role of Ensemble Learning in Identifying Hindi Names”, International Journal of Computer Science and Applications, ISSN No. 0974-0767.

Shilpi Srivastava, Siby Abraham, Mukund Sanglikar: “Hybrid Approach for Recognizing Hindi Named Entity”, Proceedings of the International Conference on Managing Next Generation Software Applications - 2008 (MNGSA 2008), Coimbatore, India, 5th- 6th December 2008.

Srihari R., Niu C. and Li W. 2000. A Hybrid Approach for Named Entity and Sub-Type Tagging. In: Proceedings of the sixth conference on applied natural language processing.

Sudeshna Sarkar, Sujan Saha and Prthasarthi Ghosh, "Named Entity Recognition for Hindi", In Microsoft Research India Summer School talk, p. 21-30, May 2007.

Takeuchi K. and Collier N. 2002. “Use of Support Vector Machines in extended named entity recognition” in the proceedings of the sixth Conference on Natural Language Learning (CoNLL-2002), Taipei, Taiwan, China.

Wakao T., Gaizauskas R. and Wilks Y. 1996. “Evaluation of an algorithm for the Recognition and Classification of Proper Names”, in the proceedings of COLING-96.

MANUSCRIPT AUTHORS

Mr. Shilpi Srivastava

University of Mumbai - India

shilpii26@gmail.com

Professor Mukund Sanglikar

India

Professor D. C. Kothari

India
