This is an Open Access publication published under CSC-OpenAccess Policy.
Named Entity Recognition System for Hindi Language: A Hybrid Approach
Shilpi Srivastava, Mukund Sanglikar, D.C Kothari
Pages - 10 - 23     |    Revised - 01-07-2011     |    Published - 05-08-2011
Volume - 2   Issue - 1    |    Publication Date - July / August 2011
Machine Learning, MaxEnt, CRF, Rule-base, Voting, Named Entity Recognition
Named Entity Recognition (NER) is a major early step in Natural Language Processing (NLP) tasks such as machine translation, text-to-speech synthesis, and natural language understanding. It seeks to classify words that represent names in text into predefined categories such as location, person name, organization, date, and time. This paper introduces a hybrid approach to NER that combines machine learning and rule-based methods to classify named entities. We have experimented with statistical approaches, namely Conditional Random Fields (CRF) and Maximum Entropy (MaxEnt), and with a rule-based approach built on a set of linguistic rules. The linguistic approach plays a vital role in overcoming the limitations of statistical models for a morphologically rich language like Hindi. The system also uses a voting method to improve the performance of the NER system.
Keywords: NER, MaxEnt, CRF, Rule base, Voting, Hybrid approach
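The abstract does not say how the voting method combines the individual classifiers, so the following is only a minimal sketch of one common scheme: a per-token majority vote over the label sequences produced by several taggers. The function name `vote_labels`, the label names, the sample outputs, and the tie-break rule (prefer the tagger listed first) are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence


def vote_labels(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-token labels from several taggers by majority vote.

    predictions[i][j] is the label that tagger i assigned to token j.
    Ties go to the tagger listed first (an illustrative assumption).
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all taggers must label the same number of tokens")

    voted = []
    for token_labels in zip(*predictions):
        counts = Counter(token_labels)
        best = max(counts.values())
        # Pick the first label, in tagger order, that reaches the top count.
        winner = next(lbl for lbl in token_labels if counts[lbl] == best)
        voted.append(winner)
    return voted


# Hypothetical outputs of three taggers for a four-token sentence.
crf = ["PERSON", "PERSON", "O", "LOCATION"]
maxent = ["PERSON", "O", "O", "LOCATION"]
rules = ["O", "PERSON", "O", "ORGANIZATION"]

combined = vote_labels([crf, maxent, rules])
print(combined)
```

With three voters a clear majority exists whenever two agree. The tie-break only matters when all three disagree, in which case this sketch falls back to the first tagger in the list.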
CITED BY (13)  
1 Hinkova, A., Bubnik, Z., & Kadlec, P. (2014). Chemical Composition of Sugar and Confectionery Products. In Handbook of Food Chemistry (pp. 1-34). Springer Berlin Heidelberg.
2 Seedah, D. P. K. (2014). Retrieving information from heterogeneous freight data sources to answer natural language queries (Doctoral dissertation).
3 Morwal, S., Chopra, D., & Purohit, G. N. named entity recognition in natural languages using transliteration.
4 Morwal, S., & Chopra, D. (2013). nerhmm: A Tool For Named Entity Recognition based on Hidden Markov Model. International Journal on Natural Language Computing (IJNLC), 2, 43-49.
5 Chopra, D., & Morwal, S. (2013). Named Entity Recognition in English Using Hidden Markov Model. International Journal.
6 Jimmy, L., & Kaur, D. (2013). Named entity recognition in Manipuri: a hybrid approach. In Language Processing and Knowledge in the Web (pp. 104-110). Springer Berlin Heidelberg.
7 Chopra, D., Morwal, S., & Purohit, G. N. hidden markov model based named entity recognition tool.
8 Morwal, S., Jahan, N., & Chopra, D. (2012). Named entity recognition using hidden Markov model (HMM). International Journal on Natural Language Computing (IJNLC), 1(4).
9 Chopra, D., Jahan, N., & Morwal, S. (2012). Hindi named entity recognition by aggregating rule based heuristics and hidden markov model. International Journal of Information Sciences and Techniques (IJIST) Vol, 2.
10 Jahan, N., Morwal, S., & Chopra, D. (2012). Named entity recognition in indian languages using gazetteer method and hidden markov model: A hybrid approach. IJCSET, March.
11 Jahangir, F., Anwar, W., Bajwa, U. I., & Wang, X. (2012, December). N-gram and gazetteer list based named entity recognition for urdu: A scarce resourced language. In Proceedings of the 10th Workshop on Asian Language Resources (pp. 95-104).
12 Chopra, D., & Morwal, S. (2012). Named Entity Recognition in Punjabi Using Hidden Markov Model. International Journal of Computer Science & Engineering Technology (IJCSET), 3(12).
13 Sathyanarayana, S. A. S. A Hybrid approach for Named Entity Recognition, Classification and Extraction (NERCE) in Kannada Documents.
Mr. Shilpi Srivastava
University of Mumbai - India
Professor Mukund Sanglikar
- India
Professor D.C Kothari
- India