Language Identifier for Languages of Pakistan Including Arabic and Persian
Qaiser Abbas, M. S. Ahmad, Sadia Niazi
Pages - 27 - 35 | Revised - 30-11-2010 | Published - 20-12-2010
Volume - 1 Issue - 3 | Publication Date - December 2010
A language recognizer (also called a language identifier or guesser) is a basic application used to identify the language of a text document. It takes a file as input and, after processing its text, decides the language of the document using LIJ-I, LIJ-II and LIJ-III. LIJ-I on its own gives poor accuracy; accuracy improves with LIJ-II and is boosted further with LIJ-III. The recognizer also calculates digram probabilities and average accuracy percentages. LIJ-I considers the complete character set of each language, while LIJ-II considers only the differences between the character sets. This paper presents in detail a Java-based language recognizer.
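The abstract does not spell out the LIJ algorithms, so the following Java sketch is only one possible reading of the two ideas it names: scoring a text against each language's complete character set, and scoring it against only the characters that set a language apart from the others. The class name `CharsetLanguageGuesser`, its methods, and the tiny character inventories in the usage note are hypothetical illustrations, not the authors' implementation or data. The digram helper simply computes relative frequencies of adjacent character pairs, which is an assumed interpretation of "probability of digrams".

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not the recognizer described in the paper.
final class CharsetLanguageGuesser {
    // Language name -> assumed character inventory (as Unicode code points).
    private final Map<String, Set<Integer>> fullSets = new HashMap<>();

    void addLanguage(String name, String alphabet) {
        Set<Integer> set = new HashSet<>();
        for (int cp : alphabet.codePoints().toArray()) {
            set.add(cp);
        }
        fullSets.put(name, set);
    }

    // For each language, keep only characters that no other language contains.
    Map<String, Set<Integer>> distinctiveSets() {
        Map<String, Set<Integer>> result = new HashMap<>();
        for (Map.Entry<String, Set<Integer>> e : fullSets.entrySet()) {
            Set<Integer> own = new HashSet<>(e.getValue());
            for (Map.Entry<String, Set<Integer>> other : fullSets.entrySet()) {
                if (!other.getKey().equals(e.getKey())) {
                    own.removeAll(other.getValue());
                }
            }
            result.put(e.getKey(), own);
        }
        return result;
    }

    // Count, per language, how many characters of the text fall in its set.
    // differencesOnly = false uses complete sets; true uses distinctive sets.
    Map<String, Integer> score(String text, boolean differencesOnly) {
        Map<String, Set<Integer>> sets = differencesOnly ? distinctiveSets() : fullSets;
        Map<String, Integer> scores = new HashMap<>();
        for (String lang : sets.keySet()) {
            scores.put(lang, 0);
        }
        for (int cp : text.codePoints().toArray()) {
            for (Map.Entry<String, Set<Integer>> e : sets.entrySet()) {
                if (e.getValue().contains(cp)) {
                    scores.merge(e.getKey(), 1, Integer::sum);
                }
            }
        }
        return scores;
    }

    // Relative frequency of each pair of adjacent characters in the text.
    // Every character is counted, including spaces and punctuation.
    static Map<String, Double> digramProbabilities(String text) {
        int[] cps = text.codePoints().toArray();
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 1 < cps.length; i++) {
            counts.merge(new String(cps, i, 2), 1, Integer::sum);
        }
        int total = Math.max(cps.length - 1, 0);
        Map<String, Double> probs = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            probs.put(e.getKey(), e.getValue() / (double) total);
        }
        return probs;
    }
}
```

As a usage illustration, suppose three deliberately incomplete inventories are registered: Arabic as U+0627, U+0628, U+062A, U+0629, U+0643, U+064A; Persian as U+0627, U+0628, U+062A, U+067E, U+0686, U+06AF, U+06A9, U+06CC; and Urdu as the Persian list plus U+0679, U+0688, U+0691, U+06D2. For a text made of U+0628, U+0679, U+0627, scoring with complete sets gives Arabic 2, Persian 2 and Urdu 3, because most letters are shared. Scoring with differences only gives Arabic 0, Persian 0 and Urdu 1, so the single Urdu-specific letter decides the outcome. In this toy setup Persian has no distinctive characters at all, which suggests why a further signal such as digram statistics could be useful.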
Mr. Qaiser Abbas
University of Sargodha - Pakistan
Mr. M. S. Ahmad
University of Sargodha - Pakistan
Mr. Sadia Niazi
University of Sargodha - Pakistan