Classification of Oromo Dialects: A Computational Approach
Feda Negesse
Pages - 1 - 10     |    Revised - 31-03-2015     |    Published - 30-04-2015
Volume - 6   Issue - 1    |    Publication Date - March / April 2015
Oromo Language, Oromo Dialect, Levenshtein Algorithm, Lexical Distance, Computational Methods.
Oromo is a Lowland East Cushitic language with tens of millions of native speakers in Ethiopia and in neighboring countries such as Kenya and Somalia. Previous attempts to divide the language into dialects or genetic units have relied on subjective judgments about selected phonological and lexical features. This study instead computes lexical distances automatically among the varieties of the language spoken in Ethiopia and classifies them objectively into dialect areas. Normalized lexical distances were calculated from 160 basic words with the Levenshtein algorithm, and an agglomerative clustering method was used to group the linguistic varieties into dialect areas. The method divided the varieties into six clusters, a classification similar to some of the earlier subjective ones. The varieties also formed hierarchical clusters that follow their geographical proximity, consistent with the dialectological observation that geographical proximity predicts linguistic similarity. A new classification of the dialects of the language is proposed, but further research is needed to validate it with more lexical data and other clustering techniques.
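The pipeline summarized in the abstract (normalized Levenshtein distances over aligned word lists, followed by agglomerative clustering) can be sketched roughly as follows. This is an illustrative sketch only, not the author's implementation: the abstract does not say how distances were normalized or which linkage criterion was used, so dividing by the longer word's length and using average linkage are assumptions here. The function names and the toy word lists are hypothetical placeholders, not Oromo data.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    # Assumption: normalize by the length of the longer word.
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def lexical_distance(words_a, words_b):
    # Mean normalized distance over aligned lists (same concept at each index).
    pairs = list(zip(words_a, words_b))
    return sum(normalized_distance(x, y) for x, y in pairs) / len(pairs)


def agglomerative(varieties, k):
    """Average-linkage clustering (an assumption), merging until k clusters remain."""
    names = list(varieties)
    dist = {frozenset((p, q)): lexical_distance(varieties[p], varieties[q])
            for p, q in combinations(names, 2)}
    clusters = [[n] for n in names]

    def linkage(c1, c2):
        total = sum(dist[frozenset((p, q))] for p in c1 for q in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Find the closest pair of clusters and merge them.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical placeholder strings standing in for transcribed word lists.
toy = {
    "V1": ["abcd", "efgh", "ijkl"],
    "V2": ["abcd", "efgx", "ijkl"],
    "V3": ["wxyz", "qrst", "mnop"],
    "V4": ["wxyz", "qrsu", "mnop"],
}
print(agglomerative(toy, 2))  # [['V1', 'V2'], ['V3', 'V4']]
```

With real data, each variety would map to its list of 160 transcribed basic words, and the merge order would give the hierarchy from which dialect areas are read off.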
Dr. Feda Negesse
Addis Ababa University - Ethiopia