This is an Open Access publication published under CSC-OpenAccess Policy.
Classification of Oromo Dialects: A Computational Approach
Feda Negesse
Pages: 1-10 | Revised: 31-03-2015 | Published: 30-04-2015
Volume: 6 | Issue: 1 | Publication Date: March / April 2015
KEYWORDS
Oromo Language, Oromo Dialect, Levenshtein Algorithm, Lexical Distance, Computational Methods.
ABSTRACT
Oromo is a Lowland East Cushitic language with tens of millions of native speakers in Ethiopia and in neighboring countries such as Kenya and Somalia. Previous attempts to divide the language into dialects or genetic units were subjective, relying on selected phonological and lexical features. This study instead computes lexical distances automatically among varieties of the language spoken in Ethiopia and classifies them objectively into dialect areas. One hundred sixty basic words were used to calculate normalized lexical distances with the Levenshtein algorithm, and an agglomerative clustering method was used to classify the linguistic varieties into dialect areas. The objective method performed well, dividing the linguistic varieties into six clusters, and this classification is similar to some of the previous subjective classifications. The linguistic varieties also formed hierarchical clusters that follow their geographical proximity, in line with the dialectological observation that geographical proximity predicts linguistic similarity. A new classification of the dialects of the language is proposed, but further research is needed to validate it with more lexical data and other clustering techniques.
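The pipeline described in the abstract (normalized Levenshtein distances over a word list, followed by agglomerative clustering) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the abstract does not state how distances were normalized or which linkage was used, so dividing by the longer word's length and using average linkage are assumptions. The function names and the toy word lists are hypothetical placeholders, not Oromo data.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    # Assumption: normalize by the length of the longer word, giving a value in [0, 1].
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def lexical_distance(words_a, words_b) -> float:
    # Mean normalized distance over word pairs aligned by concept.
    pairs = list(zip(words_a, words_b))
    return sum(normalized_distance(x, y) for x, y in pairs) / len(pairs)


def agglomerate(wordlists, n_clusters):
    """Average-linkage agglomerative clustering (the linkage choice is an assumption)."""
    names = list(wordlists)
    dist = {frozenset(p): lexical_distance(wordlists[p[0]], wordlists[p[1]])
            for p in combinations(names, 2)}
    clusters = [[n] for n in names]

    def linkage(c1, c2):
        total = sum(dist[frozenset((x, y))] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    # Repeatedly merge the two closest clusters until n_clusters remain.
    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Invented placeholder strings standing in for per-variety word lists.
toy = {
    "V1": ["tako", "mira"],
    "V2": ["tako", "mire"],
    "V3": ["dulu", "sena"],
    "V4": ["dulo", "sena"],
}
print(agglomerate(toy, 2))
```

With real data, each variety would contribute one transcription per concept in the same order, and stopping the merge loop at six clusters would correspond to the number of dialect areas reported in the abstract.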
Dr. Feda Negesse
Addis Ababa University - Ethiopia
feda.negesse@aau.edu.et