Classification of Oromo Dialects: A Computational Approach
Feda Negesse
Pages - 1 - 10     |    Revised - 31-03-2015     |    Published - 30-04-2015
Volume - 6   Issue - 1    |    Publication Date - March / April 2015
Oromo Language, Oromo Dialect, Levenshtein Algorithm, Lexical Distance, Computational Methods.
Oromo is a Lowland East Cushitic language with tens of millions of native speakers in Ethiopia and in neighboring countries such as Kenya and Somalia. Previous attempts to divide the language into dialects or genetic units were subjective, relying on selected phonological and lexical features. This study instead computes lexical distances among the varieties of the language spoken in Ethiopia automatically and classifies them objectively into dialect areas. Normalized lexical distances were calculated over 160 basic words with the Levenshtein algorithm, and an agglomerative clustering method was used to group the linguistic varieties into dialect areas. The objective method divided the varieties into six clusters, a classification similar to some of the earlier subjective ones. The varieties also formed hierarchical clusters that follow their geographical proximity, consistent with the dialectological observation that geographical proximity predicts linguistic similarity. A new classification of the dialects of the language is proposed, but further research is needed to validate it with more lexical data and other clustering techniques.
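The pipeline described in the abstract (normalized Levenshtein distances over a word list, followed by agglomerative clustering) can be illustrated with a minimal Python sketch. Several details here are assumptions rather than the study's stated settings: the edit distance is normalized by the length of the longer word, per-word distances are averaged into one lexical distance per pair of varieties, and clusters are merged with average linkage. The variety names V1 to V4 and their word lists are hypothetical placeholders, not Oromo data.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer word's length (assumed normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def lexical_distance(words_a, words_b) -> float:
    """Mean normalized distance over two aligned word lists (same concepts, same order)."""
    pairs = list(zip(words_a, words_b))
    return sum(normalized_distance(x, y) for x, y in pairs) / len(pairs)


def agglomerate(varieties: dict, n_clusters: int) -> list:
    """Bottom-up clustering with average linkage (an assumed linkage criterion)."""
    names = list(varieties)
    dist = {frozenset((p, q)): lexical_distance(varieties[p], varieties[q])
            for p, q in combinations(names, 2)}
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        return sum(dist[frozenset((p, q))] for p in c1 for q in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Merge the two closest clusters.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


# Hypothetical placeholder word lists; real input would be the 160 basic words per variety.
toy_varieties = {
    "V1": ["kitten", "stone"],
    "V2": ["kitten", "stane"],
    "V3": ["sitting", "rock"],
    "V4": ["sitting", "rocks"],
}
toy_clusters = agglomerate(toy_varieties, n_clusters=2)
```

With real data, each variety would map to its transcriptions of the same basic concepts in the same order, and `n_clusters` would be chosen by inspecting the full merge hierarchy rather than fixed in advance.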
Dr. Feda Negesse
Addis Ababa University - Ethiopia