Designing A Rule Based Stemming Algorithm for Kambaata Language Text

Jonathan Samuel Sumamo, Solomon Teferra

Pages - 41 - 54 | Revised - 31-07-2018 | Published - 01-10-2018

Published in International Journal of Computational Linguistics (IJCL)

Volume - 9 Issue - 2 | Publication Date - June 2018

KEYWORDS

Kambaata Stemmer, Rule-Based Stemmer, Stemming Algorithm, Kambaata Language.

ABSTRACT

Stemming is the process of reducing inflectional and derivational variants of a word to its stem. It plays an important role in many natural language processing applications. In this research, a rule-based stemming algorithm that conflates Kambaata word variants has been designed for the first time. The algorithm is a single-pass, context-sensitive, longest-match stemmer designed by adapting the rule-based stemming approach. Several studies agree that Kambaata is a strictly suffixing language with rich morphology, in which word formation relies mostly on suffixation, even though it also involves infixation, compounding, and reduplication.
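The three properties named above (single pass, context-sensitive, longest match) can be illustrated with a minimal Python sketch. The suffix list and the minimum-stem-length conditions are hypothetical placeholders, not the Kambaata rules from the paper, and the only "context" checked here is the length of the remaining stem; real rules may inspect other properties of what is left.

```python
# Hypothetical rule table: (suffix, minimum length the remaining stem must keep).
# These suffixes are illustrative placeholders, NOT the paper's Kambaata rules.
RULES = [
    ("a", 3),
    ("ta", 3),
    ("akka", 2),
    ("aakka", 2),
]

# Longest match: consider longer suffixes before shorter ones.
ORDERED_RULES = sorted(RULES, key=lambda rule: len(rule[0]), reverse=True)


def stem(word: str) -> str:
    """Strip at most one suffix (single pass), preferring the longest applicable match."""
    for suffix, min_stem in ORDERED_RULES:
        if word.endswith(suffix):
            remainder = word[: -len(suffix)]
            # Context-sensitive condition: strip only if enough of the word remains;
            # otherwise fall through and try the next shorter suffix.
            if len(remainder) >= min_stem:
                return remainder
    # No rule applied: the word is returned unchanged.
    return word


for w in ["xyzaakka", "mina", "ata"]:
    print(w, "->", stem(w))
```

Here "xyzaakka" loses the longest matching suffix, "mina" falls back to the one-letter rule, and "ata" is left unchanged because no candidate suffix leaves a long enough stem. Trying shorter suffixes after a failed condition is one possible design choice; a stricter stemmer could stop at the first matching suffix instead.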

The output of this study is a context-sensitive, longest-match stemming algorithm for Kambaata words. The stemmer's effectiveness was evaluated with the error-counting method on a test set of 2425 distinct words. Of these, 2349 words (96.87%) were stemmed correctly, 63 words (2.60%) were over-stemmed, and 13 words (0.54%) were under-stemmed. In addition, a dictionary reduction of 65.86% was achieved during evaluation.
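The reported percentages can be reproduced from the raw counts with a short Python sketch. It assumes a simplified per-word scheme in which each stemmer output is compared with a hand-made gold stem: a strict prefix of the gold stem counts as over-stemming, and an output that strictly extends the gold stem counts as under-stemming. This scheme, the "other" label, and the definition of dictionary reduction as (distinct words minus distinct stems) divided by distinct words are illustrative assumptions, not necessarily the paper's exact procedure (Paice's full error-counting method, for instance, computes group-based indices).

```python
from collections import Counter


def classify(output: str, gold: str) -> str:
    """Label one stemmer output against its gold stem (assumed simplified scheme)."""
    if output == gold:
        return "correct"
    # Output is a strict prefix of the gold stem: too much was stripped.
    if gold.startswith(output):
        return "over"
    # Gold stem is a strict prefix of the output: not enough was stripped.
    if output.startswith(gold):
        return "under"
    # Neither case applies (assumed extra category for other mismatches).
    return "other"


def error_rates(pairs):
    """pairs: iterable of (stemmer_output, gold_stem); returns a percentage per label."""
    counts = Counter(classify(output, gold) for output, gold in pairs)
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


def dictionary_reduction(words, stemmer) -> float:
    """Percentage by which the set of distinct words shrinks after stemming."""
    distinct = set(words)
    stems = {stemmer(w) for w in distinct}
    return round(100 * (len(distinct) - len(stems)) / len(distinct), 2)


# Synthetic pairs that only mirror the reported counts (2349 / 63 / 13 of 2425).
pairs = [("a", "a")] * 2349 + [("a", "ab")] * 63 + [("ab", "a")] * 13
print(error_rates(pairs))  # {'correct': 96.87, 'over': 2.6, 'under': 0.54}
```

The synthetic pairs reproduce only the reported totals, not the actual test words. `dictionary_reduction` takes any stemming function, so it can be applied unchanged to a real word list and stemmer.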

The main source of errors in stemming Kambaata words is the language's rich and complex morphology. Hence, a number of these errors could be corrected by exploring additional rules. However, it is difficult to avoid errors completely, because the morphology makes use of concatenated suffixes, irregularities through infixation, compounding, blending, and reduplication of affixes.

ABSTRACTING & INDEXING

1	Google Scholar

2	BibSonomy

3	Doc Player

4	Scribd

5	SlideShare

MANUSCRIPT AUTHORS

Mr. Jonathan Samuel Sumamo

Telecom Excellence Academy, Ethio Telecom's Corporate University - Ethiopia

jimmyelove@gmail.com

Dr. Solomon Teferra

Faculty of Informatics/Department of Information Science, Addis Ababa University Addis Ababa - Ethiopia
