Suffix-stripping Algorithms and Transducers for the Fulani Language
Zouleiha Alhadji Ibrahima, Dayang Paul, Kolyang, Guidana Gazawa Frederic
Pages: 1-17 | Revised: 31-05-2022 | Published: 30-06-2022
Volume: 13 | Issue: 1 | Publication Date: June 2022
KEYWORDS
Peul, Fulani, Suffix-stripping, Stemming, Linguistic, Transducers.
ABSTRACT
Because of the large and constantly increasing amount of information available on the Internet, users face diverse challenges when trying to satisfy their information needs. The objective of today's information retrieval systems is no longer simply to provide access to information, but to search for and filter relevant information. The language in which information is searched plays a major role, and for resource-scarce local or national languages the situation is even more challenging. Many African languages fall into this group of resource-scarce languages. There is therefore a need to explore and build more specialised information systems that enable speakers of African languages to discover valuable information across linguistic and cultural barriers. As one of the most widely dispersed languages in Africa, Peul, also called Fulani, suffers from a significant handicap in its computerisation and automatic processing because digital and linguistic resources do not exist. Although preserving languages and ensuring their sustainability is important, few studies or computerisation efforts have been carried out on African languages such as Fulani. The aim of this work is to lay the groundwork for tools for the automatic processing of Fulani. The language is spoken across several dialect areas, and there are almost no digital documents in the Fulani of the Adamaoua dialect area. Among other contributions, this work carries out the digital processing of Dominique Noye's Fulani dictionary from North Cameroon; we then studied stemming approaches for Fulani words using transducers that show how to remove classifiers from words in order to obtain the stem. To do so, we grouped all suffix classifiers by number (singular and plural) and by degree. The article describes an example of the suffix-removal process.
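The abstract does not spell out the transducers, so the following is only a minimal sketch, assuming a deterministic finite-state transducer that reads a word from right to left, maps the longest recognised classifier suffix to the empty output and copies the remaining characters (the stem) unchanged. The `CLASSIFIERS` table is a hypothetical placeholder grouped by number and degree, not the classifier inventory from the paper, and the `min_stem` guard against over-stripping is likewise an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical classifier table grouped by (number, degree).
# Placeholder suffixes for illustration only, NOT the paper's inventory.
CLASSIFIERS = {
    ("singular", 1): ["al", "ru", "re"],
    ("singular", 2): ["gal", "du", "de"],
    ("singular", 3): ["ngal", "ndu", "nde"],
    ("plural", 1): ["ɗi", "ɗe"],
}


@dataclass
class State:
    """A transducer state: one node of a trie built over reversed suffixes."""
    edges: dict[str, State] = field(default_factory=dict)
    final: bool = False  # True when the path from the start spells a whole suffix


class SuffixStrippingTransducer:
    """Reads a word right to left, deletes the longest recognised suffix
    and copies the remaining characters (the stem) unchanged."""

    def __init__(self, suffixes: list[str], min_stem: int = 2) -> None:
        self.start = State()
        self.min_stem = min_stem  # assumed minimum stem length
        for suffix in suffixes:
            state = self.start
            for ch in reversed(suffix):
                state = state.edges.setdefault(ch, State())
            state.final = True

    def stem(self, word: str) -> str:
        state: State | None = self.start
        cut = 0  # length of the longest acceptable suffix seen so far
        for i, ch in enumerate(reversed(word), start=1):
            state = state.edges.get(ch)
            if state is None:  # no transition: no longer suffix can match
                break
            if state.final and len(word) - i >= self.min_stem:
                cut = i
        return word[: len(word) - cut]


# Flatten the grouped table into one suffix list for the transducer.
ALL_SUFFIXES = [s for group in CLASSIFIERS.values() for s in group]
```

On the toy input `"toyngal"` (where `toy` is an arbitrary placeholder stem), `SuffixStrippingTransducer(ALL_SUFFIXES).stem("toyngal")` returns `"toy"`: the longest match `ngal` wins over `gal` and `al`, while a word such as `"al"` is left intact because stripping it would violate the minimum stem length.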
To date, to our knowledge, no research work has aimed at the automatic processing of Fulani or of similar native African languages. Stemming is crucial in information retrieval systems because it supports the indexing of words as well as the classification and translation of documents. To implement stemming, we adapted the Lovins and Porter stemming algorithms to Peul, since they are the best known in the literature and have already been applied to other languages. Finally, we evaluated these stemming methods using the method of Chris Paice. Based on the principle that words sharing the same stem are likely to share a common meaning, we undertook a morphological analysis of 5,186 Fulani words from Dominique Noye's Fulani dictionary. The over-stemming, under-stemming and truncation error rates computed with this method show that both algorithms are efficient for stemming Fulani.
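As a rough illustration of how the two classic algorithms differ once adapted to a new language, the sketch below contrasts a Lovins-style stemmer (a single pass that removes the longest listed ending whose condition holds, here a minimum remaining stem length) with a Porter-style stemmer (ordered steps, each rule guarded by Porter's measure m of the remaining stem). The ending lists, conditions, step contents and the `VOWELS` set are hypothetical placeholders, not the adapted rule sets from the paper, and Lovins' second stage (recoding of stem endings) is omitted.

```python
# Hypothetical rules for illustration; NOT the paper's adapted rule sets.
VOWELS = set("aeiou")  # assumption: long vowels are written as doubled letters

# Lovins style: ending -> minimum stem length that must remain.
LOVINS_ENDINGS = {"ngal": 2, "gal": 2, "al": 3, "ɗi": 2, "ɓe": 2}

# Porter style: ordered steps; each rule is (suffix, replacement, minimum measure).
# Within a step, rules are listed longest suffix first.
PORTER_STEPS = [
    [("ɗi", "", 1), ("ɓe", "", 1)],
    [("ngal", "", 1), ("gal", "", 1), ("al", "", 1)],
]


def lovins_stem(word: str) -> str:
    """Single pass: remove the longest ending whose condition is satisfied."""
    best = ""
    for ending, min_stem in LOVINS_ENDINGS.items():
        if (word.endswith(ending)
                and len(word) - len(ending) >= min_stem
                and len(ending) > len(best)):
            best = ending
    return word[: len(word) - len(best)]


def measure(stem: str) -> int:
    """Porter's m: the number of VC sequences in the form [C](VC)^m[V]."""
    m, prev_vowel = 0, False
    for ch in stem:
        is_vowel = ch in VOWELS
        if prev_vowel and not is_vowel:
            m += 1
        prev_vowel = is_vowel
    return m


def porter_stem(word: str) -> str:
    """Apply each step in order; within a step only the first (longest)
    matching suffix is considered, even if its condition then fails."""
    for step in PORTER_STEPS:
        for suffix, replacement, min_m in step:
            if word.endswith(suffix):
                stem = word[: len(word) - len(suffix)]
                if measure(stem) >= min_m:
                    word = stem + replacement
                break
    return word
```

The design difference shows on stacked suffixes: with these toy rules, `lovins_stem("toyngalɓe")` removes only `ɓe` and returns `"toyngal"`, whereas `porter_stem` peels one suffix per step and returns `"toy"`.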
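Paice's evaluation compares stemmer output against manually built concept groups, i.e. sets of words that ought to be conflated. A minimal sketch of the under-stemming index (UI), the over-stemming index (OI) and the stemming weight (SW = OI/UI), following the merge totals defined in Paice (1994), is shown below. It assumes the concept groups are disjoint and that each word appears once; the truncation-based ERRT value is omitted, and the `paice_indices` name and the toy data are illustrative, not taken from the 5,186-word evaluation.

```python
from collections import Counter, defaultdict
from typing import Callable


def paice_indices(groups: list[list[str]], stem: Callable[[str], str]):
    """Return (UI, OI, SW) from Paice's desired/unachieved/wrongly merged totals."""
    total_words = sum(len(g) for g in groups)
    dmt = umt = dnt = 0.0
    stem_members = defaultdict(Counter)  # stem -> count of words per concept group
    for gi, group in enumerate(groups):
        n_g = len(group)
        dmt += 0.5 * n_g * (n_g - 1)            # desired merge total
        dnt += 0.5 * n_g * (total_words - n_g)  # desired non-merge total
        counts = Counter(stem(w) for w in group)
        umt += 0.5 * sum(u * (n_g - u) for u in counts.values())  # unachieved merges
        for s, u in counts.items():
            stem_members[s][gi] += u
    wmt = 0.0  # wrongly merged total, summed over groups of identical stems
    for per_group in stem_members.values():
        n_s = sum(per_group.values())
        wmt += 0.5 * sum(v * (n_s - v) for v in per_group.values())
    ui = umt / dmt if dmt else 0.0
    oi = wmt / dnt if dnt else 0.0
    sw = oi / ui if ui > 0 else float("inf")  # undefined when UI is zero
    return ui, oi, sw
```

For example, with concept groups `[["a1", "a2", "a3"], ["b1", "b2"]]` and a toy stemmer mapping a1 and a2 to "a", a3 and b1 to "x", and b2 to "b", the function gives UI = 3/4 = 0.75 and OI = 1/6.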
REFERENCES
Al-Kharashi, I. A., & Evens, M. W. (1994). Comparing words, stems, and roots as index terms in an Arabic information retrieval system. Journal of the American Society for Information Science, 45(8), 548-560.
Amidou, M. (2009). Bi-grammaire fulfulde/pulaar-français. Direction de l'Education et de la Formation: Programme d'apprentissage du français en contexte multilingue.
Arnott, D. W. (1960). The tense system in Gombe Fula. University of London, School of Oriental and African Studies (United Kingdom).
Ataa-Allah, F., & Boulaknadel, S. (2010, July). Pseudo-racinisation de la langue amazighe. In Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts (pp. 44-49).
Boukhari, K. (2013). Un Nouvel Algorithme de Stemmatisation pour l'Indexation Automatique de Documents non-structurés: Stemmer SAID.
Cefan. Répartition du peul d'Afrique. URL: http://www.axl.cefan.ulaval.ca/afrique/peuls-map.htm, visited on 13-04-2021.
Conjugaison. Les formes verbales. URL: http://www.conjugaison.com/grammaire/formes verbales.html, visited on 21-01-2021.
Darwish, K. (2002). Building a shallow Arabic morphological analyzer in one day.
Diallo, A. (2015). Précis de grammaire et de lexique du peul du Fouta-Djallon. Research Institute for Languages and Cultures of Asia and Africa (ILCAA). Tokyo University of Foreign Studies.
Francois, Y. (2007). Transducteurs finis en Traitement des Langues. École Nationale Supérieure des télécommunications, Département Informatique et Réseaux, Paris.
Harrathi, F., Roussey, C., Calabretto, S., Maisonnasse, L., & Gammoudi, M. M. (2009). Indexation sémantique des documents multilingues. INFORSID, editor, Atelier RISE associé au 27ème Congrès INFORSID, 31-50.
Heine, B., & Nurse, D. (Eds.). (2000). African languages: An introduction. Cambridge University Press.
Jivani, A. G. (2011). A comparative study of stemming algorithms. Int. J. Comp. Tech. Appl, 2(6), 1930-1938.
Kevers, L., Gueniot, F., Tognotti, A. G., & Medori, S. R. (2019). Outiller une langue peu dotée grâce au TALN: l'exemple du corse et BDLC. In 26e Conférence sur le Traitement Automatique des Langues Naturelles (pp. 371-380). ATALA.
Le, N. T. (2019). Traduction automatique pour une paire de langues peu dotée. Thèse de doctorat en informatique cognitive.
Lovins, J. B. (1968). Development of a stemming algorithm. Mech. Transl. Comput. Linguistics, 11(1-2), 22-31.
Mahyoob, M. (2018). Deterministic Finite State Automaton of Arabic Verb System: A Morphological Study. International Journal of Computational Linguistics (IJCL), 9(1).
Majumder, P., Mitra, M., & Datta, K. (2006, September). Statistical vs. rule-based stemming for monolingual French retrieval. In Workshop of the Cross-Language Evaluation Forum for European Languages (pp. 107-110). Springer, Berlin, Heidelberg.
Mohamadou, A. (2014). Le verbe en peul: Formes et valeurs en pulaar du Fuuta-Tooro. KARTHALA Editions.
Noye, D. (1974). Cours de foulfouldé: dialecte peul du Diamare, Nord-Cameroun.
Noye, D. (1989). Dictionnaire foulfouldé-français: dialecte peul du Diamaré, Nord-Cameroun. Librairie Orientaliste Paul Geuthner.
Omri, M. N. (2004). Possibilistic pertinence feedback and semantic networks for goal extraction. Asian Journal of Information Technology, 3(4), 258-265.
Omri, M. N., & Chouigui, N. (2001). Linguistic variables definition by membership function and measure of similarity. In Proceedings of the 14th International Conference on Systems Science (Vol. 2, pp. 264-273).
Paice, C. D. (1994). An evaluation method for stemming algorithms. In SIGIR’94 (pp. 42-50). Springer, London.
Paternostre, M., Francq, P., Lamoral, J., Wartel, D., & Saerens, M. (2002). Carry, un algorithme de désuffixation pour le français. Rapport technique du projet Galilei.
Porter, M. F. (1997). An algorithm for suffix stripping. Readings in information retrieval. Morgan Kaufmann, 313-316.
Samuel, J., & Teferra, S. (2018). Designing A Rule Based Stemming Algorithm for Kambaata Language Text. no, 9, 41-54.
Taylor, F. W. (1953). A grammar of the Adamawa dialect of the Fulani language (Fulfulde).
Tesfaye, D., & Abebe, E. (2010). Designing a Rule Based Stemmer for Afaan Oromo Text. International Journal of Computational Linguistics (IJCL), 1(2), 1-11.
Tradlibre. Histoire de la langue Peuls. URL: https://www.tradlibre.fr/histoire/histoire-de-la-langue-peuls, visited on 7-04-2021.
Younoussi, Y. E., Sdigui, A. D., & Belahmer, H. (2007). La racinisation de la langue arabe par les automates à états finis (AEF). Laboratoire Systèmes d'Information Multimédia et Mobiles (SI3M), Ecole Nationale Supérieure de l'Informatique et Analyse des Systèmes Maroc, Laboratoire Alkhawarizmi de Génie Informatique (LAGI).
Mrs. Zouleiha Alhadji Ibrahima
Department of Mathematics and Computer Science, Faculty of Science, the University of Ngaoundéré - Cameroon
zouleihaalhadji@gmail.com
Mr. Dayang Paul
Department of Mathematics and Computer Science, Faculty of Science, the University of Ngaoundéré - Cameroon
Mr. Kolyang
Department of Computer Science, Higher Teachers' Training College, the University of Maroua - Cameroon
Mr. Guidana Gazawa Frederic
Department of Mathematics and Computer Science, Faculty of Science, the University of Ngaoundéré - Cameroon

