This is an Open Access publication published under CSC-OpenAccess Policy.
Rule-Based Standard Arabic Phonetization at Phoneme, Allophone, and Syllable Level
Fadi Sindran, Firas Mualla, Tino Haderlein, Khaled Daqrouq, Elmar Nöth
Pages: 23-37 | Revised: 31-10-2016 | Published: 01-12-2016
Volume: 7 | Issue: 2 | Publication Date: December 2016
Phonetization, Standard Arabic, Phonetic Transcription, Pronunciation Dictionaries, Transcription Rules.
Phonetization is the transcription of written text into sounds. It is used in many natural language processing tasks, such as speech processing, speech synthesis, and computer-aided pronunciation assessment. A common phonetization approach is the use of letter-to-sound rules developed by linguists for the transcription from grapheme to sound. In this paper, we address the problem of rule-based phonetization of standard Arabic. The contributions of the paper can be summarized as follows: 1) We discuss the transcription rules of standard Arabic that have been used in the literature at the phonemic and phonetic levels. 2) We suggest improvements to existing rules and introduce new rules. Moreover, a comprehensive algorithm covering the phenomenon of pharyngealization in standard Arabic is proposed. Finally, the resulting rule set has been tested on large datasets. 3) We present a reliable automatic phonetic transcription of standard Arabic at five levels: phoneme, allophone, syllable, word, and sentence. An encoding that covers all sounds of standard Arabic is proposed, and several pronunciation dictionaries have been automatically generated. These dictionaries have been manually verified, yielding an accuracy higher than 99% for standard Arabic texts that do not contain dates, numbers, acronyms, abbreviations, or special symbols. The dictionaries are available for research purposes.
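To illustrate the general idea of letter-to-sound rules mentioned in the abstract, the following is a minimal sketch in Python. It is not the rule set or the encoding proposed in the paper: the function name `phonetize`, the three-consonant table, and the Latin output symbols are assumptions chosen only for illustration. It handles fully diacritized input, maps short-vowel marks to vowels, skips sukun, and treats shadda as doubling of the preceding consonant.

```python
# Toy letter-to-sound sketch for fully diacritized Arabic.
# ASSUMPTION: the tables and output symbols below are hypothetical and
# cover only three consonants; they are not the paper's encoding or rules.

CONSONANTS = {
    "\u0628": "b",  # ARABIC LETTER BEH
    "\u062A": "t",  # ARABIC LETTER TEH
    "\u0643": "k",  # ARABIC LETTER KAF
}

SHORT_VOWELS = {
    "\u064E": "a",  # ARABIC FATHA
    "\u064F": "u",  # ARABIC DAMMA
    "\u0650": "i",  # ARABIC KASRA
}

SUKUN = "\u0652"   # marks the absence of a vowel
SHADDA = "\u0651"  # marks a doubled (geminated) consonant


def phonetize(word):
    """Return a list of phoneme symbols for one diacritized word."""
    phonemes = []
    last_consonant = None  # index in `phonemes` of the most recent consonant
    for ch in word:
        if ch in CONSONANTS:
            phonemes.append(CONSONANTS[ch])
            last_consonant = len(phonemes) - 1
        elif ch in SHORT_VOWELS:
            phonemes.append(SHORT_VOWELS[ch])
        elif ch == SHADDA:
            if last_consonant is None:
                raise ValueError("shadda without a preceding consonant")
            # Duplicate the consonant in place, so the rule works whether
            # the shadda is stored before or after the vowel mark.
            phonemes.insert(last_consonant, phonemes[last_consonant])
        elif ch == SUKUN:
            continue  # no vowel follows the consonant
        else:
            raise ValueError(f"character not covered by this sketch: {ch!r}")
    return phonemes


# kaf+fatha, teh+fatha, beh+fatha ("kataba")
print(phonetize("\u0643\u064E\u062A\u064E\u0628\u064E"))
# kaf+fatha, teh+shadda+fatha, beh+fatha ("kattaba")
print(phonetize("\u0643\u064E\u062A\u0651\u064E\u0628\u064E"))
```

A realistic system would need many more rules, including context-dependent ones such as the pharyngealization handling named in the abstract; this sketch covers only a context-free mapping plus gemination.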
Mr. Fadi Sindran
Friedrich-Alexander-Universität Erlangen-Nürnberg/Department of Computer Science 5 - Germany
Mr. Firas Mualla
Faculty of Engineering/Department of Computer Science/Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg - Germany
Dr. Tino Haderlein
Faculty of Engineering/Department of Computer Science/Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg - Germany
Professor Khaled Daqrouq
Department of Electrical and Computer Engineering, King Abdulaziz University - Saudi Arabia
Professor Elmar Nöth
Faculty of Engineering/Department of Computer Science/Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg - Germany