Automatic Phonetization-based Statistical Linguistic Study of Standard Arabic
Fadi Sindran, Firas Mualla, Tino Haderlein, Khaled Daqrouq, Elmar Nöth
Pages - 38 - 53     |    Revised - 30-11-2016     |    Published - 31-12-2016
Volume - 7   Issue - 2    |    Publication Date - December 2016
KEYWORDS
Statistical Studies, Standard Arabic, Phonetic Transcription, Phonetization, Ranked Frequency Distribution, Phonemes, Allophones, Syllables, Allosyllables, Fit of Equation.
ABSTRACT
Statistical studies based on automatic phonetic transcription of Standard Arabic texts are rare, and the studies that exist address only a single level, either phoneme or syllable, so their results cannot be generalized to the language as a whole. In this paper, we automatically derive accurate statistical information about phonemes, allophones, syllables, and allosyllables in Standard Arabic. A corpus of more than 5 million words, including words and sentences from both Classical Arabic and Modern Standard Arabic, was prepared and preprocessed. We developed a software package that performs rule-based automatic transcription from written Standard Arabic text to the corresponding linguistic units at four levels: phoneme, allophone, syllable, and allosyllable. After testing the software on four corpora containing more than 57,000 vocabulary words and achieving very high accuracy (> 99%) at all four levels, we used it as a reliable tool to transcribe the corpus used in this paper and evaluated the following: 1) the vocabulary of phonemes, allophones, syllables, and allosyllables, with their specific percentages in Standard Arabic; 2) the best-fitting curve equation for the distribution of the normalized frequencies of phonemes, allophones, syllables, and allosyllables; 3) important statistical information, such as the percentages of consonants and vowels, the percentages of consonants classified by place and manner of articulation, the transition probability matrix between phonemes, and the percentages of syllables by syllable type.
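As an illustration of two of the statistics named in the abstract, the following minimal Python sketch computes a ranked normalized frequency distribution and a transition probability matrix from a sequence of transcribed units. It is not the authors' software package: the function names and the toy phoneme sequence are hypothetical, and the sketch assumes the text has already been transcribed into a flat list of phoneme symbols.

```python
from collections import Counter, defaultdict


def ranked_normalized_frequencies(units):
    """Return (unit, relative frequency) pairs, most frequent first."""
    counts = Counter(units)
    total = sum(counts.values())
    # Sort by descending count, then by symbol so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(unit, count / total) for unit, count in ranked]


def transition_probabilities(units):
    """Estimate P(next unit | current unit) from adjacent pairs."""
    pair_counts = defaultdict(Counter)
    for current, nxt in zip(units, units[1:]):
        pair_counts[current][nxt] += 1
    matrix = {}
    for current, followers in pair_counts.items():
        row_total = sum(followers.values())
        matrix[current] = {nxt: c / row_total for nxt, c in followers.items()}
    return matrix


# Hypothetical toy input; a real study would use the transcribed corpus.
phonemes = ["k", "a", "t", "a", "b", "a"]
ranked = ranked_normalized_frequencies(phonemes)
matrix = transition_probabilities(phonemes)
```

In this toy sequence, "a" accounts for half of all units, and each row of `matrix` sums to 1 because it is normalized by the number of times the current unit is followed by another unit.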
Mr. Fadi Sindran
Faculty of Engineering, Department of Computer Science, Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91058 Erlangen, Germany
fadi.sindran@faui51.informatik.uni-erlangen.de
Mr. Firas Mualla
Faculty of Engineering, Department of Computer Science, Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91058 Erlangen, Germany
Dr. Tino Haderlein
Faculty of Engineering, Department of Computer Science, Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91058 Erlangen, Germany
Professor Khaled Daqrouq
Department of Electrical and Computer Engineering, King Abdulaziz University, Jeddah 22254, Saudi Arabia
Professor Elmar Nöth
Faculty of Engineering, Department of Computer Science, Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91058 Erlangen, Germany