This is an Open Access publication published under CSC-OpenAccess Policy.
Automatic Phonetization-based Statistical Linguistic Study of Standard Arabic
Fadi Sindran, Firas Mualla, Tino Haderlein, Khaled Daqrouq, Elmar Nöth
Pages: 38-53 | Revised: 30-11-2016 | Published: 31-12-2016
Volume: 7 | Issue: 2 | Publication Date: December 2016
KEYWORDS
Statistical Studies, Standard Arabic, Phonetic Transcription, Phonetization, Ranked Frequency Distribution, Phonemes, Allophones, Syllables, Allosyllables, Fit of Equation.
ABSTRACT
Statistical studies based on automatic phonetic transcription of Standard Arabic texts are rare. The few that exist cover only one level (phoneme or syllable), so their results cannot be generalized to the language as a whole. In this paper, we automatically derive accurate statistical information about phonemes, allophones, syllables, and allosyllables in Standard Arabic. A corpus of more than 5 million words, including words and sentences from both Classical Arabic and Modern Standard Arabic, was prepared and preprocessed. We developed a software package that performs rule-based automatic transcription of written Standard Arabic text into the corresponding linguistic units at four levels: phoneme, allophone, syllable, and allosyllable. After testing the software on four corpora containing more than 57,000 vocabulary words and achieving very high accuracy (> 99%) at all four levels, we used it as a reliable tool for the automatic transcription of the corpus used in this paper and evaluated the following: 1) the inventory of phonemes, allophones, syllables, and allosyllables in Standard Arabic, with the percentage of each; 2) the best-fitting curve equation for the ranked distribution of the normalized frequencies of phonemes, allophones, syllables, and allosyllables; and 3) further statistical information, such as the percentages of consonants and vowels, the percentages of consonants classified by place and manner of articulation, the transition probability matrix between phonemes, and the percentages of syllables by syllable type.
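The statistics listed above can be illustrated with a short Python sketch. It is not the authors' software: it assumes the transcription step has already produced one list of phoneme symbols per word, and the alphabet (short vowels a/i/u, long vowels aa/ii/uu, every other symbol a consonant), the toy words, and all function names are hypothetical. Syllables are split per word with the simple rule that each Arabic syllable begins with one consonant followed by a vowel, which ignores the allophone and allosyllable rules the paper describes. Because the abstract does not name the best-fitting equation, a Zipf-style power law f(r) = a * r**b, fitted by least squares on log-log axes, stands in as one candidate, with R² as the goodness-of-fit measure.

```python
from collections import Counter, defaultdict
import math

# Hypothetical transcription alphabet: short vowels a/i/u, long vowels aa/ii/uu;
# every other symbol is treated as a consonant.
VOWELS = {"a", "i", "u", "aa", "ii", "uu"}
LONG_VOWELS = {"aa", "ii", "uu"}


def ranked_normalized_frequencies(sequences):
    """Relative frequency of each unit, sorted by rank (most frequent first)."""
    counts = Counter(u for seq in sequences for u in seq)
    total = sum(counts.values())
    # Ties are broken alphabetically so the ranking is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(unit, c / total) for unit, c in ranked]


def transition_matrix(sequences):
    """Estimate P(next | current) from adjacent units within each sequence."""
    pair_counts = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            pair_counts[cur][nxt] += 1
    matrix = {}
    for cur, row in pair_counts.items():
        row_total = sum(row.values())
        matrix[cur] = {nxt: c / row_total for nxt, c in row.items()}
    return matrix


def vowel_percentage(sequences, vowels=VOWELS):
    """Percentage of all unit tokens that are vowels (short or long)."""
    units = [u for seq in sequences for u in seq]
    return 100.0 * sum(u in vowels for u in units) / len(units)


def syllable_patterns(word, vowels=VOWELS, long_vowels=LONG_VOWELS):
    """Split one word into CV patterns (CV, CVV, CVC, CVVC, ...), assuming each
    syllable starts with a single consonant followed by a vowel."""
    marks = ["VV" if u in long_vowels else "V" if u in vowels else "C" for u in word]
    syllables, current = [], ""
    for i, mark in enumerate(marks):
        # A consonant followed by a vowel opens a new syllable.
        opens = mark == "C" and i + 1 < len(marks) and marks[i + 1] != "C"
        if opens and current:
            syllables.append(current)
            current = ""
        current += mark
    if current:
        syllables.append(current)
    return syllables


def fit_power_law(freqs):
    """Least-squares fit of f(r) = a * r**b on log-log axes.

    Returns (a, b, r_squared), with R^2 computed in log space."""
    if len(freqs) < 2:
        raise ValueError("need at least two ranks to fit a curve")
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    log_a = my - b * mx
    ss_res = sum((y - (log_a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return math.exp(log_a), b, r_squared


# Toy phonemized words (illustrative only, not data from the corpus).
words = [
    ["k", "a", "t", "a", "b", "a"],
    ["k", "i", "t", "aa", "b"],
    ["d", "a", "r", "a", "s", "a"],
]
ranked = ranked_normalized_frequencies(words)
P = transition_matrix(words)
syllable_counts = Counter(s for w in words for s in syllable_patterns(w))
a, b, r2 = fit_power_law([f for _, f in ranked])
print(ranked[:3], P["k"], vowel_percentage(words), syllable_counts, (a, b, r2))
```

Fitting several candidate equations to the same ranked frequencies and comparing their R² (or another goodness-of-fit measure) is one way to select a best curve; the power law here is only one such candidate.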
AUTHORS
Mr. Fadi Sindran
Faculty of Engineering, Department of Computer Science, Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, 91058, Germany
fadi.sindran@faui51.informatik.uni-erlangen.de
Mr. Firas Mualla
Faculty of Engineering, Department of Computer Science, Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, 91058, Germany
Dr. Tino Haderlein
Faculty of Engineering, Department of Computer Science, Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, 91058, Germany
Professor Khaled Daqrouq
Department of Electrical and Computer Engineering, King Abdulaziz University, Jeddah, 22254, Saudi Arabia
Professor Elmar Nöth
Faculty of Engineering, Department of Computer Science, Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, 91058, Germany