Hybrid Phonemic and Graphemic Modeling for Arabic Speech Recognition
Mohamed Elmahdy, Mark Hasegawa-Johnson, Eiman Mustafawi
Pages - 88-96 | Revised - 15-11-2012 | Published - 31-12-2012
Volume - 3 | Issue - 1 | Publication Date - October 2012
KEYWORDS
Arabic, Acoustic Modeling, Pronunciation Modeling, Speech Recognition
ABSTRACT
In this research, we propose a hybrid approach to acoustic and pronunciation modeling for Arabic speech recognition. The hybrid approach benefits from both vocalized and non-vocalized Arabic resources, based on the fact that non-vocalized resources are always more plentiful than vocalized ones. Two baseline speech recognition systems were built: one phonemic and one graphemic. After independent training, the two baseline acoustic models were fused to create a hybrid acoustic model. Pronunciation modeling was also hybrid: graphemic pronunciation variants were generated alongside phonemic variants. Different techniques are proposed for pronunciation modeling to reduce model complexity. Experiments were conducted in the large-vocabulary broadcast news speech domain. The proposed hybrid approach achieved a relative reduction in word error rate (WER) of 8.8% to 12.6%, depending on the pronunciation modeling settings and the supervision in the baseline systems.
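The idea of a hybrid pronunciation lexicon can be illustrated with a minimal sketch. It assumes that a graphemic pronunciation is simply the letter sequence of the non-vocalized word and that phonemic variants come from an existing vocalized lexicon. The function names, the `g_` unit prefix, and the Buckwalter-style example entries are hypothetical and are not taken from the paper, whose actual variant generation and complexity-reduction techniques may differ.

```python
from typing import Dict, List

# Hypothetical prefix that keeps grapheme units distinct from phoneme units
# in a shared unit inventory (an assumption, not the paper's notation).
GRAPHEME_PREFIX = "g_"


def graphemic_pronunciation(word: str) -> List[str]:
    """Return one acoustic unit per written letter of the non-vocalized word."""
    return [GRAPHEME_PREFIX + letter for letter in word]


def hybrid_lexicon(
    words: List[str],
    phonemic_lexicon: Dict[str, List[List[str]]],
) -> Dict[str, List[List[str]]]:
    """Combine phonemic variants (where available) with a graphemic variant.

    Words missing from the phonemic lexicon still receive a graphemic
    pronunciation, so non-vocalized resources remain usable.
    """
    lexicon: Dict[str, List[List[str]]] = {}
    for word in words:
        # Copy the known phonemic variants so the input is not modified.
        variants = [list(p) for p in phonemic_lexicon.get(word, [])]
        grapheme_variant = graphemic_pronunciation(word)
        if grapheme_variant not in variants:
            variants.append(grapheme_variant)
        lexicon[word] = variants
    return lexicon


if __name__ == "__main__":
    # Illustrative entries only: "ktb" with two vocalized readings,
    # and "qTr" with no vocalized entry at all.
    phonemic = {
        "ktb": [["k", "a", "t", "a", "b", "a"], ["k", "u", "t", "u", "b"]],
    }
    for word, variants in hybrid_lexicon(["ktb", "qTr"], phonemic).items():
        for variant in variants:
            print(word, " ".join(variant))
```

In this sketch every word gets exactly one graphemic variant, which keeps lexicon growth linear in the vocabulary size; a real system would also need the fused acoustic model to contain both unit sets.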
CITED BY (2)  
1 Larcom, M. K. (2014). The Minimalist Machine: An Implementation of Arabic Structures and Syntax.
2 Christensen, H., Green, P. D., & Hain, T. (2013, August). Learning speaker-specific pronunciations of disordered speech. In Interspeech (pp. 1159-1163).
Dr. Mohamed Elmahdy
Qatar University - Qatar
mohamed.elmahdy@qu.edu.qa
Associate Professor Mark Hasegawa-Johnson
University of Illinois - United States of America
Dr. Eiman Mustafawi
Qatar University - Qatar