Hybrid Phonemic and Graphemic Modeling for Arabic Speech Recognition

Mohamed Elmahdy; Mark Hasegawa-Johnson; Eiman Mustafawi

License: Creative Commons Attribution-NonCommercial 4.0 International License

Pages: 88-96 | Revised: 15-11-2012 | Published: 31-12-2012

Published in International Journal of Computational Linguistics (IJCL)

Volume 3, Issue 1 | Publication Date: October 2012

KEYWORDS

Arabic, Acoustic Modeling, Pronunciation Modeling, Speech Recognition

ABSTRACT

In this research, we propose a hybrid approach to acoustic and pronunciation modeling for Arabic speech recognition. The hybrid approach benefits from both vocalized (diacritized) and non-vocalized Arabic resources, motivated by the fact that non-vocalized resources are always more plentiful than vocalized ones. Two baseline speech recognition systems were built: one phonemic and one graphemic. The two baseline acoustic models were trained independently and then fused to create a hybrid acoustic model. Pronunciation modeling was likewise hybrid, generating graphemic as well as phonemic pronunciation variants. Several pronunciation modeling techniques are proposed to reduce model complexity. Experiments were conducted in the large-vocabulary broadcast news speech domain. The proposed hybrid approach achieved a relative word error rate (WER) reduction of 8.8% to 12.6%, depending on the pronunciation modeling settings and the supervision used in the baseline systems.
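The idea of a hybrid pronunciation lexicon can be illustrated with a small Python sketch. It is not the authors' implementation. The Buckwalter transliteration input, the toy phoneme inventory, the `G_` prefix that keeps grapheme units distinct from phoneme units inside a fused acoustic model, and the `max_phonemic_variants` cap (a hypothetical stand-in for the paper's complexity-reduction techniques) are all assumptions made for illustration. Every non-vocalized word receives one graphemic variant, and phonemic variants are added only when a vocalized form of that word is available, so both kinds of resources contribute. The helper `relative_wer_reduction` shows how a relative, rather than absolute, WER reduction is computed.

```python
from typing import Dict, Iterable, List, Tuple

# Buckwalter diacritic marks: fatha, damma, kasra, sukun, shadda,
# the three tanween forms, and dagger alif.
DIACRITICS = set("auio~FNK`")

# Hypothetical toy inventories (a tiny subset, for illustration only).
VOWELS = {"a": "AE", "u": "UH", "i": "IH"}
CONSONANTS = {"b": "B", "t": "T", "k": "K", "d": "D", "r": "R",
              "s": "S", "m": "M", "l": "L", "n": "N", "w": "W", "y": "Y"}

Pron = Tuple[str, ...]


def strip_diacritics(word: str) -> str:
    """Return the non-vocalized (undiacritized) form of a Buckwalter word."""
    return "".join(ch for ch in word if ch not in DIACRITICS)


def graphemic_pron(word: str) -> Pron:
    """One grapheme unit per letter; the assumed 'G_' prefix keeps grapheme
    units separate from phoneme units in a fused unit inventory."""
    return tuple("G_" + ch for ch in strip_diacritics(word))


def phonemic_pron(vocalized: str) -> Pron:
    """Toy letter-to-phoneme rules for a fully vocalized word."""
    phones: List[str] = []
    for ch in vocalized:
        if ch in VOWELS:
            phones.append(VOWELS[ch])
        elif ch == "~" and phones:
            # Shadda: repeat the previous phone (assumes the shadda
            # directly follows the consonant it geminates).
            phones.append(phones[-1])
        elif ch in CONSONANTS:
            phones.append(CONSONANTS[ch])
        # Sukun, tanween and unmapped symbols produce no phone here.
    return tuple(phones)


def build_hybrid_lexicon(
    words: Iterable[str],
    vocalizations: Dict[str, List[str]],
    max_phonemic_variants: int = 2,
) -> Dict[str, List[Pron]]:
    """Map each non-vocalized word to one graphemic variant plus up to
    max_phonemic_variants phonemic variants (hypothetical cap)."""
    lexicon: Dict[str, List[Pron]] = {}
    for word in words:
        variants: List[Pron] = [graphemic_pron(word)]
        for voc in vocalizations.get(word, [])[:max_phonemic_variants]:
            pron = phonemic_pron(voc)
            if pron and pron not in variants:
                variants.append(pron)
        lexicon[word] = variants
    return lexicon


def relative_wer_reduction(baseline_wer: float, hybrid_wer: float) -> float:
    """Relative WER reduction in percent: 100 * (baseline - hybrid) / baseline."""
    return 100.0 * (baseline_wer - hybrid_wer) / baseline_wer


if __name__ == "__main__":
    lex = build_hybrid_lexicon(["ktb"], {"ktb": ["kataba", "kutubN"]})
    for pron in lex["ktb"]:
        print("ktb", " ".join(pron))
    print(relative_wer_reduction(20.0, 18.0))
```

With these hypothetical inputs, the non-vocalized word `ktb` gets the graphemic variant `G_k G_t G_b` plus the phonemic variants `K AE T AE B AE` and `K UH T UH B`. A words without any vocalized form still receives its graphemic variant. The WER figures 20.0 and 18.0 are invented for the example: a 2-point absolute drop from a 20% baseline is a 10% relative reduction, which is the sense in which the abstract reports 8.8% to 12.6%.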

CITED BY (2)

1	Larcom, M. K. (2014). The Minimalist Machine: An Implementation of Arabic Structures and Syntax.

2	Christensen, H., Green, P. D., & Hain, T. (2013, August). Learning speaker-specific pronunciations of disordered speech. In Interspeech (pp. 1159-1163).

ABSTRACTING & INDEXING

1	Google Scholar

2	CiteSeerX

3	Scribd

4	SlideShare

5	PdfSR

MANUSCRIPT AUTHORS

Dr. Mohamed Elmahdy

Qatar University - Qatar

mohamed.elmahdy@qu.edu.qa

Associate Professor Mark Hasegawa-Johnson

University of Illinois - United States of America

Dr. Eiman Mustafawi

Qatar University - Qatar
