F0 Contour Modeling for Arabic Text-to-Speech Synthesis Using Fujisaki Parameters and Neural Networks
Zied Mnasri, Fatouma Boukadida, Noureddine Ellouze
Pages 352-369 | Revised 31-01-2011 | Published 08-02-2011
Volume 4, Issue 6 | Publication Date: January/February
KEYWORDS
F0 Contour, Arabic TTS, Fujisaki Parameters, Neural Networks, Phrase Command, Accent Command
ABSTRACT
Speech synthesis quality depends on naturalness and intelligibility. These abstract concepts are the concern of phonology; in phonetic terms, they are conveyed by prosodic components, mainly the fundamental frequency (F0) contour. F0 contour modeling is performed either by setting rules or by analyzing databases, with or without parameters, and following either a time-sequential path or a parallel, superpositional scheme. In this study, we model the Arabic F0 contour using Fujisaki parameters predicted by trained neural networks. Statistical evaluation was carried out to measure the accuracy of the predicted parameters and the closeness of the synthesized F0 contour to the natural one. Findings concerning the suitability of Fujisaki parameters for Arabic F0 contour modeling in text-to-speech synthesis are discussed.
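The abstract relies on the Fujisaki model without stating it. As a minimal sketch, assuming the standard command-response formulation (ln F0 is the sum of a base value ln Fb, phrase components from impulse-like phrase commands, and accent components from step-like accent commands) and the commonly quoted constants alpha = 3.0/s, beta = 20.0/s and gamma = 0.9 (assumptions here, not values reported in this paper), the superpositional model and two common contour-comparison measures (RMSE and Pearson correlation) can be written as below. All class and function names are hypothetical, and the command values in the usage part are invented placeholders standing in for neural-network outputs; the network itself and the authors' actual evaluation metrics are not shown.

```python
import math
from dataclasses import dataclass


@dataclass
class PhraseCommand:
    t0: float  # onset time of the phrase command (s)
    ap: float  # phrase command magnitude Ap


@dataclass
class AccentCommand:
    t1: float  # accent command onset (s)
    t2: float  # accent command offset (s)
    aa: float  # accent command amplitude Aa


def phrase_response(t, alpha=3.0):
    """Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0, else 0."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=20.0, gamma=0.9):
    """Ga(t) = min(1 - (1 + beta * t) * exp(-beta * t), gamma) for t >= 0, else 0."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_f0(t, fb, phrases, accents, alpha=3.0, beta=20.0, gamma=0.9):
    """F0 (Hz) at time t: ln F0 = ln Fb + sum of phrase and accent components."""
    log_f0 = math.log(fb)
    for p in phrases:
        log_f0 += p.ap * phrase_response(t - p.t0, alpha)
    for a in accents:
        # Difference of two step responses models an accent of finite duration.
        log_f0 += a.aa * (accent_response(t - a.t1, beta, gamma)
                          - accent_response(t - a.t2, beta, gamma))
    return math.exp(log_f0)


def rmse(pred, ref):
    """Root-mean-square error between two equally long contours."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def pearson(x, y):
    """Pearson correlation coefficient between two equally long contours."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical commands; in a TTS system these would be predicted from text.
    times = [i * 0.01 for i in range(200)]
    phrases = [PhraseCommand(t0=-0.1, ap=0.5)]
    accents = [AccentCommand(t1=0.3, t2=0.6, aa=0.4)]
    predicted = [fujisaki_f0(t, 120.0, phrases, accents) for t in times]
    # Stand-in "natural" contour with a slightly different accent amplitude.
    natural = [fujisaki_f0(t, 120.0, phrases, [AccentCommand(0.3, 0.6, 0.35)])
               for t in times]
    print("RMSE (Hz):", rmse(predicted, natural))
    print("Correlation:", pearson(predicted, natural))
```

Because the model works on ln F0, the phrase and accent components add in the log domain and therefore scale F0 multiplicatively, which is what makes the scheme "superpositional" rather than time-sequential.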
CITED BY (1)  
1 Ilyes, R., & Ben Ayed, Y. (2014, March). Statistical parametric speech synthesis for Arabic language using ANN. In Advanced Technologies for Signal and Image Processing (ATSIP), 2014 1st International Conference on (pp. 452-457). IEEE.
Mr. Zied Mnasri, Tunisia, zied.mnasri@gmail.com
Mr. Fatouma Boukadida, Tunisia
Mr. Noureddine Ellouze, Tunisia