This is an Open Access publication published under CSC-OpenAccess Policy.
F0 Contour Modeling for Arabic Text-to-Speech Synthesis Using Fujisaki Parameters and Neural Networks
Zied Mnasri, Fatouma Boukadida, Noureddine Ellouze
Pages - 352 - 369     |    Revised - 31-01-2011     |    Published - 08-02-2011
Volume - 4   Issue - 6    |    Publication Date - January / February
Keywords: F0 Contour, Arabic TTS, Fujisaki Parameters, Neural Networks, Phrase Command, Accent Command
The quality of synthesized speech depends on its naturalness and intelligibility. These abstract concepts belong to phonology; in phonetic terms, they are conveyed by prosodic components, chiefly the fundamental frequency (F0) contour. F0 contour modeling can be rule-based or data-driven, parametric or non-parametric, and can follow either a time-sequential path or a parallel, superpositional scheme. In this study, we model the F0 contour for Arabic using Fujisaki parameters learned by neural networks. A statistical evaluation measures the accuracy of the predicted parameters and the closeness of the synthesized F0 contour to the natural one. Findings on the suitability of Fujisaki parameters for Arabic F0 contour modeling in text-to-speech synthesis are discussed.
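For readers unfamiliar with the superpositional scheme mentioned in the abstract, the following is a minimal sketch of the standard Fujisaki command-response formulation, in which phrase and accent components are added to a base frequency in the log-F0 domain. It is not the authors' implementation: the function names, the example command values, and the defaults alpha = 2.0, beta = 20.0 and gamma = 0.9 are illustrative assumptions, and the neural-network prediction of the commands is not shown.

```python
import math

def phrase_response(t, alpha=2.0):
    """Impulse response of the phrase control mechanism (zero before onset)."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)

def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of the accent control mechanism, capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def fujisaki_f0(t, fb, phrase_cmds, accent_cmds,
                alpha=2.0, beta=20.0, gamma=0.9):
    """F0 in Hz at time t (seconds).

    fb          -- base frequency in Hz
    phrase_cmds -- list of (amplitude, onset_time) pairs
    accent_cmds -- list of (amplitude, onset_time, offset_time) triples
    All values here are hypothetical inputs chosen for illustration.
    """
    log_f0 = math.log(fb)
    # Each phrase command contributes a slowly decaying global component.
    for ap, t0 in phrase_cmds:
        log_f0 += ap * phrase_response(t - t0, alpha)
    # Each accent command contributes a local rise between onset and offset.
    for aa, t1, t2 in accent_cmds:
        log_f0 += aa * (accent_response(t - t1, beta, gamma)
                        - accent_response(t - t2, beta, gamma))
    return math.exp(log_f0)

if __name__ == "__main__":
    phrases = [(0.5, 0.0)]        # assumed phrase command
    accents = [(0.3, 0.2, 0.5)]   # assumed accent command
    for k in range(0, 11):
        t = k / 10.0
        print(f"{t:.1f} s  {fujisaki_f0(t, 100.0, phrases, accents):.1f} Hz")
```

In a scheme of this kind, a learned predictor would supply the command amplitudes and timings, and the contour would then be generated by evaluating the function on a regular time grid.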
CITED BY (1)  
1 Ilyes, R., & Ben Ayed, Y. (2014, March). Statistical parametric speech synthesis for Arabic language using ANN. In Advanced Technologies for Signal and Image Processing (ATSIP), 2014 1st International Conference on (pp. 452-457). IEEE.
Mr. Zied Mnasri - Tunisia
Mr. Fatouma Boukadida - Tunisia
Mr. Noureddine Ellouze - Tunisia