F0 Contour Modeling for Arabic Text-to-Speech Synthesis Using Fujisaki Parameters and Neural Networks

Zied Mnasri; Fatouma Boukadida; Noureddine Ellouze

Pages - 352 - 369 | Revised - 31-01-2011 | Published - 08-02-2011

Published in Signal Processing: An International Journal (SPIJ)

Volume - 4 Issue - 6 | Publication Date - January / February

KEYWORDS

F0 Contour, Arabic TTS, Fujisaki Parameters, Neural Networks, Phrase Command, Accent Command

ABSTRACT

The quality of synthesized speech depends on its naturalness and intelligibility. These abstract concepts are the concern of phonology. In phonetic terms, they are conveyed by prosodic components, mainly the fundamental frequency (F0) contour. F0 contour modeling is performed either by setting rules or by investigating databases, with or without parameters, and following either a time-sequential path or a parallel, superpositional scheme. In this study, we model the F0 contour for Arabic using Fujisaki parameters learned by neural networks. A statistical evaluation was carried out to measure the accuracy of the predicted parameters and the closeness of the synthesized F0 contour to the natural one. Findings concerning the adoption of Fujisaki parameters for Arabic F0 contour modeling in text-to-speech synthesis are discussed.
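
For readers unfamiliar with the Fujisaki parameters named in the abstract, the following is a minimal sketch of the standard Fujisaki command-response model, in which the logarithm of F0 is the sum of a base value, phrase components and accent components. It is not taken from the paper: the function names, the default constants (alpha, beta, gamma) and the command values in the usage line are illustrative assumptions, and the neural-network prediction of the commands is not shown.

```python
import math

# Illustrative sketch only; names and default constants are assumptions,
# not values reported in the paper.

def phrase_response(t, alpha=2.0):
    """Gp(t): impulse response of the phrase control mechanism."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0

def accent_response(t, beta=20.0, gamma=0.9):
    """Ga(t): step response of the accent control mechanism, capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def fujisaki_f0(t, fb, phrase_cmds, accent_cmds,
                alpha=2.0, beta=20.0, gamma=0.9):
    """Return F0 in Hz at time t (seconds).

    fb          -- base frequency in Hz
    phrase_cmds -- list of (onset T0, amplitude Ap)
    accent_cmds -- list of (onset T1, offset T2, amplitude Aa)
    """
    log_f0 = math.log(fb)
    for t0, ap in phrase_cmds:
        log_f0 += ap * phrase_response(t - t0, alpha)
    for t1, t2, aa in accent_cmds:
        log_f0 += aa * (accent_response(t - t1, beta, gamma)
                        - accent_response(t - t2, beta, gamma))
    return math.exp(log_f0)

# Hypothetical usage: one phrase command and one accent command,
# sampled every 10 ms over 2 seconds.
contour = [fujisaki_f0(i * 0.01, 80.0, [(0.0, 0.5)], [(0.3, 0.6, 0.4)])
           for i in range(200)]
```

In such a scheme, a predictor only has to supply the command onsets, offsets and amplitudes; the contour itself then follows from the closed-form responses above.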

CITED BY (1)

1	Ilyes, R., & Ben Ayed, Y. (2014, March). Statistical parametric speech synthesis for Arabic language using ANN. In Advanced Technologies for Signal and Image Processing (ATSIP), 2014 1st International Conference on (pp. 452-457). IEEE.

MANUSCRIPT AUTHORS

Mr. Zied Mnasri

- Tunisia

zied.mnasri@gmail.com

Mr. Fatouma Boukadida

- Tunisia

Mr. Noureddine Ellouze

- Tunisia
