Survey On Speech Synthesis
A.Indumathi, E.Chandra
Pages - 140-145 | Revised - 15-11-2012 | Published - 31-12-2012
Volume - 6 | Issue - 5 | Publication Date - December 2012
TTS, HMM, Synthesis
The primary goal of this paper is to provide an overview of existing Text-To-Speech (TTS) techniques, highlighting their usage and advantages. First-generation techniques include formant synthesis and articulatory synthesis. Formant synthesis uses individually controllable formant filters, which can be set to produce accurate estimates of the vocal-tract transfer function. Articulatory synthesis produces speech by directly modeling the behavior of the human articulators. Second-generation techniques include concatenative synthesis and sinusoidal synthesis. Concatenative synthesis generates speech by concatenating segments of recorded speech, and generally produces natural-sounding output. Sinusoidal synthesis uses a harmonic model and decomposes each frame into a set of harmonics of an estimated fundamental frequency. The model parameters are the amplitudes and phases of the harmonics. With these, the value of the fundamental can be changed while keeping the same basic spectral envelope. In addition, third-generation techniques include Hidden Markov Model (HMM) synthesis and unit selection synthesis. HMM synthesis trains a parametric model and produces high-quality speech. Finally, unit selection operates by selecting, from a large speech database, the best sequence of units that matches the specification.
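As an illustration of the harmonic model mentioned in the abstract, the following minimal Python sketch synthesizes one frame as a sum of harmonics of a fundamental frequency, then changes the fundamental while approximately preserving the spectral shape. It is not taken from the paper: the function names (`synthesize_frame`, `envelope_at`, `change_pitch`), the use of per-harmonic amplitudes and phase offsets as parameters, the linear interpolation of the envelope, and the 16 kHz sample rate are all assumptions made for this example.

```python
import math

def synthesize_frame(f0, amps, phases, sample_rate=16000, n_samples=160):
    """Sum of harmonics: s[n] = sum_k A_k * cos(2*pi*k*f0*n/fs + phi_k)."""
    frame = []
    for n in range(n_samples):
        t = n / sample_rate
        frame.append(sum(
            a * math.cos(2.0 * math.pi * (k + 1) * f0 * t + p)
            for k, (a, p) in enumerate(zip(amps, phases))
        ))
    return frame

def envelope_at(freq, f0, amps):
    """Linearly interpolate the envelope sampled at the harmonics k*f0 (assumed scheme)."""
    pos = freq / f0 - 1.0  # fractional index into amps
    if pos <= 0:
        return amps[0]
    if pos >= len(amps) - 1:
        return amps[-1]
    lo = int(pos)
    frac = pos - lo
    return (1.0 - frac) * amps[lo] + frac * amps[lo + 1]

def change_pitch(f0, amps, new_f0):
    """Re-sample the envelope at the harmonics of new_f0, up to the original top harmonic."""
    top = f0 * len(amps)
    n_harm = max(1, int(top // new_f0))
    return [envelope_at((k + 1) * new_f0, f0, amps) for k in range(n_harm)]

# Hypothetical frame: four harmonics of a 100 Hz fundamental.
amps = [1.0, 0.5, 0.25, 0.125]
new_amps = change_pitch(100.0, amps, 200.0)
# Phases are simply reset to zero here; a real system would handle them more carefully.
frame = synthesize_frame(200.0, new_amps, [0.0] * len(new_amps))
```

Raising the fundamental from 100 Hz to 200 Hz leaves fewer harmonics below the original top frequency, and each new harmonic takes its amplitude from the original envelope at that frequency, which is what keeps the overall spectral shape roughly the same.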
CITED BY (5)  
1 Doush, I. A., Alkhatib, F., & Bsoul, A. A. R. (2016). What we have and what is needed, how to evaluate Arabic Speech Synthesizer?. International Journal of Speech Technology, 1-18.
2 Waghmare, K., Kayte, S., & Gawali, B. (2016). Analysis of Pitch and Duration in Speech Synthesis using PSOLA. Analysis, 4(4).
3 Kaladharan, N. English Text to Speech Conversion using Indian Pronunciation.
4 Rathinavelu, A., Sandhiya, C., & Saranya, K. (2015). A novel speech synthesizer using 3D facial model with gestures. International Journal on Disability and Human Development, 14(2), 141-146.
5 Kayte, S., Waghmare, K., & Gawali, B. Marathi Speech Synthesis: A review.
Dr. A.Indumathi
- India
Dr. E.Chandra
- India