This is an Open Access publication published under CSC-OpenAccess Policy.
A New Method for Pitch Tracking and Voicing Decision Based on Spectral Multi-Scale Analysis
Mohamed Anouar Ben Messaoud, Aicha Bouzid
Pages - 144-152 | Revised - 30-10-2009 | Published - 30-11-2009
Volume - 3 | Issue - 5 | Publication Date - November 2009
KEYWORDS
Speech, Wavelet transforms, Multi-scale, Pitch, Voicing detection
ABSTRACT
This paper proposes a new voicing detection and pitch estimation method that is particularly robust to noise. The method is based on spectral analysis of the multi-scale product (MP) of the speech signal, where the MP is the product of the signal's wavelet transform coefficients at several scales. The wavelet used is the quadratic spline function. We argue that spectral analysis of the multi-scale product reveals the pitch harmonics more accurately, even in heavily noisy conditions. We evaluate our approach on the Keele database. The experimental results show that the method is robust for noisy speech and performs well on clean speech in comparison with state-of-the-art algorithms.
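The pipeline described in the abstract (wavelet transform at several scales, product of the coefficients, then spectral analysis) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the wavelet is approximated by the first difference of a discrete cubic B-spline, which has a quadratic-spline shape but is not the authors' filter bank; the three scale widths, the search band, the peak-picking rule in `estimate_pitch`, and the synthetic sawtooth test signal are all hypothetical choices. The voicing decision is not covered.

```python
import numpy as np


def spline_wavelet_transform(x, width):
    """Derivative of the signal smoothed by a discrete cubic B-spline.

    Assumption: a four-fold box convolution stands in for the cubic spline
    smoothing function; its derivative has a quadratic-spline shape.
    """
    box = np.ones(width) / width
    theta = box
    for _ in range(3):
        theta = np.convolve(theta, box)  # odd-length, symmetric kernel
    smoothed = np.convolve(x, theta, mode="same")
    return np.gradient(smoothed)  # central differences in the interior


def multiscale_product(x, widths=(2, 4, 8)):
    """Sample-wise product of the wavelet coefficients at each scale.

    The dyadic widths (2, 4, 8) are an illustrative choice.
    """
    mp = np.ones(len(x))
    for w in widths:
        mp *= spline_wavelet_transform(x, w)
    return mp


def estimate_pitch(x, fs, fmin=50.0, fmax=500.0, rel_threshold=0.5):
    """Return the lowest strong spectral peak of the MP inside [fmin, fmax].

    Hypothetical heuristic: the first local maximum whose magnitude reaches
    rel_threshold times the largest magnitude in the band.
    """
    mp = multiscale_product(np.asarray(x, dtype=float))
    frame = (mp - mp.mean()) * np.hanning(len(mp))
    spectrum = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)

    band = np.flatnonzero((freqs >= fmin) & (freqs <= fmax))
    band = band[(band > 0) & (band < len(spectrum) - 1)]
    if band.size == 0:
        return None

    floor = rel_threshold * spectrum[band].max()
    for k in band:
        is_peak = spectrum[k] >= spectrum[k - 1] and spectrum[k] >= spectrum[k + 1]
        if is_peak and spectrum[k] >= floor:
            return float(freqs[k])
    return None


# Synthetic stand-in for voiced speech: a 100 Hz sawtooth with one sharp
# jump per period, sampled at 8 kHz for 0.2 s.
fs = 8000
n = np.arange(int(0.2 * fs))
period = fs // 100
x = 1.0 - (n % period) / period

f0 = estimate_pitch(x, fs)
print(f0)
```

On this synthetic signal the multi-scale product is a train of narrow pulses located at the jumps, so its spectrum is a comb of harmonics and the estimate is expected to fall near 100 Hz. Taking the lowest strong peak rather than the global maximum is a deliberate choice in the sketch, because the harmonics of a narrow pulse train have nearly equal magnitudes.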
CITED BY (7)  
1 Bahja, F., Martino, J., Elhaj, E. I., & Aboutajdine, D. (2016). A corroborative study on improving pitch determination by time–frequency cepstrum decomposition using wavelets. SpringerPlus, 5(1), 1-17.
2 Acharya, P. Speech enhancement using unbiased normalized adaptive filtering technique.
3 Prameela, K., Kumar, M. A., Zia-Ur-Rahman, M., & Rao, B. R. M. (2011). Non Stationary Noise Removal from Speech Signals using Variable Step Size Strategy. International Journal of Computer Science & Communication Networks, 1(1).
4 Rahman, M. Z. U., Mohedden, S. K., Rao, B. R. M., Reddy, Y. J., & Karthik, G. V. S. (2011). Filtering Non-Stationary Noise in Speech Signals using Computationally Efficient Unbiased and Normalized Algorithm. International Journal on Computer Science and Engineering, ISSN, 0975-3397.
5 Karthik, G. V. S., Kumar, M. A., & Rahman, M. Z. U. (2011). Speech Enhancement Using Gradient Based Variable Step Size Adaptive Filtering Techniques. International Journal of Computer Science & Emerging Technologies (E-ISSN: 2044-6004), 2(1), 168-177.
6 Mohedden, S. K., Zia-Ur-Rahman, M., Krishna, K. M., & Rao, B. R. M. Battle Field Speech Enhancement using an Efficient Unbiased Adaptive Filtering Technique.
7 Messaoud, M. A. B., Bouzid, A., & Ellouze, N. (2010). Autocorrelation of the Speech Multi-Scale Product for Voicing Decision and Pitch Estimation. Cognitive Computation, 2(3), 151-159.
Mr. Mohamed Anouar Ben Messaoud
National School of Engineers of Tunis - Tunisia
anouar.benmessaoud@yahoo.fr
Associate Professor Aicha Bouzid
National School of Engineers of Tunis - Tunisia