A Gaussian Clustering Based Voice Activity Detector for Noisy Environments Using Spectro-Temporal Domain
Source 
Signal Processing: An International Journal (SPIJ)
Volume:  4    Issue:  4
Pages:  175-246
Publication Date:   October 2010
ISSN (Online): 1985-2339
Pages 
228 - 238
Author(s)  
Sara Valipour - Iran
Farbod Razzazi - Iran
Azim Fard - Iran
 
Published Date   
30-10-2010 
Publisher 
CSC Journals, Kuala Lumpur, Malaysia
 
KEYWORDS:   Voice activity detector, Spectro-temporal Domain, Gaussian modeling, Auditory model 
 
 
This manuscript is indexed in the following databases/websites:
1. Docstoc
2. Scribd
3. Directory of Open Access Journals (DOAJ)
4. Google Scholar
5. Socol@r
 
 
ABSTRACT:   In this paper, a voice activity detector (VAD) is proposed based on Gaussian modeling of noise in the spectro-temporal space. The spectro-temporal space is obtained from a model of auditory cortical processing. This auditory model, which provides a multi-dimensional representation of sound, has two stages: the first models the inner ear, and the second models central auditory processing in the cortex. In this representation, the noise is modeled by a single 3-D Gaussian cluster, which is estimated at the start of the proposed VAD process. The average noise behavior is obtained at various points of the spectro-temporal space for each frame. To separate speech from noise, the difference between the average noise behavior and the signal amplitude in the spectro-temporal domain is measured for each frame and used as the classification criterion. Using the NOISEX-92 database, the method is tested under different noise types such as white, exhibition, street, office, and train noise. The results are compared with both the auditory model method and the multifeature method. The proposed method is observed to perform better than the other current methods at low signal-to-noise ratios (SNRs).
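
The decision rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes each frame has already been reduced to a vector of spectro-temporal amplitudes (the auditory cortical model itself is not reproduced), that the first few frames contain noise only, and that the Gaussian is diagonal. The function names, `n_noise_frames`, and `threshold` are hypothetical choices made for illustration.

```python
import numpy as np


def fit_noise_gaussian(noise_frames):
    """Estimate a single diagonal Gaussian (mean, std) from noise-only frames.

    noise_frames: array of shape (n_frames, n_points), one row per frame.
    """
    mean = noise_frames.mean(axis=0)
    std = noise_frames.std(axis=0) + 1e-8  # guard against division by zero
    return mean, std


def frame_scores(frames, mean, std):
    """Average normalized deviation of each frame from the noise mean."""
    return np.mean(np.abs(frames - mean) / std, axis=1)


def detect_speech(frames, n_noise_frames=10, threshold=3.0):
    """Return a boolean mask, True where a frame is classified as speech.

    Assumption: the first n_noise_frames frames contain noise only and are
    used to fit the noise model; threshold is a hypothetical tuning value.
    """
    mean, std = fit_noise_gaussian(frames[:n_noise_frames])
    return frame_scores(frames, mean, std) > threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.normal(0.0, 1.0, size=(60, 3))  # synthetic noise frames
    frames[40:50] += 8.0                          # synthetic "speech" burst
    print(detect_speech(frames, n_noise_frames=30).astype(int))
```

A frame is flagged as speech when its average deviation from the noise mean, measured in units of the noise standard deviation, exceeds the threshold. A full covariance matrix with a Mahalanobis distance would be a natural alternative if the spectro-temporal points are strongly correlated.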
 
 
 
REFERENCES:
1 N. Mesgarani, S. A. Shamma, “Speech enhancement based on filtering the spectrotemporal modulations”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Philadelphia, March 2005.
2 N. R. Garner, P. A. Barrett, D. M. Howard, and A. M. Tyrrell, “Robust noise detection for speech detection and enhancement”, Electron. Lett., Vol. 33, no. 4, pp. 270-271, Feb. 1997.
3 J. Sohn, N. S. Kim, and W. Sung, “A statistical model-based voice activity detection”, IEEE Signal Process. Lett., Vol. 6, no. 1, pp. 1-3, Jan. 1999.
4 L. F. Lamel, L. R. Rabiner, A. E. Rosenberg, and J. G. Wilpon, “An improved endpoint detector for isolated word recognition”, IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 29, pp. 777-785, 1981.
5 T. Kinnunen, E. Chernenko, M. Tuononen, P. Fränti, and H. Li, “Voice activity detection using MFCC features and support vector machine”, Int. Conf. on Speech and Computer (SPECOM07), Moscow, Russia, Vol. 2, pp. 556-561, Oct. 2007.
6 J. Sohn, W. Sung, “A voice activity detector employing soft decision based noise spectrum adaptation”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 365-368, 1998.
7 Ángel de la Torre, Javier Ramírez, Carmen Benítez, Jose C. Segura, Luz García, Antonio J. Rubio, “Noise robust model-based voice activity detection”, INTERSPEECH 2006, pp. 1954-1957, Pittsburgh, 2006.
8 J.-H. Chang, N. S. Kim, and S. K. Mitra, “Voice activity detection based on multiple statistical models”, IEEE Trans. Signal Processing, Vol. 54, no. 6, pp. 1965-1976, June 2006.
9 J. W. Shin, J.-H. Chang, H. S. Yun, and N. S. Kim, “Voice activity detection based on generalized gamma distribution”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vol. 1, pp. 781-784, March 2005.
10 B. Meyer and M. Kleinschmidt, “Robust speech recognition based on localized spectrotemporal features”, in Proceedings of the Elektronische Sprach- und Signalverarbeitung (ESSV), Karlsruhe, 2003.
11 C. Shahnaz, W.-P. Zhu and M. O. Ahmad, “A spectro-temporal algorithm for pitch frequency estimation from noisy observations”, in Proc. 2008 IEEE ISCAS, pp. 1704-1707, May 18-21, 2008, Seattle, USA.
12 T. Chi, P. Ru, and S. A. Shamma, “Multiresolution spectrotemporal analysis of complex sounds”, Journal of the Acoustical Society of America, Vol. 118, no. 2, pp. 887-906, 2005.
13 N. Kowalski, D. A. Depireux, and S. Shamma, “Analysis of dynamic spectra in ferret primary auditory cortex I. Characteristics of single-unit responses to moving ripple spectra”, J. Neurophysiology, Vol. 76, no. 5, pp. 3503-3523, 1996.
14 K. Wang and S. A. Shamma, “Spectral shape analysis in the central auditory system”, IEEE Trans. Speech Process., Vol. 3, no. 5, pp. 382-395, Sep. 1995.
15 K. Wang and S. A. Shamma, “Self-normalization and noise-robustness in early auditory representations”, IEEE Trans. Speech and Audio Proc., pp. 421-435, 1994.
16 S. A. Shamma, “Speech processing in the auditory system II: Lateral inhibition and the central processing of speech evoked activity in the auditory nerve”, J. Acoust. Soc. Am., pp. 1622-1632, 1985.
17 S. Shamma, “Methods of neuronal modeling”, in Spatial and Temporal Processing in the Auditory System, pp. 411-460, MIT Press, Cambridge, Mass, USA, 2nd edition, 1998.
18 A. Varga, H. J. M. Steeneken, M. Tomlinson, and D. Jones, “The NOISEX-92 study on the effect of additive noise on automatic speech recognition”, Documentation included in the NOISEX-92 CD-ROMs, 1992.
19 N. Mesgarani, S. Shamma, and M. Slaney, “Speech discrimination based on multiscale spectro-temporal modulations”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP ’04), Vol. 1, pp. 601-604, Montreal, Canada, May 2004.
20 E. Scheirer and M. Slaney, “Construction and evaluation of a robust multifeature speech/music discriminator”, in Int. Conf. Acoustics, Speech and Signal Processing, Vol. 2, Munich, Germany, 1997, p. 1331.
 
 
 
 
 
 
 
 