This is an Open Access publication published under CSC-OpenAccess Policy.
Parameters Optimization for Improving ASR Performance in Adverse Real World Noisy Environmental Conditions
Urmila Shrawankar, Vilas Thakare
Pages - 58 - 70 | Revised - 15-09-2012 | Published - 25-10-2012
Volume - 3 | Issue - 3 | Publication Date - October 2012
KEYWORDS
ASR Performance, ASR Parameters Optimization, Multi-Environmental Training, Fuzzy Inference System for ASR, ubiquitous ASR system, Human Computer Interaction
ABSTRACT
Existing research offers many techniques and methodologies for every step of an Automatic Speech Recognition (ASR) system, but the performance of a methodology (minimizing the Word Error Rate, WER, and maximizing the Word Accuracy Rate, WAR) does not depend only on the technique applied. This work indicates that performance depends mainly on the category of noise, the level of noise, and the sizes of the window, frame, frame overlap, and similar parameters used in existing methods. The main aim of this paper is to vary parameters such as window size, frame size, and frame overlap percentage, to observe how algorithms perform for various categories of noise at different levels, and to train the system on all parameter sizes and categories of real-world noisy environments in order to improve the performance of the speech recognition system. The paper presents the results of Signal-to-Noise Ratio (SNR) and accuracy tests obtained with variable parameter sizes. Evaluating these test results and choosing the parameter sizes that optimize ASR performance proves very hard to do directly. Hence, the study further suggests feasible and optimum parameter sizes using a Fuzzy Inference System (FIS) to enhance the resulting accuracy in adverse real-world noisy environmental conditions. This work will be helpful for discriminative training of ubiquitous ASR systems for better Human Computer Interaction (HCI).
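The parameters named in the abstract (frame size, frame overlap percentage, window) and the SNR measure can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `frame_signal`, `hamming_window`, and `snr_db` are hypothetical, the Hamming window is an assumed choice, the candidate frame sizes and overlaps are arbitrary, and the test signal is a synthetic tone rather than recorded speech.

```python
import math


def frame_signal(samples, frame_size, overlap_pct):
    """Split samples into frames of frame_size with overlap_pct percent overlap."""
    if frame_size <= 0 or not 0 <= overlap_pct < 100:
        raise ValueError("frame_size must be > 0 and 0 <= overlap_pct < 100")
    # Hop = number of samples between the starts of consecutive frames.
    hop = max(1, round(frame_size * (1 - overlap_pct / 100)))
    last_start = len(samples) - frame_size
    return [samples[s:s + frame_size] for s in range(0, last_start + 1, hop)]


def hamming_window(size):
    """Return Hamming window coefficients (an assumed window choice)."""
    if size == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (size - 1))
            for i in range(size)]


def snr_db(clean, noisy):
    """SNR in dB, treating (noisy - clean) as the noise component."""
    signal_power = sum(c * c for c in clean)
    noise_power = sum((x - c) ** 2 for c, x in zip(clean, noisy))
    if noise_power == 0:
        return math.inf
    if signal_power == 0:
        return -math.inf
    return 10 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    # Synthetic stand-in for speech: a 200 Hz tone at an assumed 8 kHz rate,
    # plus a small deterministic disturbance standing in for noise.
    rate = 8000
    clean = [math.sin(2 * math.pi * 200 * n / rate) for n in range(2000)]
    noisy = [c + 0.1 * math.sin(2 * math.pi * 1300 * n / rate)
             for n, c in enumerate(clean)]

    # Sweep a small, arbitrary grid of frame sizes and overlap percentages
    # and report the mean per-frame SNR of the windowed frames.
    for frame_size in (128, 256):
        window = hamming_window(frame_size)
        for overlap_pct in (25, 50):
            pairs = zip(frame_signal(clean, frame_size, overlap_pct),
                        frame_signal(noisy, frame_size, overlap_pct))
            scores = [
                snr_db([w * c for w, c in zip(window, cf)],
                       [w * x for w, x in zip(window, nf)])
                for cf, nf in pairs
            ]
            print(frame_size, overlap_pct, round(sum(scores) / len(scores), 2))
```

A sweep of this kind only produces the raw per-setting measurements. Choosing among the settings, which the abstract assigns to a Fuzzy Inference System, is not covered by the sketch.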
CITED BY (4)  
1 Smruti, S., Sahoo, J., Dash, M., & Mohanty, M. N. (2015, January). An Approach to Design an Intelligent Parametric Synthesizer for Emotional Speech. In Proceedings of the 3rd International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA) 2014 (pp. 367-374). Springer International Publishing.
2 Lanzola, G., Parimbelli, E., Micieli, G., Cavallini, A., & Quaglini, S. (2014). Data quality and completeness in a web stroke registry as the basis for data and process mining. Journal of healthcare engineering, 5(2), 163-184.
3 Shrawankar, U., & Thakare, V. (2013). An Adaptive Methodology for Ubiquitous ASR System. arXiv preprint arXiv:1303.3948.
4 Sunitha, K. V., & Sharada, A. (2012). Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor. International Journal of Human Computer Interaction (IJHCI), 3(4), 83.
Miss Urmila Shrawankar
G H Raisoni College of Engg., Nagpur - India
urmilas@rediffmail.com
Dr. Vilas Thakare
SGB Amravati University, Amravati - India