This is an Open Access publication published under CSC-OpenAccess Policy.

Lip Reading by Using 3-D Discrete Wavelet Transform with Dmey Wavelet
Sunil S. Morade, Suprava Patnaik
Pages - 384 - 396     |    Revised - 10-09-2014     |    Published - 10-10-2014
Volume - 8   Issue - 5    |    Publication Date - September / October 2014
KEYWORDS
2-D DWT, 3-D DWT, Dmey Wavelet, BPNN, SVM, Lip Reading.
ABSTRACT
Lip movement is a useful way to communicate with machines and is especially helpful in noisy environments. However, recognizing lip motion is a difficult task because the region of interest (ROI) is nonlinear and noisy. The proposed lip reading method uses a two-stage feature extraction mechanism that is precise, discriminative, and computationally efficient. The first stage converts the video frame data into a three-dimensional space, and the second stage trims down the raw information space using the three-dimensional Discrete Wavelet Transform (3-D DWT). The resulting features are compact, giving rise to a novel lip reading system. In addition to the novel feature extraction technique, we also compare the performance of the Back Propagation Neural Network (BPNN) and Support Vector Machine (SVM) classifiers. The CUAVE and Tulips databases are used for experimentation. Experimental results show that 3-D DWT feature extraction is better than 2-D DWT, and that the 3-D DWT with the Dmey wavelet outperforms the 3-D DWT with Db4. The results also show that 3-D DWT-Dmey features with a BPNN classifier outperform the SVM.
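The two-stage pipeline described in the abstract (stacking mouth-ROI frames into a 3-D volume, then keeping only the low-frequency subband of a 3-D DWT as a compact feature set) can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses a single-level Haar transform in NumPy instead of the Dmey wavelet, and the frame count and ROI size are assumed for the example.

```python
import numpy as np

def haar_analysis(x, axis):
    """Single-level Haar DWT along one axis: returns (approximation, detail)."""
    even = np.take(x, range(0, x.shape[axis] - 1, 2), axis=axis)
    odd = np.take(x, range(1, x.shape[axis], 2), axis=axis)
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def dwt3d_approx(volume):
    """Keep only the all-lowpass (LLL) subband of a 3-D DWT by running
    one Haar analysis pass along each of the time, height, and width axes."""
    a = volume
    for axis in range(3):
        a, _ = haar_analysis(a, axis)
    return a

# Assumed example: 8 mouth-ROI frames of 16x16 pixels stacked into a volume
volume = np.random.rand(8, 16, 16)
features = dwt3d_approx(volume)
print(features.shape)  # (4, 8, 8) -- 1/8 of the original sample count
```

The retained subband would then be flattened into a feature vector and passed to a classifier such as a BPNN or SVM; deeper decomposition levels shrink the feature vector further.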
CITED BY (1)  
1 Morade, S. S., & Patnaik, S. (2015, January). A Genetic Algorithm-based 3D feature selection for lip reading. In Pervasive Computing (ICPC), 2015 International Conference on (pp. 1-6). IEEE.
REFERENCES
1 E. D. Petajan, "Automatic Lipreading to Enhance Speech Recognition," Ph.D. thesis, University of Illinois, 1984.
2 M. C. Weeks, "Architectures for the 3-D Discrete Wavelet Transform," Ph.D. thesis, University of Southwestern Louisiana, 1998.
3 C. Bregler and Y. Konig, "'Eigenlips' for robust speech recognition," in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, 1994.
4 G. Potamianos, H. Graf, and E. Cosatto, "An image transform approach for HMM based automatic lipreading," in Proc. Int. Conf. on Image Processing, pp. 173-177, 1998.
5 R. Seymour, D. Stewart, and J. Ming, "Comparison of image transform-based features for visual speech recognition in clean and corrupted videos," EURASIP Journal on Image and Video Processing, Vol. 2008, pp. 1-9, 2008.
6 X. Wang, Y. Hao, D. Fu, and C. Yuan, "ROI processing for visual features extraction in lip-reading," in Proc. IEEE Int. Conf. on Neural Networks and Signal Processing, pp. 178-181, 2008.
7 N. Puviarasan and S. Palanivel, "Lip reading of hearing impaired persons using HMM," Expert Systems with Applications, pp. 1-5, 2010.
8 A. Shaikh and J. Gubbi, "Lip reading using optical flow and support vector machines," in Proc. CISP 2010, pp. 327-310, 2010.
9 G. F. Meyer, J. B. Mulligan, and S. M. Wuerger, "Continuous audio-visual digit recognition using N-best decision fusion," Information Fusion, pp. 91-100, 2004.
10 L. Rothkrantz, J. Wojdel, and P. Wiggers, "Comparison between different feature extraction techniques in lipreading applications," in Proc. SPECOM 2006, pp. 25-29, 2006.
11 P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pp. 511-517, 2001.
12 H. Lee, Y. Kim, A. Rowberg, and E. Riskin, "Statistical distributions of DCT coefficients and their application to an interframe compression algorithm for 3-D medical images," IEEE Transactions on Medical Imaging, Vol. 12, pp. 478-485, 1993.
13 J. Wang and H. Huang, "Three-dimensional medical image compression using a wavelet transform with parallel computing," SPIE Imaging Physics, Vol. 2431, pp. 16-26, 1995.
14 V. Long and L. Gang, "Selection of the best wavelet base for speech signal," in Proc. IEEE Int. Symp. on Intelligent Multimedia, Video and Speech Processing, 2004.
15 A. K. Jain, R. P. W. Duin, and J. Mao, "Statistical pattern recognition: a review," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 1, 2000.
16 C. Bregler and Y. Konig, "'Eigenlips' for robust speech recognition," in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, pp. 1-4, 1994.
17 V. N. Vapnik, "Statistical Learning Theory," New York: John Wiley & Sons, 1998.
18 V. Kecman, "Learning and Soft Computing: Support Vector Machines, Neural Networks, and Fuzzy Logic Models," MIT Press, Cambridge, pp. 1-58, 2001.
19 J. C. Platt, "Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines," Microsoft Research technical report, pp. 1-21, 1998.
20 E. Osuna, R. Freund, and F. Girosi, "An improved training algorithm for support vector machines," in Proc. IEEE Workshop on Neural Networks for Signal Processing, pp. 276-285, 1997.
21 E. Patterson, S. Gurbuz, Z. Tufekci, and J. Gowdy, "CUAVE: a new audio-visual database for multimodal human-computer interface research," in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, pp. 2017-2020, 2002.
22 J. R. Movellan, "Visual speech recognition with stochastic networks," in Advances in Neural Information Processing Systems, MIT Press, Cambridge, 1995.
Mr. Sunil S. Morade
SVNIT, Surat, India
ssm.eltx@gmail.com
Professor Suprava Patnaik
Ex-Professor, SVNIT, Surat, India