A Novel Algorithm for Acoustic and Visual Classifiers Decision Fusion in Audio-Visual Speech Recognition System
Source: Signal Processing: An International Journal (SPIJ)
Volume: 4, Issue: 1
Issue pages: 1-67
Publication Date: March 2010
ISSN (Online): 1985-2339
Article pages: 23-37
Author(s): Rajavel (India), P.S. Sathidevi (India)
Published Date: 26-03-2010
Publisher: CSC Journals, Kuala Lumpur, Malaysia
 
KEYWORDS:   Audio-visual speech recognition, Reliability-ratio based weight optimization, late integration 
 
 
This Manuscript is indexed in the following databases/websites:-
1. Directory of Open Access Journals (DOAJ)
2. Docstoc
3. PDFCAST
4. Scribd
5. WorldCat
6. ScientificCommons
7. Google Scholar
8. refSeek
9. ResearchGATE
10. Bielefeld Academic Search Engine (BASE)
11. Academic Index
12. iSEEK
13. Socol@r
 
 
ABSTRACT:   Audio-visual speech recognition (AVSR), which uses both the acoustic and visual signals of speech, has received attention recently because of its robustness in noisy environments. Perceptual studies also support this approach by emphasizing the importance of visual information for speech recognition in humans. An important issue in a decision-fusion-based AVSR system is how to obtain an appropriate integration weight for the speech modalities, so that the combined AVSR system performs better than the audio-only and visual-only systems under various noise conditions. To solve this issue, we present a genetic algorithm (GA) based optimization scheme that obtains the appropriate integration weight from the relative reliability of each modality. The performance of the proposed GA-optimized, reliability-ratio-based weight estimation scheme is demonstrated via single-speaker, mobile-functions isolated word recognition experiments. The results show that the proposed scheme improves recognition accuracy over the conventional unimodal systems and the baseline reliability-ratio-based AVSR system under various signal-to-noise ratio conditions.
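The abstract does not give the fusion formulas, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes each classifier returns one log-likelihood per vocabulary word, that a modality's reliability is the average gap between its best score and the others, and that the audio weight is a reliability ratio shaped by a single exponent `k`. The exponent `k`, all function names, and the toy genetic search over `k` are hypothetical illustrations.

```python
import random


def reliability(log_likes):
    """Assumed reliability measure: mean gap between the best score and the rest."""
    best = max(log_likes)
    return sum(best - x for x in log_likes) / (len(log_likes) - 1)


def audio_weight(rel_a, rel_v, k=1.0):
    """Reliability-ratio weight for the audio stream (k is a hypothetical exponent)."""
    a, v = rel_a ** k, rel_v ** k
    if a + v == 0.0:
        return 0.5  # neither modality is informative, so weight them equally
    return a / (a + v)


def fuse(audio_ll, visual_ll, k=1.0):
    """Late integration: weighted sum of per-word log-likelihoods, then argmax."""
    w = audio_weight(reliability(audio_ll), reliability(visual_ll), k)
    scores = [w * a + (1.0 - w) * v for a, v in zip(audio_ll, visual_ll)]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(samples, k):
    """Fraction of (audio_ll, visual_ll, label) samples recognized correctly."""
    return sum(fuse(a, v, k) == label for a, v, label in samples) / len(samples)


def ga_optimize(samples, generations=20, pop_size=12, seed=0):
    """Toy genetic search for k: keep the fitter half, refill by crossover and mutation."""
    rng = random.Random(seed)
    pop = [rng.uniform(0.1, 5.0) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda k: accuracy(samples, k), reverse=True)
        parents = ranked[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            p, q = rng.sample(parents, 2)
            child = 0.5 * (p + q) + rng.gauss(0.0, 0.2)  # blend crossover plus Gaussian mutation
            children.append(min(5.0, max(0.1, child)))   # clamp to the search range
        pop = parents + children
    return max(pop, key=lambda k: accuracy(samples, k))


# Toy data: the noisy audio scores are nearly flat, the visual scores favor word 1.
noisy_audio = [-10.0, -10.2, -10.1]
clear_visual = [-8.0, -3.0, -9.0]
samples = [(noisy_audio, clear_visual, 1)]
best_k = ga_optimize(samples)
```

With these toy scores the audio classifier alone would pick word 0, while the fused decision follows the more reliable visual stream and picks word 1. In a real system the fitness function would be recognition accuracy on held-out data at several signal-to-noise ratios rather than a single hand-made sample.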
 
 
 
 
 
 
 
 
 
 
 