Visual Speech Recognition using Spatial-Temporal Gradient Analysis

Document Type: Original Article


1 Cyber space research inst., Shahid Beheshti University, Tehran, Iran

2 Cyber space research inst., Shahid Beheshti University, Tehran, Iran


Using visual information for speech recognition is an important alternative when audio information is unavailable. This paper presents a method for speech recognition from visual information that describes the spatial-temporal changes in the lip region. Image gradients are used for feature extraction. In the proposed method, after the lip area is detected and key points are extracted, the gradient is computed to describe the spatial information at the key points. To describe the key areas of the lip during speaking, 3D histogram of gradients path curve fitting is used. The main focus of this research is to provide an adequate description of speech. For this purpose, different classifiers were tested and the best-performing one was identified. The proposed method was evaluated on the MIRACL-VC1 database and compared with previous methods for speech recognition, over which it showed an improvement of about 11 to 17 percent.
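The idea of describing a lip key point through spatial and temporal gradients can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the descriptor or the path curve fitting step, so the function name `spatiotemporal_gradient_histogram`, the patch `radius`, the bin counts, and the choice of a magnitude-weighted orientation histogram are all assumptions made for illustration. The input is assumed to be a grayscale clip stored as a NumPy array of shape (frames, height, width).

```python
import numpy as np


def spatiotemporal_gradient_histogram(frames, keypoint, radius=4,
                                      theta_bins=8, phi_bins=4):
    """Hypothetical descriptor: histogram of 3D gradient directions
    in a patch around one key point, accumulated over all frames.

    frames   : array of shape (T, H, W), grayscale video clip
    keypoint : (row, col) position of the key point
    """
    volume = np.asarray(frames, dtype=float)

    # Finite-difference gradients along time, rows, and columns.
    gt, gy, gx = np.gradient(volume)

    # Restrict to a square patch around the key point, clipped to the image.
    row, col = keypoint
    r0, r1 = max(row - radius, 0), min(row + radius + 1, volume.shape[1])
    c0, c1 = max(col - radius, 0), min(col + radius + 1, volume.shape[2])
    gt, gy, gx = (g[:, r0:r1, c0:c1].ravel() for g in (gt, gy, gx))

    # Spatial orientation in the image plane and elevation toward the time axis.
    theta = np.arctan2(gy, gx)                 # in [-pi, pi]
    phi = np.arctan2(gt, np.hypot(gx, gy))     # in [-pi/2, pi/2]
    magnitude = np.sqrt(gx ** 2 + gy ** 2 + gt ** 2)

    # Magnitude-weighted 2D histogram over the two angles.
    hist, _, _ = np.histogram2d(
        theta, phi,
        bins=(theta_bins, phi_bins),
        range=[[-np.pi, np.pi], [-np.pi / 2, np.pi / 2]],
        weights=magnitude,
    )

    # L1-normalise; a patch with no gradient yields an all-zero descriptor.
    total = hist.sum()
    hist = hist / total if total > 0 else hist
    return hist.ravel()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.random((10, 32, 48))            # synthetic stand-in for a lip clip
    descriptor = spatiotemporal_gradient_histogram(clip, keypoint=(16, 24))
    print(descriptor.shape)
```

In such a sketch, the descriptors of several key points would be concatenated into one feature vector per utterance before being passed to a classifier; which classifier works best is an empirical question, as the abstract notes.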


