The use of visual information from lip movements can improve the accuracy and robustness of a speech recognition system. Accurate extraction of visual features associated with the...
Alan Wee-Chung Liew, Shu Hung Leung, Wing Hong Lau
In current speech recognition systems mainly Short-Time Fourier Transform based features like MFCC are applied. Dropping the short-time stationarity assumption of the voiced speec...
This paper proposes a new method for bimodal information fusion in audio-visual speech recognition, where cross-modal association is considered in two levels. First, the acoustic a...
The method which is called the “tandem approach” in speech recognition has been shown to increase performance by using classifier posterior probabilities as observations in a...
This paper proposes a set of affine invariant features (AIFs) for sequence data. The proposed AIFs can be calculated directly from the sequence data, and their invariance to af...