
ACSC
2004
IEEE

Sensor Fusion Weighting Measures in Audio-Visual Speech Recognition

Audio-Visual Speech Recognition (AVSR) uses vision to enhance speech recognition, but it also introduces the problem of how to fuse the two signals. Mainstream research achieves this using a weighted product of the outputs of the phoneme classifiers for both modalities. This paper analyses current weighting measures and compares them with several new measures proposed by the authors. Most importantly, the calculation of the dispersion of the output shifts from analysing the variance to analysing the skewness of the distribution. Experiments in AVSR using neural networks raise questions about the utility of such measures, with some intriguing results.
Trent W. Lewis, David M. W. Powers
Added 20 Aug 2010
Updated 20 Aug 2010
Type Conference
Year 2004
Where ACSC
Authors Trent W. Lewis, David M. W. Powers
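
The weighted-product fusion mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`variance`, `skewness`, `fusion_weight`, `fuse`), the example posteriors, and in particular the rule that maps a dispersion value to a weight (each modality's share of the total dispersion) are assumptions made here for illustration only. The sketch fuses two phoneme posterior vectors as P(c) ∝ P_audio(c)^λ · P_video(c)^(1−λ), with λ derived from either the variance or the skewness of each classifier's output.

```python
import math


def variance(p):
    """Population variance of a classifier's output vector."""
    m = sum(p) / len(p)
    return sum((x - m) ** 2 for x in p) / len(p)


def skewness(p):
    """Population skewness (third standardised moment) of the output vector."""
    m = sum(p) / len(p)
    var = variance(p)
    if var == 0.0:
        return 0.0  # a perfectly flat output has no skew
    third = sum((x - m) ** 3 for x in p) / len(p)
    return third / var ** 1.5


def fusion_weight(audio, video, measure):
    """Hypothetical mapping from dispersion to the audio weight lambda.

    Assumption: the audio stream gets weight d_a / (d_a + d_v), where d_a and
    d_v are the absolute dispersion values of the two output vectors.
    """
    da, dv = abs(measure(audio)), abs(measure(video))
    if da + dv == 0.0:
        return 0.5  # no information either way: weight both equally
    return da / (da + dv)


def fuse(audio, video, lam, eps=1e-12):
    """Weighted product of two posterior vectors, computed in the log domain."""
    log_scores = [
        lam * math.log(a + eps) + (1.0 - lam) * math.log(v + eps)
        for a, v in zip(audio, video)
    ]
    peak = max(log_scores)  # subtract the max for numerical stability
    unnorm = [math.exp(s - peak) for s in log_scores]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Made-up posteriors over three phoneme classes (illustrative values only).
audio = [0.70, 0.20, 0.10]  # fairly confident audio classifier
video = [0.30, 0.40, 0.30]  # nearly flat visual classifier

lam_var = fusion_weight(audio, video, variance)
lam_skew = fusion_weight(audio, video, skewness)
fused_var = fuse(audio, video, lam_var)
fused_skew = fuse(audio, video, lam_skew)
```

Working in the log domain turns the weighted product into a weighted sum and avoids underflow when there are many classes. Swapping `variance` for `skewness` changes only how λ is chosen, not the fusion rule itself.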