WAPUSK20 - A Database for Robust Audiovisual Speech Recognition


Download www.lrec-conf.org

Audiovisual speech recognition (AVSR) systems have been shown to outperform audio-only speech recognizers in noisy environments by incorporating features of the visual modality. Developing reliable AVSR systems requires appropriate speech and video data recorded simultaneously. In this paper, we introduce a corpus (WAPUSK20) consisting of audiovisual data from 20 speakers uttering 100 sentences each, recorded with four channels of audio and stereoscopic video. The stereoscopic video is intended to support more accurate lip tracking and the development of stereo-data-based normalization techniques for greater robustness of the recognition results. The sentence design has been adopted from the GRID corpus, which has been widely used for AVSR experiments. Recordings were made under acoustically realistic conditions in an ordinary office room. Affordable hardware was used, such as a pre-calibrated stereo camera and standard PC components. The software written to create this co...

Alexander Vorwerk, Xiaohui Wang, Dorothea Kolossa, Steffen Zeiler, Reinhold Orglmeister


Audio-only Speech Recognizers | Audiovisual Speech Recognition | Education | LREC 2010 | Pre-calibrated Stereo Camera


Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2010
Where	LREC
Authors	Alexander Vorwerk, Xiaohui Wang, Dorothea Kolossa, Steffen Zeiler, Reinhold Orglmeister
