Lightly supervised and unsupervised acoustic model training

15 years 6 months ago


The last decade has witnessed substantial progress in speech recognition technology, with today's state-of-the-art systems being able to transcribe unrestricted broadcast news audio data with a word error rate of about 20%. However, acoustic model development for these recognizers relies on the availability of large amounts of manually transcribed training data. Obtaining such data is both time-consuming and expensive, requiring trained human annotators and substantial amounts of supervision. This paper describes some recent experiments using lightly supervised and unsupervised techniques for acoustic model training in order to reduce the system development cost. The approach uses a speech recognizer to transcribe unannotated broadcast news data from the DARPA TDT-2 corpus. The hypothesized transcription is optionally aligned with closed captions or transcripts to create labels for the training data. Experiments providing supervision only via the language model training materials show that ...
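The optional alignment of recognizer output with closed captions can be illustrated with a small sketch. The abstract does not say how the alignment is performed or how disagreements are handled, so everything here is an assumption made for illustration: the function name `agreeing_segments`, the `min_words` threshold, the use of Python's `difflib` for word-level matching, and the rule of keeping only runs of words on which both sources agree.

```python
from difflib import SequenceMatcher


def agreeing_segments(hypothesis, captions, min_words=2):
    """Return runs of words on which the recognizer hypothesis and the
    closed captions agree (hypothetical filtering rule, not the paper's)."""
    hyp = hypothesis.lower().split()
    cap = captions.lower().split()
    # Word-level matching; autojunk is disabled so frequent words are kept.
    matcher = SequenceMatcher(a=hyp, b=cap, autojunk=False)
    segments = []
    for block in matcher.get_matching_blocks():
        # The final block is a zero-length sentinel and is skipped here.
        if block.size >= min_words:
            segments.append(hyp[block.a:block.a + block.size])
    return segments


# Toy inputs, invented for this example.
hypothesis = "the president said today that the economy is growing"
captions = "president said that the economy is growing fast"
segments = agreeing_segments(hypothesis, captions)
for words in segments:
    print(" ".join(words))
```

In this toy setup, the word runs that survive the filter would serve as labels for the corresponding audio, while regions where the two sources disagree would be left out of training.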

Lori Lamel, Jean-Luc Gauvain, Gilles Adda
