Improved models for Mandarin speech-to-text transcription

This paper describes recent advances at LIMSI in Mandarin Chinese speech-to-text transcription. A number of novel approaches were introduced in the different system components. The acoustic models are trained on over 1600 hours of audio data from a range of sources and include pitch and MLP features. N-gram and neural network language models are trained on very large text corpora of over 3 billion words, and LM adaptation was explored at different levels: per show, per snippet, or per speaker cluster. Character-based consensus decoding was found to outperform word-based consensus decoding for Mandarin. The improved system reduces the character error rate (CER) by about 10% relative on previous GALE development and evaluation data sets, obtaining a CER of 9.2% on the P4 broadcast news and broadcast conversation evaluation data.
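For readers unfamiliar with the metric, the following is a minimal sketch of how a character error rate and a relative reduction are commonly computed. It is not the scoring pipeline used in the paper. The function names, the whitespace handling, and the example strings and numbers are all assumptions made for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Counts the minimum number of substitutions, insertions and
    deletions needed to turn `ref` into `hyp`.
    """
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """CER = character-level edit distance / number of reference characters.

    Whitespace is dropped before scoring; this is an assumption of the
    sketch, not a documented convention of the paper.
    """
    ref_chars = [c for c in reference if not c.isspace()]
    hyp_chars = [c for c in hypothesis if not c.isspace()]
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


def relative_reduction(baseline, improved):
    """Relative error reduction, e.g. 0.10 for a 10% relative gain."""
    return (baseline - improved) / baseline


# Hypothetical example: one substituted character out of six.
cer = character_error_rate("今天天气很好", "今天天汽很好")

# Hypothetical rates, chosen only to illustrate the arithmetic.
gain = relative_reduction(10.0, 9.0)
```

Scoring at the character level avoids any dependence on how Mandarin text is segmented into words, which is one reason CER is the usual metric for this language.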

Lori Lamel, Jean-Luc Gauvain, Viet-Bac Le, Ilya Oparin, Sha Meng

Broadcast Conversation Evaluation | Evaluation Data | ICASSP 2011 | Relative Character Error | Signal Processing

Added	20 Aug 2011
Updated	20 Aug 2011
Type	Conference
Year	2011
Where	ICASSP
Authors	Lori Lamel, Jean-Luc Gauvain, Viet-Bac Le, Ilya Oparin, Sha Meng

Sciweavers
