On the Use of Web Resources and Natural Language Processing Techniques to Improve Automatic Speech Recognition Systems

Download www.lrec-conf.org

Language models used in current automatic speech recognition systems are trained on general-purpose corpora and are therefore ill-suited to transcribing spoken documents that deal with a succession of precise topics, such as long multimedia streams, which frequently feature reports and debates. To overcome this problem, this paper shows that Web resources and natural language processing techniques can be used effectively to automatically collect topic-specific corpora from the Internet in order to adapt the baseline language model of an automatic speech recognition system. We detail how to characterize the topic of a segment and how to collect Web pages from which a topic-specific language model can be trained. Finally, we present experiments in which an adapted language model, obtained by combining the topic-specific language model with the general-purpose one, is used to produce new transcriptions. The results show that our topic adaptation technique leads to significant gains in transcription quality.
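The abstract does not state the authors' exact methods, so the following is only a rough sketch of the general adaptation pipeline under several assumptions. Topic characterization is approximated by tf-idf keyword extraction against a background collection. Add-one smoothed unigram models stand in for the n-gram models a real recognizer would use. The topic-specific and general-purpose models are combined by linear interpolation with a hypothetical weight lam. The Web retrieval step is not implemented: a short hand-written token list stands in for the collected pages. All function names and data below are hypothetical.

```python
import math
from collections import Counter

# ASSUMPTION: tf-idf keywords, unigram models and linear interpolation are
# illustrative stand-ins; the paper's actual techniques may differ.


def tfidf_keywords(segment_tokens, background_docs, k=5):
    """Rank the words of a transcribed segment by tf-idf against a
    background collection and return the top k as topic keywords."""
    tf = Counter(segment_tokens)
    n_docs = len(background_docs)
    doc_sets = [set(doc) for doc in background_docs]
    scores = {}
    for word, count in tf.items():
        df = sum(1 for doc in doc_sets if word in doc)
        # Smoothed idf: words absent from the background get the highest weight.
        idf = math.log((1 + n_docs) / (1 + df))
        scores[word] = count * idf
    # Descending score; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:k]


def unigram_lm(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def interpolate(p_topic, p_general, lam):
    """Linear interpolation: P(w) = lam * P_topic(w) + (1 - lam) * P_general(w)."""
    return {w: lam * p_topic[w] + (1 - lam) * p_general[w] for w in p_general}


# Usage with toy, hypothetical data.
background = [["the", "weather", "is", "nice"],
              ["the", "match", "ended"],
              ["the", "election", "results"]]
segment = ["the", "election", "debate", "the", "candidates", "debate"]
keywords = tfidf_keywords(segment, background, k=2)
query = " ".join(keywords)  # would be sent to a search engine (not shown)

general_corpus = ["the", "weather", "is", "nice", "the", "match", "ended"]
# Stands in for text extracted from Web pages retrieved with `query`.
topic_corpus = ["the", "debate", "between", "the", "candidates", "debate"]
vocab = sorted(set(general_corpus) | set(topic_corpus))

p_general = unigram_lm(general_corpus, vocab)
p_topic = unigram_lm(topic_corpus, vocab)
p_adapted = interpolate(p_topic, p_general, lam=0.3)
print(keywords, round(p_general["debate"], 5), round(p_adapted["debate"], 5))
```

In this toy run, the topic word "debate" receives a higher probability under the adapted model than under the general one. Because both input models are distributions over the same vocabulary, the interpolated model also sums to one. In practice the interpolation weight would typically be tuned on held-out data rather than fixed by hand.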

Gwénolé Lecorvé, Guillaume Gravier, Pascale Sébillot

Automatic Speech Recognition | Education | Language Model | LREC 2008 | Speech Recognition System |

» Web-assisted annotation, semantic indexing and search of television and radio news

» Text Editing for Lecture Speech Archiving on the Web

» Improving speech playback using time-compression and speech recognition

» Distributed Listening: A Parallel Processing Approach to Automatic Speech Recognition

» Interactive visualisation techniques for dynamic speech transcription correction and train...

» NLGbAse: A Free Linguistic Resource for Natural Language Processing Systems

» Leveraging multiple query logs to improve language models for spoken query recognition

» Contemporaneous text as side-information in statistical language modeling

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2008
Where	LREC
Authors	Gwénolé Lecorvé, Guillaume Gravier, Pascale Sébillot

Sciweavers
