First Broadcast News Transcription System for Khmer Language


Download www.lrec-conf.org

In this paper we present an overview of the development of a large vocabulary continuous speech recognition (LVCSR) system for Khmer, the official language of Cambodia, spoken by more than 15 million people. Because Khmer is an under-resourced language, developing an LVCSR system for it is a challenging task. We describe our methodologies for quickly collecting and processing language data for language modeling and acoustic modeling. For language modeling, we investigate the use of words and sub-words as the basic modeling unit in order to assess the potential of sub-word units for an unsegmented language like Khmer. Grapheme-based acoustic modeling is used to quickly build our Khmer acoustic model. Furthermore, the approaches and tools used to develop our system are documented and made publicly available on the web. We hope this will help accelerate the development of LVCSR systems for new languages, especially for under-resourced languages of developing countries where re...
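As an illustration of the grapheme-based idea mentioned in the abstract, the sketch below builds a pronunciation lexicon in which each word is spelled out as its written symbols instead of phonemes. It is a minimal sketch, not the authors' tool: the function names `grapheme_units` and `build_lexicon` are hypothetical, and treating every Unicode code point as one grapheme unit is a simplifying assumption that the abstract does not confirm.

```python
def grapheme_units(word):
    """Return one unit label per Unicode code point in the word.

    Assumption: each code point counts as one grapheme. Labels use the
    U+XXXX form so that combining signs stay readable in a text lexicon.
    """
    return [f"U+{ord(ch):04X}" for ch in word if not ch.isspace()]


def build_lexicon(words):
    """Map each non-empty word to its list of grapheme unit labels."""
    lexicon = {}
    for word in words:
        units = grapheme_units(word)
        if units:
            lexicon[word] = units
    return lexicon


if __name__ == "__main__":
    # The word "Khmer" written in Khmer script, given as escapes.
    khmer = "\u1781\u17d2\u1798\u17c2\u179a"
    for word, units in build_lexicon([khmer]).items():
        print(word, " ".join(units))
```

A lexicon of this kind needs no hand-written phonetic dictionary, which is why the approach suits a language with few existing resources.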

Sopheap Seng, Sethserey Sam, Laurent Besacier, Brigitte Bigi, Eric Castelli


Acoustic Modeling | Education | Language Modeling | LREC 2008 | Quick Language Data |


» Applying a Grammar-Based Language Model to a Simplified Broadcast-News Transcription Task

» Web-assisted annotation, semantic indexing and search of television and radio news

» Integrating Visual, Audio and Text Analysis for News Video

» Improved models for Mandarin speech-to-text transcription

» Evaluation Protocol and Tools for Question-Answering on Speech Transcripts

» Language and variety verification on broadcast news for Portuguese

» Thai Broadcast News Corpus Construction and Evaluation

» Advances in the CMU-InterACT Arabic GALE Transcription System

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2008
Where	LREC
Authors	Sopheap Seng, Sethserey Sam, Laurent Besacier, Brigitte Bigi, Eric Castelli
