Thai Broadcast News Corpus Construction and Evaluation

Large speech and text corpora are crucial to the development of state-of-the-art speech recognition systems. This paper reports on the construction and evaluation of the first Thai broadcast news speech and text corpora, and describes the specifications and conventions used in the transcription process. The speech corpus contains about 17 hours of speech data, while the text corpus was transcribed from around 35 hours of television broadcast news. The characteristics of the corpus are analyzed and presented in the paper. The speech corpus was split according to the evaluation focus conditions used in the DARPA Hub-4 evaluation. As a preliminary experiment, an 18k-word Thai speech recognition system was set up and tested on this speech corpus, and acoustic model adaptation was performed to improve its performance. The best system yielded a word error rate of about 20% for clean, planned speech and below 30% overall.
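The word error rate figures quoted in the abstract are conventionally computed as the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The sketch below illustrates that standard definition only; it is not the scoring tool used by the authors, which the abstract does not name. The function name `word_error_rate` is hypothetical, and the code assumes both transcripts are already segmented into space-separated words (Thai text is written without word boundaries, so a separate segmentation step would be needed first).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Assumption: both strings are already tokenized into space-separated words.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (b -> x) and one deletion (e) against 5 reference words.
    print(word_error_rate("a b c d e", "a x c d"))
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.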

Markpong Jongtaveesataporn, Chai Wutiwiwatchai, Koji Iwano, Sadaoki Furui

Education | LREC 2008 | Speech Corpus | Speech Recognition | Speech Recognition System |

» DiSCo: A German Evaluation Corpus for Challenging Problems in the Broadcast Domain

» Discourse Cues for Broadcast News Segmentation

» The EPAC Corpus: Manual and Automatic Annotations of Conversational Speech in French Broadc...

» Reduction of Dutch Sentences for Automatic Subtitling

» Measuring novelty and redundancy with multiple modalities in crosslingual broadcast news

» Creating a Persian-English Comparable Corpus

» Contemporaneous text as side-information in statistical language modeling

» Reliable Measures for Aligning Japanese-English News Articles and Sentences

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2008
Where	LREC
Authors	Markpong Jongtaveesataporn, Chai Wutiwiwatchai, Koji Iwano, Sadaoki Furui
