

Contemporaneous text as side-information in statistical language modeling

We propose new methods to exploit contemporaneous text, such as online news articles, to improve language models for automatic speech recognition and other natural language processing applications. In particular, we investigate the use of text from a resource-rich language to sharpen language models for processing a news story or article in a language with scarce linguistic resources. We demonstrate that even with fairly crude cross-language information retrieval and simple machine translation, one can construct story-specific Chinese language models that exploit cues from a side-corpus of English newswire to significantly improve on language models estimated from a static Chinese corpus. Our investigations cover cases in which the amount of available Chinese text is small, as well as a case in which a large Chinese text corpus is available. We examine the effectiveness of our techniques both when the side-corpus contains English documents that are near-translations of the Chinese ...
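To make the idea of a story-specific model concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. We assume the English documents relevant to a Chinese story have already been retrieved (the cross-language retrieval step is not shown), that "simple machine translation" is a word-to-word table of probabilities P(c|e), and that the resulting Chinese unigram distribution is linearly interpolated with the static Chinese model. The names `trans_table` and `lam`, along with all of the toy data, are hypothetical.

```python
from collections import Counter


def english_unigram(docs):
    """Relative frequency of each English word across the retrieved documents."""
    counts = Counter(w for doc in docs for w in doc)
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()}


def cross_lingual_unigram(eng_dist, trans_table):
    """Story-specific Chinese unigram: P_T(c) = sum_e P(c|e) * P(e).

    English words with no entry in the (hypothetical) translation table
    contribute nothing; the result is renormalized to sum to 1.
    """
    scores = {}
    for e, p_e in eng_dist.items():
        for c, p_c_given_e in trans_table.get(e, {}).items():
            scores[c] = scores.get(c, 0.0) + p_c_given_e * p_e
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()} if z > 0 else {}


def interpolate(static_lm, story_lm, lam):
    """Linear interpolation: lam * P_story(w) + (1 - lam) * P_static(w)."""
    vocab = set(static_lm) | set(story_lm)
    return {
        w: lam * story_lm.get(w, 0.0) + (1 - lam) * static_lm.get(w, 0.0)
        for w in vocab
    }


# Toy usage with hypothetical data.
retrieved_docs = [["election", "president", "the"], ["election", "vote"]]
trans_table = {"election": {"选举": 1.0}, "president": {"总统": 0.8, "主席": 0.2}}
static = {"选举": 0.1, "总统": 0.1, "经济": 0.4, "天气": 0.4}

story = cross_lingual_unigram(english_unigram(retrieved_docs), trans_table)
lm = interpolate(static, story, lam=0.5)
print(sorted(lm.items(), key=lambda kv: -kv[1]))
```

In this toy run, words suggested by the English side-corpus, such as 选举 ("election"), gain probability mass over the static model, and the interpolated distribution still sums to 1. In practice, the interpolation weight would be tuned on held-out data, and the story-specific component would typically be combined with an n-gram model rather than a bare unigram.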
Sanjeev Khudanpur, Woosung Kim
Added 17 Dec 2010
Updated 17 Dec 2010
Type Journal
Year 2004
Where Computer Speech & Language (CSL)