Sciweavers

SIGIR
2004
ACM

Language-specific models in multilingual topic tracking

Topic tracking is complicated when the stories in the stream occur in multiple languages. Typically, researchers have trained only English topic models because the training stories have been provided in English. In tracking, non-English test stories are then machine translated into English so they can be compared with the topic models. We propose a native language hypothesis, which states that comparisons would be more effective in the original language of the story. We first test the hypothesis on story link detection and find support for it. For topic tracking, the hypothesis implies that it should be preferable to build a separate, language-specific topic model for each language in the stream. We compare different methods of incrementally building such native language topic models.
Categories and Subject Descriptors: H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing – Indexing methods, Linguistic processing.
General Terms: Algorithms, Experimentation.
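The abstract does not specify the story representation, similarity measure, or incremental-building strategy, so the following is only a hypothetical sketch of the native-language idea. It assumes bag-of-words term counts, cosine similarity, a single fixed threshold, and a caller-supplied translation function; the names `NativeLanguageTracker`, `translate_to_english`, and `pivot` are invented for illustration and do not come from the paper. Each language keeps its own topic model, and a story is compared against the model in its own language whenever one exists. If no native model exists yet, the story is translated and compared with the English model; when it is accepted that way, it seeds a native-language model. This is one possible incremental strategy, not necessarily one the authors evaluated.

```python
import math
from collections import Counter
from typing import Callable, Dict, Optional


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts (assumed representation, not the paper's)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class NativeLanguageTracker:
    """Hypothetical tracker keeping one topic model (summed term counts) per language."""

    def __init__(self, threshold: float, pivot: str = "en"):
        self.threshold = threshold
        self.pivot = pivot  # language of the original training stories (assumed English)
        self.models: Dict[str, Counter] = {}

    def add_training_story(self, lang: str, text: str) -> None:
        self.models.setdefault(lang, Counter()).update(vectorize(text))

    def track(
        self,
        lang: str,
        text: str,
        translate_to_english: Optional[Callable[[str], str]] = None,
    ) -> bool:
        story = vectorize(text)
        if lang in self.models:
            # Native-language comparison: story vs. the model in its own language.
            score = cosine(story, self.models[lang])
        elif translate_to_english is not None and self.pivot in self.models:
            # No native model yet: fall back to comparing a translation with the pivot model.
            score = cosine(vectorize(translate_to_english(text)), self.models[self.pivot])
        else:
            return False
        on_topic = score >= self.threshold
        if on_topic:
            # Incremental building: add the untranslated story to (or seed) its native model.
            self.models.setdefault(lang, Counter()).update(story)
        return on_topic
```

In use, an English-trained tracker first accepts an Arabic story through translation; later Arabic stories are then compared directly with the Arabic model, without translation. Summing term counts keeps the sketch short; a real system would more likely weight terms (for example with tf-idf) and tune the threshold separately for each language.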
Leah S. Larkey, Fangfang Feng, Margaret E. Connell, Victor Lavrenko
Added: 30 Jun 2010
Updated: 30 Jun 2010
Type: Conference
Year: 2004
Where: SIGIR
Authors: Leah S. Larkey, Fangfang Feng, Margaret E. Connell, Victor Lavrenko