EMNLP
2009

Polylingual Topic Models

Topic models are a useful tool for analyzing large text collections, but have previously been applied in only monolingual, or at most bilingual, contexts. Meanwhile, massive collections of interlinked documents in dozens of languages, such as Wikipedia, are now widely available, calling for tools that can characterize content in many languages. We introduce a polylingual topic model that discovers topics aligned across multiple languages. We explore the model's characteristics using two large corpora, each with over ten different languages, and demonstrate its usefulness in supporting machine translation and tracking topic trends across languages.
David M. Mimno, Hanna M. Wallach, Jason Naradowsky, David A. Smith, Andrew McCallum
Added 17 Feb 2011
Updated 17 Feb 2011
Type Conference
Year 2009
Where EMNLP
Authors David M. Mimno, Hanna M. Wallach, Jason Naradowsky, David A. Smith, Andrew McCallum