clustering of documents according to sharing of topics at multiple levels of abstraction. Given a corpus of documents, a posterior inference algorithm finds an approximation to a ...
David M. Blei, Thomas L. Griffiths, Michael I. Jor...
As a principled approach to capturing semantic relations of words in information retrieval, statistical translation models have been shown to outperform simple document language m...
Topic models are a useful tool for analyzing large text collections, but have previously been applied in only monolingual, or at most bilingual, contexts. Meanwhile, massive colle...
David M. Mimno, Hanna M. Wallach, Jason Naradowsky...
Training statistical models to detect nonnative sentences requires a large corpus of non-native writing samples, which is often not readily available. This paper examines the exte...
We describe a novel approach to machine translation that combines the strengths of the two leading corpus-based approaches: Phrasal SMT and EBMT. We use a syntactically informed d...