Topic models are a useful tool for analyzing large text collections, but have previously been applied in only monolingual, or at most bilingual, contexts. Meanwhile, massive colle...
David M. Mimno, Hanna M. Wallach, Jason Naradowsky...
Many of the documents in large text collections are duplicates and versions of each other. In recent research, we developed new methods for finding such duplicates; however, as the...
We investigate three issues in distributed information retrieval, considering both TREC data and U.S. Patents: (1) topical organization of large text collections, (2) collection r...
Leah S. Larkey, Margaret E. Connell, James P. Call...
In this paper, we describe a new approach for mining concept associations from large text collections. The concepts are short sequences of words that occur frequently together acr...