Pseudo-relevance feedback (PRF) via query-expansion has been proven to be effective in many information retrieval (IR) tasks. In most existing work, the top-ranked documents from...
XML is increasingly becoming the format of choice for information exchange on the Internet. As this trend grows, one can expect that documents (or collections thereof) may get qui...
Premkumar T. Devanbu, Michael Gertz, April Kwong, ...
Topic models are a useful tool for analyzing large text collections, but have previously been applied in only monolingual, or at most bilingual, contexts. Meanwhile, massive colle...
David M. Mimno, Hanna M. Wallach, Jason Naradowsky...
Abstract. Latent Semantic Indexing(LSI) has been proved to be effective to capture the semantic structure of document collections. It is widely used in content-based text retrieval...
A family of probabilistic time series models is developed to analyze the time evolution of topics in large document collections. The approach is to use state space models on the n...