A significant portion of the world's text is tagged by readers on social bookmarking websites. Credit attribution is an inherent problem in these corpora because most pages h...
Daniel Ramage, David Hall, Ramesh Nallapati, Chris...
The aim of query-based sampling is to obtain a sufficient, representative sample of an underlying (text) collection. Current measures for assessing sample quality are too coarse gr...
Topic-based text summaries promise to help average users quickly understand a text collection and derive insights. Recent research has shown that the Latent Dirichlet Allocation (...
In this paper, we investigate the multimodal nature of cell phone data in terms of discovering recurrent and rich patterns in people’s lives. We present a method that can discov...
Most topic models, such as latent Dirichlet allocation, rely on the bag-of-words assumption. However, word order and phrases are often critical to capturing the meaning of text in...