Although many variants of language models have been proposed for information retrieval, two related retrieval heuristics remain “external” to the language modelin...
This paper explores the problem of computing pairwise similarity on document collections, focusing on the application of “more like this” queries in the life sciences domain. ...
With the continuing advances in data storage and communication technology, there has been an explosive growth of music information from different application domains. As an effe...
Bingjun Zhang, Jialie Shen, Qiaoliang Xiang, Ye Wa...
This paper investigates whether Web comments are descriptive in nature, that is, whether the combined text of a set of comments is similar in topic to the commented object. If so,...
This paper addresses the issue of automatically extracting keyphrases from documents. Previously, this problem was formalized as classification and learning methods for classific...
Recent work in supervised learning of term-based retrieval models has shown that significantly improved accuracy can often be achieved via better model estimation [2, 10, 11, 17]. In ...
Modern techniques for distributed information retrieval use a set of documents sampled from each server, but these samples have been underutilised in server selection. We describe...
Email spam filters are commonly trained on a sample of spam and ham (non-spam) messages. We investigate the effect on filter performance of using samples of spam and ham messag...
We examine two basic sources for implicit relevance feedback on the segment level for search personalization: eye tracking and display time. A controlled study has been conducted ...