Research and development of information access technology for scanned paper documents has been hampered by the lack of public test collections of realistic scope and complexity. A...
David D. Lewis, Gady Agam, Shlomo Argamon, Ophir F...
The aim of this study is to investigate whether element retrieval (as opposed to full-text retrieval) is meaningful and useful for searchers when carrying out information-seeking ...
User modeling for information retrieval has mostly been studied to improve the effectiveness of information access in centralized repositories. In this paper we explore user model...
Content-targeted advertising, the task of automatically associating ads to a Web page, constitutes a key Web monetization strategy nowadays. Further, it introduces new challenging...
We present an approach to improving the precision of an initial document ranking wherein we utilize cluster information within a graph-based framework. The main idea is to perform...
Text clustering is most commonly treated as a fully automated task without user feedback. However, a variety of researchers have explored mixed-initiative clustering methods which...
This paper employs ConceptNet, which covers a rich set of commonsense concepts, to retrieve images with text descriptions by focusing on spatial relationships. Evaluation on test ...
This paper studies the problem of identifying comparative sentences in text documents. The problem is related to but quite different from sentiment/opinion sentence identification...
New types of document collections are being developed by various web services. The service providers keep track of non-textual features such as click counts. In this paper, we pre...
Jiwoon Jeon, W. Bruce Croft, Joon Ho Lee, Soyeon P...
Broder et al.’s [3] shingling algorithm and Charikar’s [4] random projection based approach are considered “state-of-theart” algorithms for finding near-duplicate web pag...