We describe a methodology for retrieving document images from large, extremely diverse collections. First, we perform content extraction, that is, the location and measurement of reg...
We explore several different document representation models and two query expansion models for the task of recommending blogs to a user in response to a query. Blog relevance rank...
Jaime Arguello, Jonathan L. Elsas, Jamie Callan, J...
This paper proposes a novel method of building a polarity-tagged corpus from HTML documents. A key characteristic of this method is that it is fully automatic and can be applied to a...
In this paper, we introduce the concept of "user policies" and its applications to the browsing of HTML documents. The objective of policies is to specify user preferenc...
The size of a document archive is a very important parameter for resource selection in distributed information retrieval systems. In this paper, we present a method for automatical...