Recent work on incremental crawling has enabled the indexed document collection of a search engine to be more closely synchronized with the changing World Wide Web. However, this synchron...
Lipyeow Lim, Min Wang, Sriram Padmanabhan, Jeffrey...
In this paper, we present an extension of PHIL, a declarative language for filtering information from XML data. The proposed approach allows us to extract relevant data as well a...
Text categorization and retrieval tasks are often based on a good representation of textual data. Departing from the classical vector space model, several probabilistic models have...
The interpretation of natural scenes, generally so obvious and effortless for humans, remains a challenge in computer vision. To allow the search of image-based documents i...
Okapi BM25 scoring of anchor text surrogate documents has been shown to facilitate effective ranking in navigational search tasks over web data. We hypothesize that even better r...