XML is by now the de facto standard for exporting and exchanging data on the web. The need for querying XML data sources whose structure is not fully known to the user and the need...
As the number of available Web pages grows, users experience increasing difficulty finding documents relevant to their interests. One of the underlying reasons for this is that mo...
: This paper presents a novel way of examining the accuracy of the evaluation measures commonly used in information retrieval experiments. It validates several of the rules-of-thum...
As massive document repositories and knowledge management systems continue to expand, in proprietary environments as well as on the Web, the need for duplicate detection becomes i...
A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...