To find near-duplicate documents, fingerprint-based paradigms such as Broder's shingling and Charikar's simhash algorithms have been recognized as effective approaches a...
We assess a family of ranking mechanisms for search engines based on linkage analysis using a carefully engineered subset of the World Wide Web, WT10g (Bailey, Craswell and Hawking...
Background: The task of recognizing and identifying species names in biomedical literature has recently been regarded as critical for a number of applications in text and data min...
This research investigated the application of techniques successfully used in previous information retrieval research to the more challenging area of medical informatics. It was ...
Andrea L. Houston, Hsinchun Chen, Bruce R. Schatz,...
The rapid growth of the World-Wide Web poses unprecedented scaling challenges for general-purpose crawlers and search engines. In this paper we describe a new hypertext resource d...
Soumen Chakrabarti, Martin van den Berg, Byron Dom