Manually querying search engines in order to accumulate a large body of factual information is a tedious, error-prone process of piecemeal search. Search engines retrieve and rank...
Oren Etzioni, Michael J. Cafarella, Doug Downey, S...
Easy reuse and integration of declaratively described information in a distributed setting is one of the main motivations for building the Semantic Web. Despite of this claim, reu...
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
Static index pruning techniques aim at removing from the posting lists of an inverted file the references to documents which are likely to be not relevant for answering user querie...
In Web 2.0, users have generated and shared massive amounts of resources in various media formats, such as news, blogs, audios, photos and videos. The abundance and diversity of t...
Chen Liu, Beng Chin Ooi, Anthony K. H. Tung, Dongx...