Abstract The Internet and the Web are increasingly used to disseminate fast changing data such as sensor data, traffic and weather information, stock prices, sports scores, and eve...
This paper considers the problem of identifying on the Web compound documents (cDocs) ? groups of web pages that in aggregate constitute semantically coherent information entities...
Abstract—Traditionally, performance has been the most important metrics when evaluating a system. However, in the last decades industry and academia have been paying increasing a...
The dominant existing routing strategies employed in peerto-peer(P2P) based information retrieval(IR) systems are similarity-based approaches. In these approaches, agents depend o...
The AutoFeed system automatically extracts data from semistructured web sites. Previously, researchers have developed two types of supervised learning approaches for extracting we...