We consider the problem of efficiently sampling Web search engine query results. In turn, using a small random sample instead of the full set of results leads to efficient approxi...
Aris Anagnostopoulos, Andrei Z. Broder, David Carm...
Automatic classification of web pages is an effective way to deal with the difficulty of retrieving information from the Internet. Although there are many automatic classification...
The World-Wide Web consists not only of a huge number of unstructured texts, but also a vast amount of valuable structured data. Web tables [2] are a typical type of structured in...
Cindy Xide Lin, Bo Zhao, Tim Weninger, Jiawei Han,...
Implicitly structured content on the Web such as HTML tables and lists can be extremely valuable for web search, question answering, and information retrieval, as the implicit str...
In this paper, we present a distributed Web service discovery architecture that is designed to be reliable, flexible and scalable. The architecture is based on the concept of dis...
Brahmananda Sapkota, Dumitru Roman, Sebastian Rysz...