Researchers investigating personalization techniques for Web Information Retrieval face a challenge: the data required to perform evaluations, namely query logs and clickthro...
Highly heterogeneous XML data collections that do not have a global schema, such as those arising in federations of digital libraries or scientific data repositories, cannot be...
The Distributed Information Search COmponent (Disco) is a prototype heterogeneous distributed database that accesses underlying data sources. The Disco prototype currently focuses...
Online offerings such as web search, news portals, and e-commerce applications face the challenge of providing high-quality service to a large, heterogeneous user base. Recent eff...
The World-Wide Web consists of a huge number of unstructured documents, but it also contains structured data in the form of HTML tables. We extracted 14.1 billion HTML tables from...
Michael J. Cafarella, Alon Y. Halevy, Daisy Zhe Wa...