The World-Wide Web consists of a huge number of unstructured documents, but it also contains structured data in the form of HTML tables. We extracted 14.1 billion HTML tables from...
Michael J. Cafarella, Alon Y. Halevy, Daisy Zhe Wa...
Mailing list archives (i.e., the compilation of the messages posted up-to-now) are often published on the web and indexed by conventional search engines.They store a vast knowledg...
We present a novel language modeling approach to capturing the query reformulation behavior of Web search users. Based on a framework that categorizes eight different types of “...
In this work we compare different techniques to automatically find candidate web pages to substitute broken links. We extract information from the anchor text, the content of the p...
The recent explosion of on-line information in Digital Libraries and on the World Wide Web has given rise to a number of query-based search engines and manually constructed topica...
Mehran Sahami, Salim Yusufali, Michelle Q. Wang Ba...