One of the most fundamental problems in web search is how to re-rank result web pages based on user logs. Most traditional models for re-ranking assume each query has a single int...
The AutoFeed system automatically extracts data from semistructured web sites. Previously, researchers have developed two types of supervised learning approaches for extracting we...
A significant portion of the world's text is tagged by readers on social bookmarking websites. Credit attribution is an inherent problem in these corpora because most pages h...
Daniel Ramage, David Hall, Ramesh Nallapati, Chris...
On script-generated web sites, many documents share a common HTML tree structure, allowing wrappers to effectively extract information of interest. Of course, the scripts and thus ...
With the advent of Web 2.0 technologies, websites have evolved from static pages to dynamic, interactive Web-based applications with the ability to replicate common desktop functi...
Joshua M. Hailpern, Loretta Guarino Reid, Richard ...