The World Wide Web can be viewed as a gigantic distributed database including millions of interconnected hosts some of which publish information via web servers or peer-to-peer sys...
Lixto is a system and method for the visual and interactive generation of wrappers for Web pages under the supervision of a human developer, for automatically extracting informatio...
Our KNOWITALL system aims to automate the tedious process of extracting large collections of facts (e.g., names of scientists or politicians) from the Web in an autonomous, domain...
Oren Etzioni, Michael J. Cafarella, Doug Downey, A...
The Web is based on a browsing paradigm that makes it di cult to retrieve and integrate data from multiple sites. Today, the only way to do this is to build specialized applicatio...
We propose two methods for constructing automated programs for extraction of information from a class of web pages that are very common and of high practical significance - varia...