DEBU
2000

Accurately and Reliably Extracting Data from the Web: A Machine Learning Approach

A critical problem in developing information agents for the Web is accessing data that is formatted for human use. We have developed a set of tools for extracting data from web sites and transforming it into a structured data format, such as XML. The resulting data can then be used to build new applications without having to deal with unstructured data. The advantages of our wrapping technology over previous work are the ability to learn highly accurate extraction rules, to verify the wrapper to ensure that the correct data continues to be extracted, and to automatically adapt to changes in the sites from which the data is being extracted.
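As a rough illustration of what a wrapper does, the sketch below extracts two fields from an HTML fragment using start and end landmarks, checks the result against simple format patterns, and emits XML. It is not the authors' system: the abstract describes rules that are learned and automatically repaired, whereas here the rules are written by hand, and the page fragment, field names, landmarks, and patterns (`PAGE`, `RULES`, `PATTERNS`, `extract`, `verify`, `to_xml`) are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical page fragment; not taken from the paper.
PAGE = (
    "<html><body><b>Name:</b> Blue Door Cafe <br>"
    "<b>Phone:</b> (310) 555-0142 <br></body></html>"
)

# Hypothetical hand-written rules: each field is the text between a
# start landmark and the next end landmark. The paper learns such rules
# from examples; that learning step is not shown here.
RULES = {
    "name": ("<b>Name:</b>", "<br>"),
    "phone": ("<b>Phone:</b>", "<br>"),
}

# Hypothetical format checks used to notice when extraction goes wrong,
# for example after the site changes its layout.
PATTERNS = {
    "name": re.compile(r".+"),
    "phone": re.compile(r"\(\d{3}\) \d{3}-\d{4}"),
}


def extract(page, rules):
    """Return a dict of field -> extracted text (None if a landmark is missing)."""
    record = {}
    for field, (start, end) in rules.items():
        begin = page.find(start)
        if begin == -1:
            record[field] = None
            continue
        begin += len(start)
        stop = page.find(end, begin)
        record[field] = page[begin:stop].strip() if stop != -1 else None
    return record


def verify(record, patterns):
    """Return True only if every field was found and matches its expected format."""
    return all(
        value is not None and patterns[field].fullmatch(value) is not None
        for field, value in record.items()
    )


def to_xml(record, tag="record"):
    """Serialize an extracted record as a flat XML element."""
    root = ET.Element(tag)
    for field, value in record.items():
        ET.SubElement(root, field).text = value
    return ET.tostring(root, encoding="unicode")
```

Calling `extract(PAGE, RULES)` followed by `verify(...)` and `to_xml(...)` walks through the extract, check, and convert steps in order. If `verify` returns `False`, a real system would need to repair or relearn the rules, which this sketch leaves out.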
Added 18 Dec 2010
Updated 18 Dec 2010
Type Journal
Year 2000
Where DEBU
Authors Craig A. Knoblock, Kristina Lerman, Steven Minton, Ion Muslea