HTML Pattern Generator--Automatic Data Extraction from Web Pages


Download www.informatik.tu-cottbus.de

Existing methods of information extraction from HTML documents include the manual approach, supervised learning, and automatic techniques. The manual method achieves high precision and recall, but it is difficult to apply to a large number of pages. Supervised learning requires human interaction to create positive and negative samples. Automatic techniques require less human effort, but the information they retrieve is less reliable.
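Precision and recall, the measures used above to compare the approaches, can be illustrated with a short sketch. It assumes that extracted records can be compared to the correct records as exact strings, which is a simplifying assumption and not the evaluation protocol of the paper; the function name `precision_recall` and the product names are hypothetical.

```python
def precision_recall(extracted, ground_truth):
    """Return (precision, recall) of extracted items against the true items.

    precision = correct extracted items / all extracted items
    recall    = correct extracted items / all true items
    By convention here, an empty denominator yields 0.0.
    """
    extracted = set(extracted)
    ground_truth = set(ground_truth)
    true_positives = len(extracted & ground_truth)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Hypothetical example: records an extractor pulled from one product listing page.
truth = {"Laptop A", "Laptop B", "Laptop C", "Laptop D"}
found = {"Laptop A", "Laptop B", "Ad banner"}  # one spurious record, two missed
p, r = precision_recall(found, truth)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```

Here the spurious "Ad banner" record lowers precision, while the two missed laptops lower recall; a careful manual extractor tends to avoid both kinds of error, which is why it scores well on both measures.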

Mirel Cosulschi, Adrian Giurca, Bogdan Udrescu, Nicolae Constantinescu, Mihai Gabroveanu


Automatic Techniques | Manual Approach | Supervised Learning | SYNASC 2006


Related Content

» Data Extraction from Web Data Sources

» Title extraction from bodies of HTML documents and its application to web page retrieval

» Using visual cues for extraction of tabular data from arbitrary HTML documents

» WebSets: extracting sets of entities from the web using unsupervised information extraction

» RoadRunner: Towards Automatic Data Extraction from Large Web Sites

» An XML Approach to Semantically Extract Data from HTML Tables

» Web page title extraction and its application

» Web Ecology: Recycling HTML Pages as XML Documents Using W4F

» From HTTP to HTML: Erlang/OTP experiences in web based service applications

Post Info

Added	12 Jun 2010
Updated	12 Jun 2010
Type	Conference
Year	2006
Where	SYNASC
Authors	Mirel Cosulschi, Adrian Giurca, Bogdan Udrescu, Nicolae Constantinescu, Mihai Gabroveanu
