Unsupervised Learning of Tree Alignment Models for Information Extraction

We propose an algorithm for extracting fields from HTML search results. The output of the algorithm is a database table, a data structure that lends itself better to high-level data mining and information exploitation. Our algorithm combines tree and string alignment algorithms with domain-specific feature extraction to match semantically related data across search results. Applications of our approach include hidden web crawling, semantic tagging, and federated search. We build on earlier research on the use of tree alignment for information extraction. In contrast to previous approaches that rely on hand-tuned parameters, our algorithm uses a variant of Support Vector Machines (SVMs) to learn a parameterized, site-independent tree alignment model. This model can then be used to deduce common structural and textual elements of a set of HTML parse trees. We report some preliminary results of our system’s performance on data from websit...
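
As a rough illustration of what a parameterized tree alignment cost can look like, the Python sketch below scores two small HTML-like trees with a top-down alignment in which each pair of child lists is aligned by the same dynamic program used for string alignment. It is not the model described in the abstract: the `Node` class, the `align_cost` function, and the weights `w_sub` and `w_gap` are hypothetical names chosen for this example, and the weights are set by hand here, whereas the paper learns its parameters with an SVM variant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical minimal parse-tree node: a tag plus ordered children."""
    tag: str
    children: tuple = ()


def size(node):
    """Number of nodes in the subtree rooted at `node`."""
    return 1 + sum(size(child) for child in node.children)


def align_cost(a, b, w_sub=1.0, w_gap=1.0):
    """Cost of aligning tree `a` with tree `b`, root to root.

    w_sub: assumed penalty for aligning two nodes with different tags.
    w_gap: assumed per-node penalty for leaving a subtree unaligned.
    """
    root_cost = 0.0 if a.tag == b.tag else w_sub
    xs, ys = a.children, b.children

    # String-alignment style dynamic program over the two child lists.
    dp = [[0.0] * (len(ys) + 1) for _ in range(len(xs) + 1)]
    for i in range(1, len(xs) + 1):
        dp[i][0] = dp[i - 1][0] + w_gap * size(xs[i - 1])
    for j in range(1, len(ys) + 1):
        dp[0][j] = dp[0][j - 1] + w_gap * size(ys[j - 1])
    for i in range(1, len(xs) + 1):
        for j in range(1, len(ys) + 1):
            dp[i][j] = min(
                # Align the two subtrees with each other (recursive).
                dp[i - 1][j - 1] + align_cost(xs[i - 1], ys[j - 1], w_sub, w_gap),
                # Leave a subtree of `a` unaligned.
                dp[i - 1][j] + w_gap * size(xs[i - 1]),
                # Leave a subtree of `b` unaligned.
                dp[i][j - 1] + w_gap * size(ys[j - 1]),
            )
    return root_cost + dp[-1][-1]


# Two search-result rows that differ by one extra link element.
row_a = Node("tr", (Node("td"), Node("td")))
row_b = Node("tr", (Node("td"), Node("a"), Node("td")))
cost = align_cost(row_a, row_b)
```

In this sketch the two rows align at the cost of a single gap, since only the extra `a` element is left unmatched. A learned model would replace the two fixed weights with parameters fitted to data.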

Philip Zigoris, Damian Eads, Yi Zhang

Algorithm | Data Mining | ICDM 2006 | String Alignment Algorithms | Tree Alignment

Post Info

Added	11 Jun 2010
Updated	11 Jun 2010
Type	Conference
Year	2006
Where	ICDM
Authors	Philip Zigoris, Damian Eads, Yi Zhang
