Sciweavers

ICDM
2006
IEEE

Unsupervised Learning of Tree Alignment Models for Information Extraction

We propose an algorithm for extracting fields from HTML search results. The output of the algorithm is a database table, a data structure that better lends itself to high-level data mining and information exploitation. Our algorithm combines tree and string alignment algorithms with domain-specific feature extraction to match semantically related data across search results. Applications of our approach include hidden web crawling, semantic tagging, and federated search. We build on earlier research on the use of tree alignment for information extraction. In contrast to previous approaches that rely on hand-tuned parameters, our algorithm uses a variant of Support Vector Machines (SVMs) to learn a parameterized, site-independent tree alignment model. This model can then be used to deduce common structural and textual elements of a set of HTML parse trees. We report some preliminary results of our system’s performance on data from websit...
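To make the idea of a parameterized tree alignment concrete, here is a minimal sketch of a simplified top-down ordered tree alignment between two parse trees. It is not the model described in the abstract: the `Node` class, the `align_cost` function, and the two weights `w_sub` and `w_gap` are hypothetical names chosen for illustration, and the weights are set by hand here, whereas the paper learns its parameters with an SVM variant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for one node of an HTML parse tree."""
    tag: str
    children: tuple = ()


def size(node):
    """Number of nodes in the subtree rooted at `node`."""
    return 1 + sum(size(child) for child in node.children)


def align_cost(a, b, w_sub=1.0, w_gap=1.0):
    """Cost of aligning tree `a` with tree `b`, matching root to root.

    w_sub and w_gap are illustrative, hand-set weights (assumptions):
    w_sub is paid when two matched nodes have different tags, and
    w_gap is paid per node of any subtree left unmatched.
    """
    cost = 0.0 if a.tag == b.tag else w_sub
    xs, ys = a.children, b.children

    # dp[i][j] = cheapest alignment of the child lists xs[:i] and ys[:j],
    # computed like a string alignment whose "characters" are subtrees.
    dp = [[0.0] * (len(ys) + 1) for _ in range(len(xs) + 1)]
    for i in range(1, len(xs) + 1):
        dp[i][0] = dp[i - 1][0] + w_gap * size(xs[i - 1])
    for j in range(1, len(ys) + 1):
        dp[0][j] = dp[0][j - 1] + w_gap * size(ys[j - 1])
    for i in range(1, len(xs) + 1):
        for j in range(1, len(ys) + 1):
            dp[i][j] = min(
                # match the two subtrees recursively
                dp[i - 1][j - 1] + align_cost(xs[i - 1], ys[j - 1], w_sub, w_gap),
                # leave xs[i-1] unmatched
                dp[i - 1][j] + w_gap * size(xs[i - 1]),
                # leave ys[j-1] unmatched
                dp[i][j - 1] + w_gap * size(ys[j - 1]),
            )
    return cost + dp[-1][-1]


# Two hypothetical search-result rows; the second has one extra cell.
row_a = Node("tr", (Node("td"), Node("td")))
row_b = Node("tr", (Node("td"), Node("td"), Node("td")))
extra_cell_cost = align_cost(row_a, row_b)
```

In this sketch the extra `td` in `row_b` is left unmatched, so the alignment cost equals one gap penalty. Changing `w_gap` or `w_sub` changes which alignment is cheapest, which is the kind of parameter a learned model would tune instead of a person.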
Philip Zigoris, Damian Eads, Yi Zhang
Added 11 Jun 2010
Updated 11 Jun 2010
Type Conference
Year 2006
Where ICDM
Authors Philip Zigoris, Damian Eads, Yi Zhang