Sciweavers

CIMCA
2005
IEEE

Improving Rule Generation Precision for Domain Knowledge based Wrappers

Wrappers play an important role in extracting specified information from various sources. The wrapper rules by which information is extracted are often created from domain-specific knowledge, which helps recognize the meaning of the text representing various entities and values and detect their formats. However, such domain knowledge becomes powerless when value-representing data are not labeled with appropriate textual descriptions, or when there is nothing but a hyperlink where certain text labels or values are expected. To alleviate these problems, we propose a probabilistic method for recognizing the entity type, i.e., generating wrapper rules, when there is no label associated with value-representing text. In addition, we have devised a method for using the information reachable by following hyperlinks when textual data are not immediately available on the target web page. Our experimental work shows that the proposed methods help increase precision of t...
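The abstract does not specify the probabilistic model, so the following is only a rough sketch of the general idea of guessing an entity type for an unlabeled value from its format. It assumes a naive Bayes style scorer; the feature names, entity types, priors, and likelihood numbers are all hypothetical and are not taken from the paper.

```python
import re

# Hypothetical format features (assumption, not from the paper).
FEATURES = {
    "has_currency": lambda s: bool(re.search(r"\$", s)),
    "has_digit": lambda s: bool(re.search(r"\d", s)),
    "date_like": lambda s: bool(re.search(r"\b\d{4}-\d{2}-\d{2}\b", s)),
    "has_letter": lambda s: bool(re.search(r"[A-Za-z]", s)),
}

# Illustrative P(feature present | entity type); made-up numbers.
LIKELIHOOD = {
    "price": {"has_currency": 0.90, "has_digit": 0.95, "date_like": 0.01, "has_letter": 0.10},
    "date":  {"has_currency": 0.01, "has_digit": 0.95, "date_like": 0.90, "has_letter": 0.30},
    "name":  {"has_currency": 0.01, "has_digit": 0.10, "date_like": 0.01, "has_letter": 0.99},
}

# Uniform prior over the hypothetical entity types.
PRIOR = {etype: 1.0 / len(LIKELIHOOD) for etype in LIKELIHOOD}


def posterior(text):
    """Return a normalized P(entity type | format features of text)."""
    scores = {}
    for etype, probs in LIKELIHOOD.items():
        score = PRIOR[etype]
        for name, test in FEATURES.items():
            p = probs[name]
            # Multiply by P(feature present) or P(feature absent).
            score *= p if test(text) else (1.0 - p)
        scores[etype] = score
    total = sum(scores.values())
    return {etype: s / total for etype, s in scores.items()}


def guess_entity_type(text):
    """Pick the entity type with the highest posterior for an unlabeled value."""
    post = posterior(text)
    return max(post, key=post.get)
```

With these illustrative numbers, a call such as `guess_entity_type("$19.99")` selects the hypothetical `price` type. In a real wrapper the likelihoods would presumably be estimated from labeled examples rather than set by hand.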
Added 24 Jun 2010
Updated 24 Jun 2010
Type Conference
Year 2005
Where CIMCA
Authors Chang-Hoo Jeong, Sung-Jin Jhun, Myung-Eun Lim, Sung-Hyon Myaeng