

Extracting Relations from XML Documents

XML is becoming a prevalent format for data exchange. Many XML documents have complex schemas that are not always known and that can vary widely between information sources and applications. In contrast, database applications rely mainly on the flat relational model. We propose a novel, partially supervised approach for extracting user-defined relations from XML documents with unknown schema. The extracted relations can be used directly by an RDBMS, or utilized for information integration or data mining tasks. Our method attempts to automatically capture the lexical and structural features that indicate the relevant portions of the input document, based on a few user-annotated examples. This information can then be used to extract the relation of interest from documents whose schemas potentially differ from those of the training examples. We present preliminary experiments showing that our method could be capable of extracting the target relation from XML documents even in the presence of signi...
Added 06 Jul 2010
Updated 06 Jul 2010
Type Conference
Year 2003
Where ER
Authors Eugene Agichtein, C. T. Howard Ho, Vanja Josifovski, Joerg Gerhardt