Sciweavers

ADBIS
2003
Springer

Using Common Schemas for Information Extraction from Heterogeneous Web Catalogs

The Web has become the world’s largest information source. Unfortunately, the main success factor of the Web, the inherent principle of distribution and autonomy of the participants, is also its main problem. To make this information machine-processable, common structures and semantics have to be identified. The goal of information extraction (IE) is exactly this: to transform text into a structured format. In this paper, we present a novel approach to information extraction developed as part of the XI³ project. Central to our approach is the assumption that we can obtain a better understanding of a text fragment if we consider its integration into higher-level concepts by exploiting text fragments from different parts of a source. Going beyond previous approaches, we offer higher expressiveness of the extraction schema and an advanced method to deal with ambiguous texts. With our approach we solve one of the main challenges of information extraction, providing a way ...
Richard Vlach, Wassili Kazakos
Added: 06 Jul 2010
Updated: 06 Jul 2010
Type: Conference
Year: 2003
Where: ADBIS
Authors: Richard Vlach, Wassili Kazakos