In this paper we address the problem of organizing hidden-Web databases. Given a heterogeneous set of Web forms that serve as entry points to hidden-Web databases, our goal is to ...
A seed-based framework for textual information extraction allows for weakly supervised acquisition of open-domain class attributes over conceptual hierarchies, from a combination ...
This paper provides an explanation of the basic data structures used in a new page analysis technique to create wrappers (data extractors) for the result pages produced by web sit...
This paper presents a novel information system integrating advanced information extraction technology and automatic hyper-linking. Extracted entities are mapped into a domain onto...
Stephan Busemann, Witold Drozdzynski, Hans-Ulrich ...
: Knowledge about market developments and competitor activities on the market becomes more and more a critical success factor for enterprises. The World Wide Web provides public do...