We present two machine learning approaches to information extraction from semi-structured documents that can be used if no annotated training data are available, but there does ex...
In this paper, we propose a new user interface to interactively specify Web wrappers to extract relational information from Web documents. In this study, we focused on improving u...
Information extraction has conventionally treated HTML pages as plain text documents extended with HTML tags. However, the growing maturity and correct usage of HTML/XHT...
Modern critical editions of ancient works generally include manually created indices of other sources quoted in the text. Since indices can be considered as a form of domain speci...
Jedi (Java based Extraction and Dissemination of Information) is a lightweight tool for the creation of wrappers and mediators to extract, combine, and reconcile information from ...
Gerald Huck, Peter Fankhauser, Karl Aberer, Erich ...