We propose two methods for constructing automated programs for extraction of information from a class of web pages that are very common and of high practical significance - varia...
Machine-generated documents containing semi-structured text are rapidly becoming the bulk of the data that organisations store. Given a feature-based representation of such data,...
This paper presents a system that uses the domain name of a German business website to locate its information pages (e.g. company profile, contact page, imprint) and then identifi...
There is extensive interest in automating the collection, organization and summarization of biological data. Data in the form of figures and accompanying captions in literature pr...
Information Extraction, the process of eliciting data from natural language documents, usually relies on the ability to parse the document and then to detect the meaning ...