We propose a preprocessing method for Web mining which, given semi-structured documents with the same structure and style, distinguishes useless parts and non-useless parts in each...
We present a general framework for the task of extracting specific information “on demand” from a large corpus such as the Web under resource-constraints. Given a database wit...
In developing an Infbrmation Extraction tIE) system tbr a new class of events or relations, one of the major tasks is identifying the many ways in which these events or relations ...
Roman Yangarber, Ralph Grishman, Pasi Tapanainen, ...
Entity Recognition (ER) is a key component of relation extraction systems and many other natural-language processing applications. Unfortunately, most ER systems are restricted to...
In this paper we introduce a programming language for Web document processing called WebL. WebL is a high level, object-oriented scripting language that incorporates two novel fea...