We show that a set of independently developed spam filters may be combined in simple ways to provide substantially better filtering than any of the individual filters. The resu...
Thomas R. Lynam, Gordon V. Cormack, David R. Cheri...
(LP)2 is a covering algorithm for adaptive Information Extraction from text (IE). It induces symbolic rules that insert SGML tags into texts by learning from examples found in a u...
Repetition of layout structure is prevalent in document images. In document design, such repetition conveys the underlying logical and functional structure of the data. For exampl...
Modern critical editions of ancient works generally include manually created indices of other sources quoted in the text. Since indices can be considered as a form of domain speci...
This paper presents a method of automatically constructing information extraction patterns on predicate-argument structures (PASs) obtained by full parsing from a smaller training...