EPIA
2003
Springer

Automatic Selection of Table Areas in Documents for Information Extraction

Abstract: Information contained in companies’ financial statements is valuable for decision making at various levels. Much of the relevant information in such documents is contained in tables and is currently extracted mainly by hand. We propose a method that accomplishes a preliminary step in the task of automatically extracting information from tables in documents: selecting the lines of the document that are likely to belong to the tables containing the information to be extracted. Our method was developed by empirically analyzing a set of Portuguese companies’ financial statements, using statistical and data mining techniques. Empirical evaluation indicates that the method retains more than 99% of the table lines while discarding at least 50% of the lines in the document. The method can cope with the variety of styles used in laying out information on paper and adapts its performance accordingly, thus maximizing its results.
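The two figures quoted in the evaluation can be read as a recall over table lines and a discard rate over the document. The sketch below shows one way to compute both for a single document. It is not the authors' code: the function name `evaluate_selection`, the boolean-list representation, and the assumption that the discard rate is measured over all document lines are illustrative choices.

```python
def evaluate_selection(is_table_line, is_selected):
    """Return (table_line_recall, discarded_fraction) for one document.

    is_table_line: list of bools, True where a line truly belongs to a table.
    is_selected:   list of bools, True where the selector kept the line.
    Both names and this representation are hypothetical, for illustration only.
    """
    if len(is_table_line) != len(is_selected):
        raise ValueError("both lists must describe the same lines")

    total = len(is_table_line)
    table_total = sum(is_table_line)
    # Table lines that survived the selection step.
    table_kept = sum(1 for t, s in zip(is_table_line, is_selected) if t and s)
    # Lines of any kind that the selector threw away.
    discarded = sum(1 for s in is_selected if not s)

    # Assumed conventions for empty inputs: nothing to miss, nothing to discard.
    recall = table_kept / table_total if table_total else 1.0
    discarded_fraction = discarded / total if total else 0.0
    return recall, discarded_fraction


# Toy document of 8 lines, 3 of which are table lines.
truth = [False, False, True, True, True, False, False, False]
selected = [False, True, True, True, True, False, False, False]
recall, discarded = evaluate_selection(truth, selected)
print(recall, discarded)
```

In this toy case every table line is kept while half of the document is discarded, which is the kind of trade-off the reported figures describe.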
Added 06 Jul 2010
Updated 06 Jul 2010
Type Conference
Year 2003
Where EPIA
Authors Ana Costa e Silva, Alípio Jorge, Luís Torgo