On Precision and Recall of Multi-Attribute Data Extraction from Semistructured Sources

Machine learning techniques for data extraction from semistructured sources exhibit different precision and recall characteristics. However, to date, the formal relationship between learning algorithms and their impact on these two metrics remains unexplored. This paper proposes a formalization of precision and recall of extraction and, based on this formalism, investigates the complexity-theoretic aspects of learning algorithms for multi-attribute data extraction. We show that there is a tradeoff between the precision/recall of extraction and computational efficiency, and we present experimental results that demonstrate the practical utility of these concepts in designing scalable data extraction algorithms that improve recall without compromising precision.
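
For readers unfamiliar with the two metrics, the following is a minimal sketch of the standard set-based definitions of precision and recall, applied to extracted (attribute, value) pairs. It is not the paper's formalization, which the abstract does not spell out; the function name `precision_recall`, the sample record, and the choice to return 0.0 for an empty set are all assumptions made for illustration.

```python
def precision_recall(extracted, truth):
    """Standard set-based precision and recall over (attribute, value) pairs.

    Assumption: an empty extraction (or empty ground truth) scores 0.0;
    other conventions exist.
    """
    extracted, truth = set(extracted), set(truth)
    correct = len(extracted & truth)  # pairs that were extracted and are right
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical record with three attributes; the extractor finds two,
# one of them with a wrong value.
truth = {("title", "Widget"), ("price", "9.99"), ("brand", "Acme")}
extracted = {("title", "Widget"), ("price", "19.99")}

p, r = precision_recall(extracted, truth)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy case one of the two extracted pairs is correct (precision 1/2) and one of the three true pairs is recovered (recall 1/3), which illustrates how an extractor can miss attributes and still be penalized separately for wrong values.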

Guizhen Yang, Saikat Mukherjee, I. V. Ramakrishnan

Data Extraction | Data Extraction Algorithms | Data Mining | ICDM 2003 | Learning Algorithms |

Post Info

Added	04 Jul 2010
Updated	04 Jul 2010
Type	Conference
Year	2003
Where	ICDM
Authors	Guizhen Yang, Saikat Mukherjee, I. V. Ramakrishnan
