Probabilistic Declarative Information Extraction

Abstract: Unstructured text represents a large fraction of the world's data. It often contains snippets of structured information (e.g., people's names and zip codes). Information Extraction (IE) techniques identify such structured information in text. In recent years, database research has pursued IE on two fronts: declarative languages and systems for managing IE tasks, and probabilistic databases for querying the output of IE. In this paper, we make the first steps to merge these two directions, without loss of statistical robustness, by implementing a state-of-the-art statistical IE model, Conditional Random Fields (CRFs), in the setting of a probabilistic database that treats statistical models as first-class data objects. We show that the Viterbi algorithm for CRF inference can be specified declaratively in recursive SQL. We also show the performance benefits relative to a standalone open-source Viterbi implementation. This work opens up the optimization oppo...
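For readers unfamiliar with the Viterbi recurrence mentioned in the abstract, the following is a minimal sketch in plain Python, not the paper's recursive SQL formulation. The label set, the `emit` scoring function, and the `trans` table are hypothetical toy values chosen for illustration; a real CRF would derive these scores from learned feature weights.

```python
from typing import Callable, Dict, List, Tuple


def viterbi(
    tokens: List[str],
    labels: List[str],
    emit: Callable[[str, str], float],
    trans: Dict[Tuple[str, str], float],
) -> Tuple[float, List[str]]:
    """Return (best_score, best_label_sequence) under additive scores."""
    if not tokens:
        return 0.0, []
    # v[i][y] = best score of any labeling of tokens[0..i] that ends in label y
    v = [{y: emit(tokens[0], y) for y in labels}]
    back = []  # back[i][y] = best previous label for label y at position i + 1
    for tok in tokens[1:]:
        scores, ptrs = {}, {}
        for y in labels:
            # Pick the predecessor label that maximizes the running score.
            prev = max(labels, key=lambda p: v[-1][p] + trans.get((p, y), 0.0))
            scores[y] = v[-1][prev] + trans.get((prev, y), 0.0) + emit(tok, y)
            ptrs[y] = prev
        v.append(scores)
        back.append(ptrs)
    # Follow the back-pointers from the best final label.
    last = max(labels, key=lambda y: v[-1][y])
    path = [last]
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    path.reverse()
    return v[-1][last], path


# Hypothetical toy scores: reward 5-digit tokens as ZIP, capitalized tokens as NAME.
def toy_emit(tok: str, label: str) -> float:
    if label == "ZIP":
        return 2.0 if tok.isdigit() and len(tok) == 5 else -1.0
    if label == "NAME":
        return 1.0 if tok[:1].isupper() else -1.0
    return 0.0  # label "O" (other)


toy_labels = ["NAME", "ZIP", "O"]
toy_trans = {("ZIP", "ZIP"): -1.0}  # unlisted transitions default to 0.0
score, path = viterbi(["Daisy", "lives", "in", "94720"], toy_labels, toy_emit, toy_trans)
print(score, path)
```

The inner `max` over predecessor labels at each position is the step that a declarative formulation would express as a recursive query with an aggregate; the sketch keeps it as an explicit loop for readability.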

Daisy Zhe Wang, Eirinaios Michelakis, Joseph M. Hellerstein, Michael J. Franklin, Minos N. Garofalakis

Database | ICDE 2010 | IE Models | IE Tasks | State-of-the-art Statistical IE

Post Info

Added	20 Dec 2009
Updated	03 Jan 2010
Type	Conference
Year	2010
Where	ICDE
Authors	Daisy Zhe Wang, Eirinaios Michelakis, Joseph M. Hellerstein, Michael J. Franklin, Minos N. Garofalakis
