Querying Probabilistic Information Extraction

Recently, there has been increasing interest in extending relational query processing to include data obtained from unstructured sources. A common approach is to use stand-alone Information Extraction (IE) techniques to identify and label entities within blocks of text; the resulting entities are then imported into a standard database and processed using relational queries. This two-part approach, however, suffers from two main drawbacks. First, IE is inherently probabilistic, but traditional query processing does not properly handle probabilistic data, resulting in reduced answer quality. Second, performance inefficiencies arise due to the separation of IE from query processing. In this paper, we address these two problems by building on an in-database implementation of a leading IE model: Conditional Random Fields (CRFs) using the Viterbi inference algorithm. We develop two different query approaches on top of this implementation. The first uses deterministic queries over maximum-likeli...
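The abstract relies on Viterbi inference over a CRF. As a rough illustration, the following is a minimal sketch of the textbook Viterbi dynamic program for a linear-chain CRF, written in log space. It assumes the per-token label scores (`emission`) and label-to-label scores (`transition`) are already computed; those inputs and the function name are hypothetical. This is not the authors' in-database implementation, which this sketch does not attempt to reproduce.

```python
from typing import List, Sequence, Tuple


def viterbi(
    emission: Sequence[Sequence[float]],    # emission[t][y]: log-score of label y at token t (assumed precomputed)
    transition: Sequence[Sequence[float]],  # transition[yp][y]: log-score of moving from label yp to label y
) -> Tuple[List[int], float]:
    """Return the highest-scoring label sequence and its unnormalized log-score."""
    n_tokens = len(emission)
    if n_tokens == 0:
        return [], 0.0
    n_labels = len(emission[0])

    # best[t][y]: best score of any labeling of tokens 0..t that ends in label y
    best: List[List[float]] = [list(emission[0])]
    # back[t][y]: label at position t-1 on that best-scoring prefix
    back: List[List[int]] = [[0] * n_labels]

    for t in range(1, n_tokens):
        row: List[float] = []
        ptr: List[int] = []
        for y in range(n_labels):
            # Pick the predecessor label that maximizes prefix score + transition score.
            prev = max(range(n_labels), key=lambda yp: best[t - 1][yp] + transition[yp][y])
            row.append(best[t - 1][prev] + transition[prev][y] + emission[t][y])
            ptr.append(prev)
        best.append(row)
        back.append(ptr)

    # Start from the best final label and follow the back-pointers to the first token.
    last = max(range(n_labels), key=lambda y: best[-1][y])
    path = [last]
    for t in range(n_tokens - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Usage with made-up scores for three tokens and two labels (0 and 1):
emission = [[2.0, 0.0], [0.0, 3.0], [1.0, 0.0]]
transition = [[0.5, -1.0], [-1.0, 0.5]]
labels, score = viterbi(emission, transition)
print(labels, score)  # [0, 1, 1] 4.5
```

Viterbi returns a single most likely labeling per document. That corresponds to the maximum-likelihood extraction, rather than the full distribution over labelings.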

Daisy Zhe Wang, Michael J. Franklin, Minos N. Garofalakis, Joseph M. Hellerstein

Probabilistic Query Answers | PVLDB 2010 | Viterbi Algorithm | Viterbi Inference Algorithm

Type	Journal
Year	2010
Where	PVLDB
Authors	Daisy Zhe Wang, Michael J. Franklin, Minos N. Garofalakis, Joseph M. Hellerstein
