Declarative Information Extraction Using Datalog with Embedded Extraction Predicates


In this paper we argue that developing information extraction (IE) programs using Datalog with embedded procedural extraction predicates is a good way to proceed. First, compared to current ad-hoc composition using, e.g., Perl or C++, Datalog provides a cleaner and more powerful way to compose small extraction modules into larger programs. Thus, writing IE programs this way retains and enhances the important advantages of current approaches: programs are easy to understand, debug, and modify. Second, once we write IE programs in this framework, we can apply query optimization techniques to them. This gives programs that, when run over a variety of data sets, are more efficient than any monolithic program because they are optimized based on the statistics of the data on which they are invoked. We show how optimizing such programs raises challenges specific to text data that cannot be accommodated in the current relational optimization framework, then provide initial solutions. Extensiv...
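The composition idea in the abstract can be illustrated with a small sketch. This is not the paper's syntax or system: the predicate names (extract_title, extract_speaker), the rule shown in the comment, and the sample documents are all hypothetical, chosen only to show how small procedural extractors might be joined by a Datalog-style rule.

```python
import re
from typing import Iterator, List, Tuple

# A span is (start offset, end offset, matched text).
Span = Tuple[int, int, str]


def extract_title(doc: str) -> Iterator[Span]:
    """Hypothetical procedural extraction predicate: yields title spans."""
    for m in re.finditer(r"Title: (.+)", doc):
        yield (m.start(1), m.end(1), m.group(1))


def extract_speaker(doc: str) -> Iterator[Span]:
    """Hypothetical procedural extraction predicate: yields speaker spans."""
    for m in re.finditer(r"Speaker: (.+)", doc):
        yield (m.start(1), m.end(1), m.group(1))


def talks(docs: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Naive evaluation of an assumed Datalog-style rule:

    talks(d, t, s) :- docs(d), extractTitle(d, t),
                      extractSpeaker(d, s), before(t, s).
    """
    out = []
    for doc_id, text in docs:
        for t in extract_title(text):
            for s in extract_speaker(text):
                # before(t, s): the title span ends before the speaker span starts.
                if t[1] <= s[0]:
                    out.append((doc_id, t[2], s[2]))
    return out


# Made-up sample input, for illustration only.
DOCS = [
    ("d1", "Title: Declarative IE\nSpeaker: A. Author\n"),
    ("d2", "No talk announcement here.\n"),
]
RESULT = talks(DOCS)
```

The nested loops here are only one possible evaluation order. Because the rule itself is declarative, an evaluator would be free to reorder or prune the extractor calls, which is the kind of query optimization the abstract refers to.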

Warren Shen, AnHai Doan, Jeffrey F. Naughton, Raghu Ramakrishnan


Database | VLDB 2007


» Declarative analysis of noisy information networks

» TimeTrails: A System for Exploring Spatio-Temporal Information in Documents

» A concise XML binding framework facilitates practical object-oriented document engineering

Post Info

Added	05 Dec 2009
Updated	05 Dec 2009
Type	Conference
Year	2007
Where	VLDB
Authors	Warren Shen, AnHai Doan, Jeffrey F. Naughton, Raghu Ramakrishnan

