VLDB
2007
ACM

Declarative Information Extraction Using Datalog with Embedded Extraction Predicates

In this paper we argue that developing information extraction (IE) programs using Datalog with embedded procedural extraction predicates is a good way to proceed. First, compared to current ad-hoc composition using, e.g., Perl or C++, Datalog provides a cleaner and more powerful way to compose small extraction modules into larger programs. Thus, writing IE programs this way retains and enhances the important advantages of current approaches: programs are easy to understand, debug, and modify. Second, once we write IE programs in this framework, we can apply query optimization techniques to them. This gives programs that, when run over a variety of data sets, are more efficient than any monolithic program because they are optimized based on the statistics of the data on which they are invoked. We show how optimizing such programs raises challenges specific to text data that cannot be accommodated in the current relational optimization framework, then provide initial solutions. Extensiv...
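As a rough illustration of the composition idea in the abstract, the sketch below treats small procedural extractors as predicates and combines them with a Datalog-style rule. It is a minimal sketch and not the authors' system: the rule `talks(d, t, s)`, the extractor names `extract_title` and `extract_speaker`, the regular expressions, and the sample documents are all hypothetical, and the rule is evaluated with naive nested loops rather than an optimized plan.

```python
import re
from typing import Iterator, List, Tuple

# Hypothetical procedural extraction predicates: each takes a document
# string and yields the text spans it extracts from that document.
def extract_title(doc: str) -> Iterator[str]:
    for m in re.finditer(r"Title:\s*(.+)", doc):
        yield m.group(1).strip()

def extract_speaker(doc: str) -> Iterator[str]:
    for m in re.finditer(r"Speaker:\s*(.+)", doc):
        yield m.group(1).strip()

# Datalog-style rule (illustrative only):
#   talks(d, t, s) :- docs(d), extractTitle(d, t), extractSpeaker(d, s).
# Evaluated naively: for every document, join the outputs of both extractors.
def talks(docs: List[str]) -> List[Tuple[int, str, str]]:
    results = []
    for doc_id, doc in enumerate(docs):
        for title in extract_title(doc):
            for speaker in extract_speaker(doc):
                results.append((doc_id, title, speaker))
    return results

# Made-up sample input for the sketch.
DOCS = [
    "Title: Query Optimization for Text\nSpeaker: A. Researcher",
    "No announcement here.",
]

if __name__ == "__main__":
    for row in talks(DOCS):
        print(row)
```

Because the rule only states which extractors are joined, an evaluator is free to reorder them or to skip documents that cannot match. That freedom is the kind of opportunity the abstract attributes to query optimization, though the specific techniques in the paper are not shown here.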
Added 05 Dec 2009
Updated 05 Dec 2009
Type Conference
Year 2007
Where VLDB
Authors Warren Shen, AnHai Doan, Jeffrey F. Naughton, Raghu Ramakrishnan