Scaling Out the Discovery of Inclusion Dependencies


Abstract: Inclusion dependencies are among the most important database dependencies. In addition to their most prominent application – foreign key discovery – inclusion dependencies are an important input to data integration, query optimization, and schema redesign. With their discovery being a recurring data profiling task, previous research has proposed different algorithms to discover all inclusion dependencies within a given dataset. However, none of the proposed algorithms is designed to scale out, i.e., none can be distributed across multiple nodes in a computer cluster to increase the performance. So on large datasets with many inclusion dependencies, these algorithms can take days to complete, even on high-performance computers. We introduce SINDY, an algorithm that efficiently discovers all unary inclusion dependencies of a given relational dataset in a distributed fashion and that is not tied to main memory requirements. We give a practical implementation of SINDY that ...
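
For orientation, a unary inclusion dependency A ⊆ B holds when every value of column A also occurs in column B. The sketch below is a naive, single-machine illustration of that definition only; it is not SINDY and does not reflect the paper's distributed design. The function name `unary_inds`, the dictionary-based table layout, and the choice to ignore null values are all assumptions made for this example.

```python
from itertools import permutations


def unary_inds(tables):
    """Return all (dependent, referenced) column pairs where every non-null
    value of the dependent column also occurs in the referenced column.

    `tables` maps a table name to a list of row dictionaries (hypothetical
    layout chosen for this sketch).
    """
    # Collect the distinct non-null values of each column.
    # Assumption: nulls are ignored; columns with only nulls are skipped.
    values = {}
    for table_name, rows in tables.items():
        for row in rows:
            for column, value in row.items():
                if value is not None:
                    values.setdefault((table_name, column), set()).add(value)

    # Check every ordered pair of distinct columns for set inclusion.
    # This is quadratic in the number of columns and keeps all values in memory.
    return sorted(
        (dep, ref)
        for dep, ref in permutations(values, 2)
        if values[dep] <= values[ref]
    )


# Toy data: orders.customer_id behaves like a foreign key to customers.id.
tables = {
    "orders": [{"id": 1, "customer_id": 10}, {"id": 2, "customer_id": 11}],
    "customers": [{"id": 10}, {"id": 11}, {"id": 12}],
}
inds = unary_inds(tables)
```

On this toy data the only dependency found is orders.customer_id ⊆ customers.id, which matches the foreign key use case the abstract mentions. The all-pairs, in-memory check is exactly the kind of approach that stops being practical on large datasets.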

Sebastian Kruse, Thorsten Papenbrock, Felix Naumann


BTW 2015 | Database |


Added	17 Apr 2016
Updated	17 Apr 2016
Type	Journal
Year	2015
Where	BTW
Authors	Sebastian Kruse, Thorsten Papenbrock, Felix Naumann
