Sciweavers

CLEANDB
2006
ACM

Circumventing Data Quality Problems Using Multiple Join Paths

We propose the Multiple Join Path (MJP) framework for obtaining high-quality information by linking fields across multiple databases when the underlying databases contain poor-quality data, characterized by violations of integrity constraints, such as keys and functional dependencies, within and across databases. MJP associates quality scores with candidate answers in two steps: it first scores individual data paths between a pair of field values, taking into account data quality with respect to specified integrity constraints, and then agglomerates scores across multiple data paths that serve as corroborating evidence for a candidate answer. We address the problem of finding the top-few (highest-quality) answers in the MJP framework using novel techniques, and demonstrate the utility of our techniques using real data and our Virtual Integration Prototype testbed.
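The two-step scoring described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual scoring or agglomeration functions, so everything here is an assumption made for illustration: per-link quality scores in [0, 1], a path score taken as the product of its link scores, a noisy-or combination across paths, and the hypothetical names `path_score`, `combine`, and `top_answers`. The sample data is invented.

```python
import heapq
from collections import defaultdict
from math import prod

# Hypothetical input: (candidate_answer, [per-link quality scores in [0, 1]]).
# Each link score stands in for how well one join step respects the
# specified integrity constraints; the values are made up.
paths = [
    ("alice@example.com", [0.9, 0.8]),
    ("alice@example.com", [0.5, 0.5]),
    ("bob@example.com", [0.6, 0.9]),
]

def path_score(link_scores):
    # Assumption: a path's quality is the product of its link scores.
    return prod(link_scores)

def combine(scores):
    # Assumption: noisy-or combination, so every additional path
    # reaching the same answer adds corroboration.
    return 1.0 - prod(1.0 - s for s in scores)

def top_answers(paths, k):
    # Group path scores by candidate answer, combine them, keep the best k.
    by_answer = defaultdict(list)
    for answer, links in paths:
        by_answer[answer].append(path_score(links))
    scored = {answer: combine(s) for answer, s in by_answer.items()}
    return heapq.nlargest(k, scored.items(), key=lambda kv: kv[1])

if __name__ == "__main__":
    for answer, score in top_answers(paths, k=2):
        print(f"{answer}: {score:.2f}")
```

In this sketch the answer reached by two paths outranks the answer reached by one, which is the corroboration effect the abstract describes. It scores every answer exhaustively and does not attempt the top-few techniques the paper contributes.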
Added 13 Jun 2010
Updated 13 Jun 2010
Type Conference
Year 2006
Where CLEANDB
Authors Yannis Kotidis, Amélie Marian, Divesh Srivastava