Sciweavers

PVLDB
2010

Sampling the Repairs of Functional Dependency Violations under Hard Constraints

Violations of functional dependencies (FDs) are common in practice, often arising in the context of data integration or Web data extraction. Resolving these violations is known to be challenging for a variety of reasons, one of them being the exponential number of possible “repairs”. Previous work has tackled this problem either by producing a single repair that is (nearly) optimal with respect to some metric, or by computing consistent answers to selected classes of queries without explicitly generating the repairs. In this paper, we propose a novel data cleaning approach that is not limited to finding a single repair or to a particular class of queries, namely, sampling from the space of possible repairs. We give several motivating scenarios where sampling from the space of FD repairs is desirable, propose a new class of useful repairs, and present an algorithm that randomly samples from this space. We also show how to restrict the space of generated repairs based on user-defi...
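To make the idea of an FD violation and a sampled repair concrete, here is a minimal sketch. It is not the algorithm from the paper, whose details are not given in this abstract. It assumes a single FD of the form lhs -> rhs over rows stored as dictionaries, and it repairs each violating group by picking one of the observed right-hand-side values uniformly at random. The function names `find_violations` and `sample_repair`, the example attributes `zip` and `city`, and the sample rows are all hypothetical.

```python
import random
from collections import defaultdict


def find_violations(rows, lhs, rhs):
    """Return groups of row indices that agree on `lhs` but disagree on `rhs`."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[a] for a in lhs)].append(i)
    return [
        idxs
        for idxs in groups.values()
        if len({rows[i][rhs] for i in idxs}) > 1
    ]


def sample_repair(rows, lhs, rhs, rng):
    """Return one randomly chosen repair; the input rows are left unchanged.

    Assumption: a repair only rewrites `rhs` values, choosing uniformly among
    the distinct values seen in each violating group. Other repair classes
    (for example, ones that modify `lhs` attributes) are not covered here.
    """
    repaired = [dict(row) for row in rows]
    for idxs in find_violations(rows, lhs, rhs):
        candidates = sorted({rows[i][rhs] for i in idxs})
        value = rng.choice(candidates)
        for i in idxs:
            repaired[i][rhs] = value
    return repaired


# Hypothetical data: the FD zip -> city is violated by the first two rows.
rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "NYC"},
    {"zip": "60601", "city": "Chicago"},
]

if __name__ == "__main__":
    rng = random.Random(0)
    # Drawing several samples shows that more than one repair is possible.
    for _ in range(3):
        print(sample_repair(rows, ["zip"], "city", rng))
```

Each call yields one consistent instance. With k violating groups that each hold two candidate values, this naive scheme already has 2^k possible outcomes, which illustrates why the abstract describes the number of repairs as exponential.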
George Beskales, Ihab F. Ilyas, Lukasz Golab
Added 30 Jan 2011
Updated 30 Jan 2011
Type Journal
Year 2010
Where PVLDB
Authors George Beskales, Ihab F. Ilyas, Lukasz Golab