
KDD
2000
ACM

IntelliClean: a knowledge-based intelligent data cleaner

Existing data cleaning methods work by computing the degree of similarity between nearby records in a sorted database. High recall is achieved by accepting records with low degrees of similarity as duplicates, at the cost of lower precision. Conversely, high precision is achieved by accepting only records with high degrees of similarity, at the cost of lower recall. This is the recall-precision dilemma. In this paper, we propose a generic knowledge-based framework for effective data cleaning that implements existing cleaning strategies and more. We develop a new method to compute transitive closure under uncertainty, which handles the merging of groups of inexact duplicate records. Experimental results show that this framework can identify duplicates and anomalies with high recall and precision.
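The abstract does not spell out the algorithms, so the following is only a rough illustration of the two ideas it names. It is not IntelliClean's actual rule base. The sketch first compares nearby records in a sorted list, which is the sorted-neighborhood approach. It then applies one possible reading of "transitive closure under uncertainty": a max-product closure, where the certainty that two records are duplicates is the best product of pairwise certainties along any chain of matches. The following are all assumptions made for illustration: the similarity measure, the max-product rule, the toy records, and every name and parameter (`field_similarity`, `sorted_neighborhood`, `max_product_closure`, `duplicate_groups`, `window`, `threshold`, `merge_threshold`).

```python
from typing import Callable, Dict, List, Sequence, Tuple

Record = Dict[str, str]
Pair = Tuple[int, int]


def field_similarity(a: Record, b: Record) -> float:
    """Hypothetical similarity measure: the fraction of shared fields whose
    values match after trimming whitespace and ignoring case."""
    fields = set(a) & set(b)
    if not fields:
        return 0.0
    same = sum(a[f].strip().lower() == b[f].strip().lower() for f in fields)
    return same / len(fields)


def sorted_neighborhood(records: Sequence[Record], key: Callable[[Record], str],
                        window: int, threshold: float) -> Dict[Pair, float]:
    """Sort by key, compare each record only with the next window - 1 records,
    and keep pairs whose similarity reaches threshold as candidate duplicates."""
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    pairs: Dict[Pair, float] = {}
    for pos, i in enumerate(order):
        for j in order[pos + 1:pos + window]:
            score = field_similarity(records[i], records[j])
            if score >= threshold:
                pairs[(min(i, j), max(i, j))] = score
    return pairs


def max_product_closure(n: int, pairs: Dict[Pair, float]) -> List[List[float]]:
    """Assumed closure rule: the certainty that i and k are duplicates is the
    largest product of pairwise certainties along any chain i -> ... -> k.
    Because every certainty lies in [0, 1], Floyd-Warshall finds it exactly."""
    c = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for (i, j), score in pairs.items():
        c[i][j] = c[j][i] = score
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = c[i][k] * c[k][j]
                if via_k > c[i][j]:
                    c[i][j] = via_k
    return c


def duplicate_groups(c: List[List[float]], merge_threshold: float) -> List[List[int]]:
    """Merge records into groups: connected components of c[i][j] >= merge_threshold."""
    n = len(c)
    seen = [False] * n
    groups: List[List[int]] = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, group = [start], []
        while stack:
            i = stack.pop()
            group.append(i)
            for j in range(n):
                if not seen[j] and c[i][j] >= merge_threshold:
                    seen[j] = True
                    stack.append(j)
        groups.append(sorted(group))
    return groups


# Toy data (hypothetical): with window=2, records 0 and 2 are never compared directly.
records = [
    {"name": "Alice Tan", "city": "Singapore", "phone": "6123"},
    {"name": "alice tan", "city": "Singapore", "phone": "6124"},
    {"name": "Alice Tan", "city": "singapore", "phone": "6124"},
    {"name": "Bob Lim", "city": "Singapore", "phone": "6999"},
]
pairs = sorted_neighborhood(records, key=lambda r: r["name"].strip().lower(),
                            window=2, threshold=0.6)
closure = max_product_closure(len(records), pairs)
print(duplicate_groups(closure, merge_threshold=0.5))  # [[0, 1, 2], [3]]
print(duplicate_groups(closure, merge_threshold=0.9))  # [[0], [1, 2], [3]]
```

Records 0 and 2 end up in the same group only because the closure chains 0 to 1 (certainty 2/3) and 1 to 2 (certainty 1.0). Raising `merge_threshold` from 0.5 to 0.9 splits record 0 off again, which shows in miniature the recall-precision trade-off the abstract describes.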
Mong-Li Lee, Tok Wang Ling, Wai Lup Low
Type Conference