Data Position and Profiling in Domain-Independent Warehouse Cleaning

Abstract: A major problem that arises when integrating different databases is the existence of duplicates. Data cleaning is the process of identifying two or more records within a database that represent the same real-world object (duplicates), so that a unique representation of each object can be adopted. Existing data cleaning techniques rely heavily on full or partial domain knowledge. This paper proposes a positional algorithm that achieves domain-independent de-duplication at the attribute level. The paper also proposes a technique for field weighting through data profiling, which, when used with the positional algorithm, achieves domain-independent cleaning at the record level. Experiments show that the positional algorithm achieves more accurate de-duplication than existing algorithms.
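The abstract does not spell out either technique, so the following is only a rough sketch of how attribute-level matching and profiled field weights could combine into a record-level score. Everything in it is an assumption made for illustration: the weighting heuristic (share of distinct values per field), the function names `profile_weights`, `field_similarity`, and `record_similarity`, and the sample records. In particular, `field_similarity` uses Python's standard `difflib.SequenceMatcher` as a stand-in and is not the paper's positional algorithm.

```python
from difflib import SequenceMatcher


def profile_weights(records, fields):
    """Assumed profiling heuristic: weight each field by its share of
    distinct non-empty values, then normalise the weights to sum to 1."""
    raw = {}
    for f in fields:
        values = [r[f].strip().lower() for r in records if r.get(f)]
        raw[f] = len(set(values)) / len(values) if values else 0.0
    total = sum(raw.values())
    if total == 0:
        # No usable values anywhere: fall back to equal weights.
        return {f: 1.0 / len(fields) for f in fields}
    return {f: raw[f] / total for f in fields}


def field_similarity(a, b):
    """Stand-in attribute-level matcher (NOT the paper's positional
    algorithm): character-sequence similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def record_similarity(r1, r2, weights):
    """Record-level score: weighted sum of attribute-level similarities."""
    return sum(
        w * field_similarity(r1.get(f, ""), r2.get(f, ""))
        for f, w in weights.items()
    )


# Hypothetical sample data, for illustration only.
records = [
    {"name": "John Smith", "city": "Windsor"},
    {"name": "Jon Smith", "city": "Windsor"},
    {"name": "Mary Jones", "city": "Windsor"},
]
weights = profile_weights(records, ["name", "city"])
likely_duplicate = record_similarity(records[0], records[1], weights)
likely_distinct = record_similarity(records[0], records[2], weights)
```

In this sketch the `city` field, which holds the same value in every record, receives a smaller weight than `name`, so agreement on `city` alone contributes little to the record-level score. No domain-specific rules are involved; the weights come only from the data itself.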

Christie I. Ezeife, Ajumobi Udechukwu

Data Cleaning | Domain Independent De-duplication | ICEIS 2003 | Information Systems | Positional Algorithm

Added	04 Jul 2010
Updated	04 Jul 2010
Type	Conference
Year	2003
Where	ICEIS
Authors	Christie I. Ezeife, Ajumobi Udechukwu
