Scaling up the ALIAS Duplicate Elimination System

Duplicate elimination is an important stage in integrating data from multiple sources. It poses two challenges: finding a robust deduplication function that can identify when two records are duplicates, and applying that function efficiently to very large lists of records. In ALIAS, the task of designing a deduplication function is eased by learning the function from examples of duplicates and nonduplicates, and by using active learning to find such examples effectively [1]. Here we investigate the issues involved in efficiently applying the learnt deduplication system to large lists of records. We demonstrate the working of the ALIAS evaluation engine and highlight the optimizations it uses to significantly cut down the number of record pairs that need to be explicitly materialized.
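
To make the cost of materializing record pairs concrete, here is a minimal Python sketch of one generic way to avoid enumerating all n(n-1)/2 pairs: an inverted index on tokens, so that only records sharing at least one token are ever paired. This is not the ALIAS evaluation engine, whose optimizations the abstract does not detail. The function names, the sample records, and the Jaccard threshold that stands in for the learnt deduplication function are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def tokens(record):
    """Return the set of lowercased whitespace-separated tokens of a record."""
    return set(record.lower().split())


def candidate_pairs(records):
    """Yield index pairs (i, j), i < j, whose records share at least one token.

    Pairs with no token in common are never materialized.
    """
    index = defaultdict(list)  # token -> ids of the records containing it
    for rid, rec in enumerate(records):
        for tok in tokens(rec):
            index[tok].append(rid)
    seen = set()
    for ids in index.values():
        # ids were appended in increasing order, so each pair has i < j
        for pair in combinations(ids, 2):
            if pair not in seen:
                seen.add(pair)
                yield pair


def jaccard(a, b):
    """Token-set Jaccard similarity; a hypothetical stand-in for a learnt function."""
    ta, tb = tokens(a), tokens(b)
    # Only called on candidate pairs, which share a token, so the union is non-empty.
    return len(ta & tb) / len(ta | tb)


def deduplicate(records, threshold=0.5):
    """Apply the stand-in similarity test to candidate pairs only."""
    return [(i, j) for i, j in candidate_pairs(records)
            if jaccard(records[i], records[j]) >= threshold]


# Illustrative records (made up for this sketch).
records = [
    "Sunita Sarawagi IIT Bombay",
    "S. Sarawagi IIT Bombay",
    "Alok Kirpal",
    "John Smith Stanford",
]
all_pairs = len(records) * (len(records) - 1) // 2
cands = sorted(candidate_pairs(records))
dups = sorted(deduplicate(records))
print(all_pairs, cands, dups)
```

On these four records the sketch tests a single candidate pair instead of all six possible pairs. A real system would need a pruning rule that is provably safe with respect to the learnt function, which this token-overlap filter only approximates.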

Sunita Sarawagi, Alok Kirpal

ALIAS Evaluation Engine | Database | ICDE 2003 | Learnt Deduplication | Robust Deduplication Function

Post Info

Added	01 Nov 2009
Updated	01 Nov 2009
Type	Conference
Year	2003
Where	ICDE
Authors	Sunita Sarawagi, Alok Kirpal
