
KDD 2002, ACM


Interactive deduplication using active learning

Download www.it.iitb.ac.in

Deduplication is a key operation in integrating data from multiple sources. The main challenge in this task is designing a function that can resolve when a pair of records refer to the same entity in spite of various data inconsistencies. Most existing systems use hand-coded functions. One way to overcome the tedium of hand-coding is to train a classifier to distinguish between duplicates and non-duplicates. The success of this method critically hinges on being able to provide a covering and challenging set of training pairs that bring out the subtlety of the deduplication function. This is non-trivial because it requires manually searching for various data inconsistencies between any two records spread apart in large lists. We present our design of a learning-based deduplication system that uses a novel method of interactively discovering challenging training pairs using active learning. Our experiments on real-life datasets show that active learning significantly reduces the number ...
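The idea in the abstract of letting the learner pick its own challenging training pairs can be illustrated with a minimal sketch. This is not the authors' system: it assumes plain uncertainty sampling over a single similarity score, and the names `similarity`, `fit_threshold`, and `active_learn` are hypothetical helpers introduced only for this example. The learner repeatedly asks a human (the `oracle` callback) to label the record pair whose score lies closest to the current decision threshold, then refits the threshold.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Average of token Jaccard overlap and character-level similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    jaccard = len(ta & tb) / len(ta | tb) if ta | tb else 1.0
    ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return (jaccard + ratio) / 2


def fit_threshold(labeled):
    """Midpoint between the highest non-duplicate score and the lowest duplicate score."""
    dup = [s for s, is_dup in labeled if is_dup]
    non = [s for s, is_dup in labeled if not is_dup]
    lo = max(non) if non else 0.0
    hi = min(dup) if dup else 1.0
    return (lo + hi) / 2


def active_learn(records, oracle, budget):
    """Label at most `budget` pairs, always querying the pair nearest the threshold."""
    # Score every candidate pair once (assumption: the list is small enough for this).
    pool = {(i, j): similarity(records[i], records[j])
            for i, j in combinations(range(len(records)), 2)}
    labeled = []
    threshold = 0.5  # arbitrary starting guess before any labels exist
    for _ in range(min(budget, len(pool))):
        # The most uncertain pair is the one whose score is closest to the threshold.
        pair = min(pool, key=lambda p: abs(pool[p] - threshold))
        score = pool.pop(pair)
        labeled.append((score, oracle(*pair)))
        threshold = fit_threshold(labeled)
    return threshold, len(labeled)
```

In use, `oracle(i, j)` would prompt a person to say whether records `i` and `j` refer to the same entity. Pairs far from the threshold are never shown to the user, which is where the saving in labeling effort would come from under these assumptions.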

Sunita Sarawagi, Anuradha Bhamidipaty

Challenging Training Pairs | Data Mining | Deduplication Function | KDD 2002 | Various Data Inconsistencies |

Related Content

» ALIAS An Active Learning led Interactive Deduplication System

» Scaling up the ALIAS Duplicate Elimination System

» Active Learning Genetic programming for record deduplication

» From active towards InterActive learning using consideration information to improve labeli...

» Two-person interaction detection using body-pose features and multiple instance learning

» Beyond Active Noun Tagging: Modeling Contextual Interactions for Multi-Class Active Learning

» Active learning for human protein-protein interaction prediction

» Leveraging Active Learning for Relevance Feedback Using an Information Theoretic Diversity...

» Mystery in the museum collaborative learning activities using handheld devices

Post Info

Added	30 Nov 2009
Updated	30 Nov 2009
Type	Conference
Year	2002
Where	KDD
Authors	Sunita Sarawagi, Anuradha Bhamidipaty