A Two-Step Classification Approach to Unsupervised Record Linkage


Linking or matching databases is becoming increasingly important in many data mining projects, as linked data can contain information that is not available otherwise, or that would be too expensive to collect manually. A main challenge when linking large databases is the classification of the compared record pairs into matches and non-matches. In traditional record linkage, classification thresholds have to be set either manually or using an EM-based approach. More recently developed classification methods are mainly based on supervised machine learning techniques and thus require training data, which is often not available in real-world situations or has to be prepared manually. In this paper, a novel two-step approach to record pair classification is presented. In the first step, high-quality example training data is generated automatically; in the second step, this data is used to train a supervised classifier. Initial experimental results on both real and synthetic data show that this a...
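The two-step idea in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: record pairs are represented as vectors of similarity scores in [0, 1], seed examples are chosen with hypothetical thresholds `high` and `low`, and a simple nearest-centroid rule stands in for the supervised classifier. The function names (`select_seeds`, `two_step_classify`) are likewise hypothetical.

```python
from statistics import mean


def select_seeds(vectors, high=0.9, low=0.1):
    """Step 1 (assumed heuristic): keep only pairs that are very likely
    matches (all similarities >= high) or non-matches (all <= low)."""
    matches = [v for v in vectors if min(v) >= high]
    non_matches = [v for v in vectors if max(v) <= low]
    return matches, non_matches


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [mean(column) for column in zip(*vectors)]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_step_classify(vectors, high=0.9, low=0.1):
    """Step 2 (stand-in classifier): label every pair by its nearest
    seed centroid. Returns True for 'match', False for 'non-match'."""
    matches, non_matches = select_seeds(vectors, high, low)
    if not matches or not non_matches:
        raise ValueError("no seed examples found; relax the thresholds")
    match_centre = centroid(matches)
    non_match_centre = centroid(non_matches)
    return [
        squared_distance(v, match_centre) < squared_distance(v, non_match_centre)
        for v in vectors
    ]


# Illustrative comparison vectors (made-up similarity scores per field).
comparison_vectors = [
    [1.0, 1.0, 0.95],  # agrees on every field: seed match
    [0.0, 0.05, 0.0],  # disagrees on every field: seed non-match
    [0.9, 0.6, 0.8],   # ambiguous pair, decided in step 2
    [0.2, 0.1, 0.4],   # ambiguous pair, decided in step 2
]
labels = two_step_classify(comparison_vectors)
```

In this sketch only the first two pairs qualify as seeds; the two ambiguous pairs are then labelled by whichever seed centroid they lie closer to. Any supervised learner could replace the nearest-centroid rule in the second step.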

Peter Christen


AUSDM 2007 | Data Mining | Record Pair | Traditional Record Linkage | Training Data


» UREST an unsupervised record extraction system

» Multivariate Stream Data Classification Using Simple Text Classifiers

» Complex Human Activity Recognition for Monitoring Wide Outdoor Environments

» 3D Object Detection Using a Fast VoxelWise Local Spherical Fourier Tensor Transformation

Post Info

Added	12 Aug 2010
Updated	12 Aug 2010
Type	Conference
Year	2007
Where	AUSDM
Authors	Peter Christen
