VLDB
2007
ACM

Example-driven design of efficient record matching queries

Record matching is the task of identifying records that match the same real-world entity. This is a problem of great significance for a variety of business intelligence applications. Implementations of record matching rely on exact as well as approximate string matching (e.g., edit distances) and on the use of external reference data sources. Record matching can be viewed as a query composed of a small set of primitive operators. However, formulating such record matching queries is difficult and depends on the specific application scenario. Specifically, the number of options, in terms of both string matching operations and the choice of external sources, can be daunting. In this paper, we exploit the availability of positive and negative examples to search through this space and suggest an initial record matching query. Such queries can be subsequently modified by the programmer as needed. We ensure that the record matching queries our approach produces are (1) efficient: these queries...
Surajit Chaudhuri, Bee-Chung Chen, Venkatesh Ganti, Raghav Kaushik
Added: 05 Dec 2009
Updated: 05 Dec 2009
Type: Conference
Year: 2007
Where: VLDB
Authors: Surajit Chaudhuri, Bee-Chung Chen, Venkatesh Ganti, Raghav Kaushik