Sciweavers

SIGMOD
2009
ACM

Entity resolution with iterative blocking

Entity Resolution (ER) is the problem of identifying which records in a database refer to the same real-world entity. An exhaustive ER process computes the similarity of every pair of records, which can be very expensive for large datasets. Blocking techniques can improve the performance of ER by dividing the records into blocks in multiple ways and comparing only records within the same block. However, most blocking techniques process blocks separately and do not exploit the results of other blocks. In this paper, we propose an iterative blocking framework in which the ER results of one block are propagated to subsequently processed blocks. Blocks are processed iteratively until no block contains any more matching records. Compared to simple blocking, iterative blocking may achieve higher accuracy because propagating the ER results of one block to other blocks may generate additional record matches. Iterative blocking may also be more efficient because processi...
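The idea in the abstract can be illustrated with a minimal Python sketch. This is not the paper's algorithm, which this listing does not show. It assumes a toy match rule (two records match if they share at least two attribute-value pairs) and a toy merge (union of the pairs). The names `match`, `merge`, `build_blocks`, `resolve_block`, `iterative_blocking`, and `record` are hypothetical, as is the sample data.

```python
from itertools import combinations


def match(a, b):
    # Toy rule (an assumption): match if >= 2 attribute-value pairs are shared.
    return len(a & b) >= 2


def merge(a, b):
    # Toy merge (an assumption): union of the attribute-value pairs.
    return a | b


def build_blocks(records, attrs):
    # One block per (attribute, value) pair of the chosen blocking attributes.
    blocks = {}
    for rec in records:
        for attr, value in rec:
            if attr in attrs:
                blocks.setdefault((attr, value), set()).add(rec)
    return blocks


def resolve_block(block):
    # Merge matching pairs inside one block until none remain.
    # Returns the merges performed as (a, b, merged) triples, in order.
    merges = []
    changed = True
    while changed:
        changed = False
        for a, b in combinations(list(block), 2):
            if match(a, b):
                merged = merge(a, b)
                block.difference_update((a, b))
                block.add(merged)
                merges.append((a, b, merged))
                changed = True
                break  # the block changed, so restart the pairwise scan
    return merges


def iterative_blocking(records, attrs):
    blocks = build_blocks(records, attrs)
    resolved = set(records)
    queue = list(blocks)  # block keys still waiting to be processed
    while queue:
        key = queue.pop(0)
        for a, b, merged in resolve_block(blocks[key]):
            resolved.difference_update((a, b))
            resolved.add(merged)
            # Propagate the merge to every other block holding a or b,
            # and schedule those blocks for another pass.
            for other_key, other in blocks.items():
                if other_key != key and (a in other or b in other):
                    other.difference_update((a, b))
                    other.add(merged)
                    if other_key not in queue:
                        queue.append(other_key)
    return resolved


def record(**fields):
    # Hypothetical helper: a record is a frozenset of (attribute, value) pairs.
    return frozenset(fields.items())


r1 = record(name="J. Doe", zip="94305", phone="555-0100")
r2 = record(name="J. Doe", zip="94305", email="jd@example.com")
r3 = record(name="John Doe", phone="555-0100", email="jd@example.com")
r4 = record(name="Jane Roe", zip="94305", phone="555-0199")

result = iterative_blocking([r1, r2, r3, r4], {"zip", "phone", "email"})
```

In this toy data, `r3` shares only one value with `r1` and only one with `r2`, so processing the phone and email blocks in isolation finds no match for it. Once `r1` and `r2` merge in the zip block and the merged record replaces them in the other blocks, it shares both a phone and an email with `r3`, and the three records collapse into one while `r4` stays separate.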
Steven Euijong Whang, David Menestrina, Georgia Ko
Added 05 Dec 2009
Updated 05 Dec 2009
Type Conference
Year 2009
Where SIGMOD
Authors Steven Euijong Whang, David Menestrina, Georgia Koutrika, Martin Theobald, Hector Garcia-Molina