Sciweavers

CIKM
2007
Springer

Parallel linkage

We study the parallelization of the (record) linkage problem, i.e., identifying matching records between two collections of records, A and B. One of the main idiosyncrasies of the linkage problem, compared to a database join, is that once two records a in A and b in B are matched and merged into c, c must be compared again to the rest of the records in A and B, since it may produce new matches. This re-feeding stage requires the solution to be iterative and complicates the problem significantly. To address this problem, we first discuss three plausible input scenarios: both collections are clean, only one is clean, or both are dirty. Then, we show that the intricate interplay between match and merge can exploit the characteristics of each scenario to achieve good parallelization. Our parallel algorithms achieve speedups of 6.55–7.49 times over sequential ones with 8 processors,
Hung-sik Kim, Dongwon Lee
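
The re-feeding step described in the abstract can be illustrated with a minimal sequential sketch. This is not the authors' algorithm: the record representation (sets of tokens), the Jaccard match test, the union merge, and the names `jaccard` and `link` are all assumptions chosen for illustration, and the sketch treats both collections as dirty and does no parallelization.

```python
from collections import deque


def jaccard(a: frozenset, b: frozenset) -> float:
    """Hypothetical match score: token overlap between two records."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def link(A, B, threshold=0.5):
    """Sequential match-and-merge with re-feeding (illustrative only).

    Each record is compared against the records resolved so far. On a
    match, the two records are merged (assumed here to be a set union)
    and the merged record is put back in the queue, so it is compared
    against everything again.
    """
    pending = deque(list(A) + list(B))
    resolved = []
    while pending:
        rec = pending.popleft()
        for i, other in enumerate(resolved):
            if jaccard(rec, other) >= threshold:
                del resolved[i]
                pending.append(rec | other)  # re-feed the merged record
                break
        else:
            resolved.append(rec)  # no match found: keep as its own entity
    return resolved


# Made-up records: d matches neither a nor b, but does match their merge.
a = frozenset({"john", "smith", "pa", "usa"})
b = frozenset({"john", "smith", "pa", "1970"})
d = frozenset({"smith", "usa", "1970"})
e = frozenset({"ann", "lee"})

result = link([a], [b, d, e], threshold=0.55)
```

In this example `d` falls below the threshold against both `a` and `b` individually and is only linked after they have been merged, which is why a single pass, as in a database join, would not be enough. Each merge reduces the total number of records by one, so the loop terminates.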
Added: 07 Jun 2010
Updated: 07 Jun 2010
Type: Conference
Year: 2007
Where: CIKM
Authors: Hung-sik Kim, Dongwon Lee