CORR
2010
Springer

Parallel Sorted Neighborhood Blocking with MapReduce

Cloud infrastructures enable the efficient parallel execution of data-intensive tasks such as entity resolution on large datasets. We investigate challenges of, and possible solutions for, using the MapReduce programming model for parallel entity resolution. In particular, we propose and evaluate two MapReduce-based implementations of Sorted Neighborhood blocking that either use multiple MapReduce jobs or apply tailored data replication.
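The replication-based variant can be illustrated with a small single-process simulation. This is a sketch under stated assumptions, not the authors' implementation and not Hadoop code. Records are hypothetical (blocking key, id) tuples. The key space is range-partitioned by hypothetical `boundaries`, so partition order matches the global sort order. Each map task forwards its last w-1 entities of every partition to the next partition. Every partition between the first and the last is assumed to hold at least w-1 entities, since otherwise a window could span three partitions. All function names are illustrative.

```python
import bisect
from collections import defaultdict


def sn_pairs(ordered, w, own=None):
    """Slide a window of size w over a sorted list of (key, id) tuples.
    If `own` is given, keep only pairs with at least one id in `own`."""
    pairs = set()
    for i in range(len(ordered)):
        for j in range(i + 1, min(i + w, len(ordered))):
            a, b = ordered[i][1], ordered[j][1]
            if own is None or a in own or b in own:
                pairs.add(tuple(sorted((a, b))))
    return pairs


def sorted_neighborhood(records, w):
    """Sequential Sorted Neighborhood: sort by (key, id), then compare within the window."""
    return sn_pairs(sorted(records), w)


def map_phase(split, boundaries, w):
    """Simulated map task: route each entity to its key-range partition and
    replicate this task's last w-1 entities of each partition to the next one."""
    out, local = [], defaultdict(list)
    for key, eid in split:
        p = bisect.bisect_right(boundaries, key)  # range partitioning on the key
        out.append((p, "own", (key, eid)))
        local[p].append((key, eid))
    for p, ents in local.items():
        if p < len(boundaries):  # a following partition exists
            for e in sorted(ents)[-(w - 1):]:
                out.append((p + 1, "replica", e))
    return out


def reduce_phase(values, w):
    """Simulated reduce task for one partition (assumes the previous
    partition holds at least w-1 entities, as stated above)."""
    own = sorted(e for tag, e in values if tag == "own")
    # The global last w-1 entities of the previous partition are among the
    # replicas sent by all map tasks, so keep only the w-1 largest of them.
    replicas = sorted(e for tag, e in values if tag == "replica")[-(w - 1):]
    own_ids = {eid for _, eid in own}
    # Replica-replica pairs were already produced by the previous partition.
    return sn_pairs(replicas + own, w, own_ids)


def parallel_sn(splits, boundaries, w):
    assert w >= 2, "window must cover at least two entities"
    groups = defaultdict(list)  # simulated shuffle: group map output by partition
    for split in splits:
        for p, tag, e in map_phase(split, boundaries, w):
            groups[p].append((tag, e))
    pairs = set()
    for values in groups.values():
        pairs |= reduce_phase(values, w)
    return pairs


if __name__ == "__main__":
    # Hypothetical (blocking key, entity id) records.
    data = [("smith", 1), ("smyth", 2), ("jones", 3), ("johns", 4), ("brown", 5),
            ("braun", 6), ("miller", 7), ("muller", 8), ("taylor", 9), ("tailor", 10)]
    splits = [data[:5], data[5:]]  # two simulated map tasks
    boundaries = ["k", "s"]        # three key ranges -> three reduce tasks
    seq = sorted_neighborhood(data, 3)
    par = parallel_sn(splits, boundaries, 3)
    print(len(seq), par == seq)  # prints: 17 True
```

Range partitioning is what keeps the replication small in this sketch: partition p+1 only needs the w-1 entities immediately preceding it in the global order, so each reduce task receives at most w-1 replicas per map task rather than a copy of the whole previous partition.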
Lars Kolb, Andreas Thor, Erhard Rahm
Added 09 Dec 2010
Updated 09 Dec 2010
Type Journal