Sciweavers

EDBT
2016
ACM

Scaling Entity Resolution to Large, Heterogeneous Data with Enhanced Meta-blocking

Entity Resolution is an inherently quadratic task that typically scales to large entity collections through blocking. The resulting blocks can be restructured by Meta-blocking in order to significantly increase precision at a limited cost in recall. Yet Meta-blocking itself can be time-consuming, and its precision remains poor for configurations with high recall. In this work, we propose new Meta-blocking methods that improve precision by up to an order of magnitude at a negligible cost to recall. We also introduce two efficiency techniques that, when combined, reduce the overhead time of Meta-blocking by more than an order of magnitude. We evaluate our approaches through an extensive experimental study over six real-world, heterogeneous datasets. The outcomes indicate that our new algorithms outperform all existing Meta-blocking techniques as well as the state-of-the-art methods for block processing in all respects.
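To make the abstract's terms concrete, the following is a minimal sketch of the general idea of blocking followed by Meta-blocking. It is not the enhanced methods proposed in the paper. It assumes schema-agnostic token blocking, weights each candidate pair by the number of blocks the two entities share, and prunes pairs whose weight falls below the mean; these choices, the function names `token_blocking` and `meta_block`, and the toy records are all illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations


def token_blocking(entities):
    """Place every entity into one block per distinct token of its text."""
    blocks = defaultdict(set)
    for entity_id, text in entities.items():
        for token in set(text.lower().split()):
            blocks[token].add(entity_id)
    # A block with a single entity yields no comparisons, so drop it.
    return {key: ids for key, ids in blocks.items() if len(ids) > 1}


def meta_block(blocks):
    """Weight each candidate pair by its shared-block count, then prune.

    Assumption for illustration: pairs whose weight is below the mean
    weight over all candidate pairs are discarded.
    """
    weights = defaultdict(int)
    for ids in blocks.values():
        for pair in combinations(sorted(ids), 2):
            weights[pair] += 1
    if not weights:
        return set()
    threshold = sum(weights.values()) / len(weights)
    return {pair for pair, weight in weights.items() if weight >= threshold}


# Hypothetical toy records; a1/a2 and b1/b2 are meant to be duplicates.
entities = {
    "a1": "John Smith Boston",
    "a2": "J. Smith Boston MA",
    "b1": "Mary Jones Boston",
    "b2": "Mary Jones NYC",
}
blocks = token_blocking(entities)
candidates = meta_block(blocks)
```

On this toy input the blocks imply six pairwise comparisons (some of them repeated across blocks), while the pruned candidate set retains only the two pairs that share more than one block. This is the precision gain at limited recall cost that the abstract refers to, shown at miniature scale.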
Added 02 Apr 2016
Updated 02 Apr 2016
Type Journal
Year 2016
Where EDBT