CIKM
2011
Springer

Scalable entity matching computation with materialization

Entity matching (EM) is the task of identifying records that refer to the same real-world entity from different data sources. While EM is widely used in data integration and data cleaning applications, the naive method for EM incurs quadratic cost with respect to the size of the datasets. To address this problem, this paper proposes a scalable EM algorithm that employs a pre-materialized structure. Specifically, once the structure is built, our proposed algorithm can identify the EM results with sub-linear cost. In addition, as the rules evolve, our algorithm can efficiently adapt to new rules by selectively accessing records using the materialized structure. Our evaluation results show that our proposed EM algorithm is significantly faster than the state-of-the-art method for extensive real-life datasets.

Categories and Subject Descriptors: H.2.m [Database Management]: Miscellaneous - data cleaning; H.2.8 [Database Management]: Database Applications - data mining

General Terms: Algorith...
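As a rough illustration of the cost gap the abstract describes, the sketch below contrasts naive pairwise matching with lookups against a structure built ahead of time. The hash index here is only a generic stand-in: the abstract does not detail the paper's materialized structure, and probing this index once per record is still linear in the first source rather than sub-linear. The records, the `rule` function, and all other names are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy records: (id, name, zip). Not data from the paper.
SOURCE_A = [(1, "john smith", "10001"), (2, "mary jones", "94105")]
SOURCE_B = [(7, "john smith", "10001"), (8, "m. jones", "94105"), (9, "ann lee", "60601")]


def rule(a, b):
    """Hypothetical matching rule: identical name and identical zip code."""
    return a[1] == b[1] and a[2] == b[2]


def naive_match(src_a, src_b):
    """Evaluate the rule on every pair: |A| * |B| comparisons."""
    return [(a[0], b[0]) for a, b in product(src_a, src_b) if rule(a, b)]


def build_index(src_b):
    """Materialize a hash index keyed on the attributes the rule compares."""
    index = defaultdict(list)
    for b in src_b:
        index[(b[1], b[2])].append(b)
    return index


def indexed_match(src_a, index):
    """Probe the index once per record in A instead of scanning all of B."""
    return [(a[0], b[0]) for a in src_a for b in index.get((a[1], a[2]), [])]


naive_result = naive_match(SOURCE_A, SOURCE_B)
indexed_result = indexed_match(SOURCE_A, build_index(SOURCE_B))
```

Both functions return the same matches for an equality rule like this one; the difference is that the index is built once and can be reused across queries, which is the general idea behind paying a materialization cost up front.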
Sanghoon Lee, Jongwuk Lee, Seung-won Hwang
Added 13 Dec 2011
Updated 13 Dec 2011
Type Conference
Year 2011
Where CIKM