WSDM
2012
ACM

Beyond 100 million entities: large-scale blocking-based resolution for heterogeneous data

A prerequisite for leveraging the vast amount of data available on the Web is Entity Resolution, i.e., the process of identifying and linking data that describe the same real-world objects. To make this inherently quadratic process applicable to large data sets, blocking is typically employed: entities (records) are grouped into clusters (the blocks) of matching candidates, and only entities in the same block are compared. However, novel blocking techniques are required for dealing with the noisy, heterogeneous, semi-structured, user-generated data on the Web, as traditional blocking techniques are inapplicable due to their reliance on schema information.
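To illustrate the idea of blocking without schema information, here is a minimal sketch of one simple schema-agnostic scheme, often called token blocking: every token that appears in any attribute value defines a block, and only entities sharing a block become comparison candidates. This is an illustration under stated assumptions, not necessarily the method of the paper, whose abstract is truncated in this listing. The function names `token_blocks` and `candidate_pairs` and the sample records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
import re


def token_blocks(entities):
    """Map each token found in any attribute value to the ids of entities containing it."""
    blocks = defaultdict(set)
    for entity_id, attributes in entities.items():
        # Attribute names are ignored on purpose: no schema knowledge is used.
        for value in attributes.values():
            for token in re.findall(r"\w+", str(value).lower()):
                blocks[token].add(entity_id)
    # A block with a single entity produces no comparisons, so drop it.
    return {token: ids for token, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks):
    """Collect the distinct pairs of entities that share at least one block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical records whose attribute names differ (no shared schema).
entities = {
    "e1": {"name": "Jane Smith", "city": "Berlin"},
    "e2": {"fullName": "Smith, Jane", "employer": "Acme"},
    "e3": {"label": "Robert Brown", "location": "Athens"},
    "e4": {"title": "Bob Brown", "org": "Initech"},
}

blocks = token_blocks(entities)
pairs = candidate_pairs(blocks)
print(sorted(pairs))
```

With these four sample records, only two candidate pairs remain (e1 with e2, and e3 with e4) instead of the six comparisons a brute-force pass over all pairs would need. Deciding whether each candidate pair is a true match is a separate step that this sketch does not cover.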
Added 25 Apr 2012
Updated 25 Apr 2012
Type Conference
Year 2012
Where WSDM
Authors George Papadakis, Ekaterini Ioannou, Claudia Niederée, Themis Palpanas, Wolfgang Nejdl