Beyond 100 million entities: large-scale blocking-based resolution for heterogeneous data

A prerequisite for leveraging the vast amount of data available on the Web is Entity Resolution, i.e., the process of identifying and linking data that describe the same real-world objects. To make this inherently quadratic process applicable to large data sets, blocking is typically employed: entities (records) are grouped into clusters - the blocks - of matching candidates and only entities of the same block are compared. However, novel blocking techniques are required for dealing with the noisy, heterogeneous, semi-structured, user-generated data in the Web, as traditional blocking techniques are inapplicable due to their reliance on schema information. The
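The blocking idea described in the abstract can be illustrated with a small sketch. It uses token blocking, a generic schema-agnostic scheme chosen here only for illustration; it is not necessarily the method the paper proposes. The function names `token_blocks` and `candidate_pairs` and the sample records are hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations


def token_blocks(entities):
    """Build one block per token found in any attribute value.

    Attribute names are ignored, so no schema knowledge is needed
    (an assumption of this sketch, not a claim about the paper).
    """
    blocks = defaultdict(set)
    for entity_id, attributes in entities.items():
        for value in attributes.values():
            for token in re.findall(r"\w+", str(value).lower()):
                blocks[token].add(entity_id)
    # A block holding a single entity produces no comparisons, so drop it.
    return {token: ids for token, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks):
    """Collect the distinct pairs of entities that share at least one block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical records with differing attribute names (heterogeneous schemas).
entities = {
    "e1": {"name": "Jane Smith", "city": "Trento"},
    "e2": {"fullName": "J. Smith", "location": "Trento, Italy"},
    "e3": {"label": "Acme Corp", "hq": "Hannover"},
}

blocks = token_blocks(entities)
print(candidate_pairs(blocks))
```

With these three records, only the pair sharing the tokens "smith" and "trento" is compared, instead of all three possible pairs. On real data a single very common token can create a huge block, which is one reason practical systems add further block processing on top of a scheme like this.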

George Papadakis, Ekaterini Ioannou, Claudia Niederée, Themis Palpanas, Wolfgang Nejdl

Clusters | Data Mining | Linking Data | Prerequisite | WSDM 2012 |

Post Info

Added	25 Apr 2012
Updated	25 Apr 2012
Type	Conference
Year	2012
Where	WSDM
Authors	George Papadakis, Ekaterini Ioannou, Claudia Niederée, Themis Palpanas, Wolfgang Nejdl
