Industry-scale duplicate detection

Download www.hpi.uni-potsdam.de

Duplicate detection is the process of identifying multiple representations of the same real-world object in a data source. It is a problem of critical importance in many applications, including customer relationship management, personal information management, and data mining. In this paper, we present how a research prototype, namely DogmatiX, which was designed to detect duplicates in hierarchical XML data, was successfully extended and applied to a large-scale industrial relational database in cooperation with Schufa Holding AG. Schufa's main business line is to store and retrieve credit histories of over 60 million individuals. Here, correctly identifying duplicates is critical both for individuals and companies: on the one hand, an incorrectly identified duplicate potentially results in a falsely negative credit history for an individual, who will then no longer be granted credit. On the other hand, it is essential for companies that Schufa detects duplicates o...

Melanie Weis, Felix Naumann, Ulrich Jehle, Jens Lufter, Holger Schuster

Credit History | Hierarchical XML Data | Industrial Relational Database | PVLDB 2008

» Partial duplicate detection for large book collections

» Evaluating detection of near duplicate video segments

» Detection of Duplication in Documents and WebPages Based Documents Syntactical Structures ...

» Robust Duplicate Detection of 2D and 3D Objects

» Archeology of Code Duplication Recovering Duplication Chains from Small Duplication Fragme...

» Visual Detection of Duplicated Code

» DogmatiX Tracks down Duplicates in XML

» Matching Algorithms within a Duplicate Detection System

» Program analysis for code duplication in logic programs

Post Info

Added	28 Dec 2010
Updated	28 Dec 2010
Type	Journal
Year	2008
Where	PVLDB
Authors	Melanie Weis, Felix Naumann, Ulrich Jehle, Jens Lufter, Holger Schuster
