Matching records that refer to the same entity across databases is becoming an increasingly important part of many data mining projects, as often data from multiple sources needs ...
Online email archives are an under-protected yet extremely sensitive information resource. Email archives can store years worth of personal and business email in an easy-to-access...
The Fellegi-Holt method automatically “corrects” data that fail some predefined requirements. Computer implementations of the method were used in many national statistics bure...
Real-world databases often contain syntactic and semantic errors, in spite of integrity constraints and other safety measures incorporated into modern DBMSs. We present ERACER, an...