Sciweavers

DGO
2006

Matching and integration across heterogeneous data sources

A sea of undifferentiated information is forming from the body of data that is collected by people and organizations, across government, for different purposes, at different times, and using different methodologies. The resulting massive data heterogeneity requires automatic methods for data alignment, matching and/or merging. In this poster, we describe two systems, Guspin™ and Sift™, for automatically identifying equivalence classes and for aligning data across databases. Our technology, based on principles of information theory, measures the relative importance of data, leveraging them to quantify the similarity between entities. These systems have been applied to solve real problems faced by the Environmental Protection Agency and its counterparts at the state and local government level.
Categories and Subject Descriptors H.2.5 [Database Management]: Heterogeneous Databases.
General Terms Algorithms, Experimentation.
Keywords Information theory, mutual information, database alig...
Patrick Pantel, Andrew Philpot, Eduard H. Hovy
Added 30 Oct 2010
Updated 30 Oct 2010
Type Conference
Year 2006
Where DGO
Authors Patrick Pantel, Andrew Philpot, Eduard H. Hovy