Sciweavers

SSDBM
2005
IEEE

An Information Theoretic Model for Database Alignment

As in many large organizations, government data is split in many different ways and is collected at different times by different people. The resulting massive data heterogeneity means that government staff cannot effectively locate, share, or compare data across sources, let alone achieve computational data interoperability. A case in point is the California Air Resources Board, a component of California EPA, which every year has to integrate emissions inventories from the 35 local air quality districts in California and send them to US EPA in North Carolina (which in turn has to integrate the data from all 50 states and from neighboring countries). The premise of our research is that the manual labor required for database wrapping and integration can be significantly reduced by automatically learning mappings in the data. In this research, we applied statistical algorithms to discover correspondences across comparable datasets at all levels. We have seen pa...
Patrick Pantel, Andrew Philpot, Eduard H. Hovy
Added 25 Jun 2010
Updated 25 Jun 2010
Type Conference
Year 2005
Where SSDBM
Authors Patrick Pantel, Andrew Philpot, Eduard H. Hovy