An Information Theoretic Model for Database Alignment

As with many large organizations, the Government's data is split in many different ways and is collected at different times by different people. The resulting massive data heterogeneity means government staff cannot effectively locate, share, or compare data across sources, let alone achieve computational data interoperability. A case in point is the California Air Resources Board, a component of California EPA, which every year has to integrate emissions inventories from the 35 local air quality districts in California and send them to US EPA in North Carolina (which in turn has to integrate the data from all 50 states and from neighboring countries). The premise of our research is that it is possible to significantly reduce the amount of manual labor required in database wrapping and integration by automatically learning mappings in the data. In this research, we applied statistical algorithms to discover correspondences across comparable datasets at all levels. We have seen pa...

Patrick Pantel, Andrew Philpot, Eduard H. Hovy

California Air Resources Board | Database | Information Theoretic Model | Massive Data Heterogeneity | SSDBM 2005

Post Info

Added	25 Jun 2010
Updated	25 Jun 2010
Type	Conference
Year	2005
Where	SSDBM
Authors	Patrick Pantel, Andrew Philpot, Eduard H. Hovy
