Sciweavers

WISE
2005
Springer

Identifying Value Mappings for Data Integration: An Unsupervised Approach

The Web is a distributed network of information sources where the individual sources are autonomously created and maintained. Consequently, syntactic and semantic heterogeneity of data among sources abounds. Most current data cleaning solutions assume that data values referencing the same object bear some textual similarity. However, this assumption is often violated in practice. “Two-door front wheel drive” can be represented as “2DR-FWD” or “R2FD”, or even as “CAR TYPE 3” in different data sources. To address this problem, we propose a novel two-step automated technique that exploits statistical dependency structures among objects, which are invariant to the tokens representing the objects. The algorithm achieved high accuracy in our empirical study, suggesting that it can be a useful addition to existing information integration techniques.
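The abstract does not spell out the two steps, so the following is only a toy illustration of the underlying idea, not the authors' algorithm. It assumes two sources that describe the same kind of records with unrelated tokens, models each source by how often its values occur alone and together, and then searches by brute force for the value mapping whose statistics agree best. All function names, token names, and data here are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement, permutations


def cooccurrence(records):
    """Fraction of records containing each value (a, a) or value pair (a, b)."""
    counts = Counter()
    for rec in records:
        for a, b in combinations_with_replacement(sorted(set(rec)), 2):
            counts[(a, b)] += 1
    return {pair: c / len(records) for pair, c in counts.items()}


def pair_weight(model, a, b):
    """Look up a pair regardless of argument order; unseen pairs weigh 0."""
    return model.get((a, b) if a < b else (b, a), 0.0)


def best_mapping(records_a, records_b):
    """Exhaustively find the mapping A -> B with the closest statistics.

    Assumes both sources use the same number of distinct values and that
    this number is small, since the search is factorial in it.
    """
    values_a = sorted({v for rec in records_a for v in rec})
    values_b = sorted({v for rec in records_b for v in rec})
    if len(values_a) != len(values_b):
        raise ValueError("toy sketch needs equally many values per source")
    model_a, model_b = cooccurrence(records_a), cooccurrence(records_b)
    best, best_cost = None, float("inf")
    for perm in permutations(values_b):
        mapping = dict(zip(values_a, perm))
        # Squared difference between corresponding frequencies.
        cost = sum(
            (pair_weight(model_a, x, y)
             - pair_weight(model_b, mapping[x], mapping[y])) ** 2
            for x, y in combinations_with_replacement(values_a, 2)
        )
        if cost < best_cost:
            best, best_cost = mapping, cost
    return best


# Hypothetical data: same underlying distribution, textually unrelated tokens.
records_a = ([("2DR", "FWD")] * 4 + [("2DR", "RWD")] * 1
             + [("4DR", "FWD")] * 2 + [("4DR", "RWD")] * 3)
records_b = ([("BODY_B", "DRV_2")] * 8 + [("BODY_B", "DRV_1")] * 2
             + [("BODY_A", "DRV_2")] * 4 + [("BODY_A", "DRV_1")] * 6)

mapping = best_mapping(records_a, records_b)
```

In this toy data the tokens share no characters that a string-similarity measure could use, yet the frequencies alone single out one mapping. Exhaustive search is only feasible for a handful of values; a realistic setting would need a scalable matching procedure.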
Jaewoo Kang, Dongwon Lee, Prasenjit Mitra
Added: 25 Jun 2010
Updated: 25 Jun 2010
Type: Conference
Year: 2005
Where: WISE
Authors: Jaewoo Kang, Dongwon Lee, Prasenjit Mitra