Determining attribute correspondences is a difficult, time-consuming, knowledge-intensive part of database integration. We report on experiences with tools that identified candidate correspondences as a step in a large-scale effort to improve communication among Air Force systems. First, we describe a new method that was both simple and surprisingly successful: data dictionary and catalog information were dumped to unformatted text; off-the-shelf information retrieval software then estimated string similarity, generated candidate matches, and provided the interface. The second method used a different set of clues, such as statistics on database populations, to compute separate similarity metrics using neural network techniques. We report on substantial use of the first tool, followed by limited initial experiments that examine the two techniques’ accuracy, consistency, and complementarity.

Keywords: correspondence investigation, data mining
Chris Clifton, E. Housman, Arnon Rosenthal
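
To make the first method in the abstract concrete, the following is a minimal sketch of text-based candidate matching. It is not the off-the-shelf retrieval software the authors used: it assumes that each attribute's dictionary entry has been flattened to a single string, and it substitutes plain TF-IDF weighting with cosine similarity as the string-similarity estimate. The attribute names, descriptions, and function names are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return a sparse TF-IDF vector (term -> weight) for each document key."""
    counts = {key: Counter(tokenize(text)) for key, text in docs.items()}
    n = len(counts)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for c in counts.values() for term in c)
    return {
        key: {t: tf * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, tf in c.items()}
        for key, c in counts.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def candidate_matches(source, target, top_k=3):
    """Rank target attributes for each source attribute by text similarity."""
    # Weight terms over the combined corpus so both schemas share one vocabulary.
    docs = {("src", name): f"{name} {desc}" for name, desc in source.items()}
    docs.update({("tgt", name): f"{name} {desc}" for name, desc in target.items()})
    vecs = tfidf_vectors(docs)
    result = {}
    for s in source:
        scored = [(t, cosine(vecs[("src", s)], vecs[("tgt", t)])) for t in target]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        result[s] = scored[:top_k]
    return result


# Hypothetical dictionary entries from two systems (illustrative only).
source = {
    "ACFT_TYPE": "aircraft type code",
    "INSTALLATION": "name of the base or installation",
}
target = {
    "AIRCRAFT_KIND": "code identifying the type of aircraft",
    "BASE_NAME": "name of the air base",
}

for attribute, candidates in candidate_matches(source, target).items():
    print(attribute, candidates)
```

The ranked lists are only candidates: as the abstract notes, such a tool proposes correspondences for a person to investigate rather than deciding them.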