Many real-world KDD expeditions involve investigation of relationships between variables in different, heterogeneous databases. We present a dynamic programming technique for linking records in multiple heterogeneous databases using loosely defined fields that allow free-style verbatim entries. We develop an interestingness measure based on non-parametric randomization tests, which can be used for mining potentially useful relationships among variables. This measure uses distributional characteristics of historical events, hence accommodating variable-length records in a natural way. As an illustration, we include a successful application of the proposed methodology to a real-world data mining problem at Lucent Technologies.
José C. Pinheiro, Don X. Sun