Part of the process of data integration is determining which sets of identifiers refer to the same real-world entities. In integrating databases found on the Web or obtained by us...
The similarity join is an important operation for mining high-dimensional feature spaces. Given two data sets, the similarity join computes all tuples (x, y) that are within a dis...
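To make the operation concrete, here is a minimal sketch of a similarity join as a naive nested-loop comparison. It assumes that the truncated phrase above refers to a distance threshold, called `eps` here, and that the distance is Euclidean. The function name `similarity_join` and the choice to return index pairs are illustrative assumptions, not taken from the abstract, and the quadratic loop is only a baseline, not the method the paper proposes.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def similarity_join(r: List[Point], s: List[Point], eps: float) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) where the Euclidean distance between
    r[i] and s[j] is at most eps (hypothetical threshold parameter)."""
    pairs = []
    for i, x in enumerate(r):
        for j, y in enumerate(s):
            # math.dist computes the Euclidean distance (Python 3.8+).
            if math.dist(x, y) <= eps:
                pairs.append((i, j))
    return pairs


r = [(0.0, 0.0), (5.0, 5.0)]
s = [(0.0, 1.0), (5.0, 6.0), (10.0, 10.0)]
result = similarity_join(r, s, eps=1.0)
```

This baseline compares every pair, so its cost grows with the product of the two data set sizes, which is why index-based or partitioning strategies matter in high-dimensional settings.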
In this paper, we propose a novel formulation for distance-based outliers that is based on the distance of a point from its kth nearest neighbor. We rank each point on the basis o...
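The ranking idea can be sketched with a brute-force computation. This assumes Euclidean distance and assumes that points are ranked by their distance to the kth nearest neighbor, with larger distances treated as more outlying. The abstract is truncated before it states the ranking criterion in full, so that reading is an assumption, as are the helper names `kth_nn_distance` and `rank_by_kth_nn`.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def kth_nn_distance(points: List[Point], idx: int, k: int) -> float:
    """Distance from points[idx] to its kth nearest neighbor (k is 1-based)."""
    dists = sorted(
        math.dist(points[idx], q) for j, q in enumerate(points) if j != idx
    )
    return dists[k - 1]


def rank_by_kth_nn(points: List[Point], k: int) -> List[int]:
    """Return point indices ordered from most to least outlying,
    using the kth-nearest-neighbor distance as the score."""
    scores = [(kth_nn_distance(points, i, k), i) for i in range(len(points))]
    scores.sort(key=lambda t: t[0], reverse=True)
    return [i for _, i in scores]


# Four points on a unit square plus one far-away point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
ranking = rank_by_kth_nn(points, k=2)
```

In this toy data the isolated point at index 4 has the largest 2nd-nearest-neighbor distance, so it appears first in `ranking`. The brute-force scan is quadratic in the number of points and serves only to illustrate the scoring.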
AURA (Advanced Uncertain Reasoning Architecture) is a parallel pattern matching technology intended for high-speed approximate search and match operations on large unstructured dat...