The concepts of similarity and distance are crucial in data mining. We consider the problem of defining the distance between two data sets by comparing summary statistics compute...
The skyline operator is a well-established database primitive that is traditionally applied to compute only a single skyline. In this paper we use multiple skylines...
Experimental methodology for evaluating classification algorithms in relational (i.e., networked) data is complicated by dependencies between related data instances. We survey the...
The duplicate elimination problem of detecting multiple tuples that describe the same real-world entity is an important data cleaning problem. Previous domain-independent solut...
We examine the analysis of hyperspectral data produced by the Hyperspectral Core Imager of AngloGold Ashanti. The dimension of the data is reduced using diffusion maps and the dat...