Applications demanding multidimensional index structures for performing efficient similarity queries often involve a large amount of data. The conventional tuple-loading approach t...
Hyun-Jeong Seok, Gang Qian, Qiang Zhu, Alexander R...
Spreadsheet tools are often used in business and private scenarios in order to collect and store data, and to explore and analyze these data by executing functions and aggregation...
Many applications today need to manage large data sets with uncertainties. In this paper we describe the foundations of managing data where the uncertainties are quantified as pro...
The similarity join is an important database primitive which has been successfully applied to speed up applications such as similarity search, data analysis and data mining. The s...
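For reference, here is a minimal sketch of the simplest form of similarity join. It assumes the common distance-range (ε-join) definition: report every pair (r, s), with r from R and s from S, whose Euclidean distance is at most ε. The function names `nested_loop_join` and `grid_join` are hypothetical, and the grid variant is a generic textbook speed-up, not the method of the paper, whose abstract is truncated here.

```python
import math
from collections import defaultdict
from itertools import product


def nested_loop_join(R, S, eps):
    """Return all index pairs (i, j) with dist(R[i], S[j]) <= eps; O(|R|*|S|)."""
    return sorted(
        (i, j)
        for i, r in enumerate(R)
        for j, s in enumerate(S)
        if math.dist(r, s) <= eps
    )


def grid_join(R, S, eps):
    """Same result, but only compares points that fall in neighbouring grid cells.

    Cells have side eps, so a partner within eps differs by at most one cell
    per axis. Assumes eps > 0.
    """
    def cell(p):
        return tuple(math.floor(c / eps) for c in p)

    grid = defaultdict(list)
    for j, s in enumerate(S):
        grid[cell(s)].append(j)

    result = []
    for i, r in enumerate(R):
        base = cell(r)
        # Visit the 3^d cells surrounding r's own cell.
        for offset in product((-1, 0, 1), repeat=len(base)):
            key = tuple(b + o for b, o in zip(base, offset))
            for j in grid.get(key, ()):
                if math.dist(r, S[j]) <= eps:
                    result.append((i, j))
    return sorted(result)
```

Both functions return the same pairs; the grid version avoids comparing points that are obviously too far apart, which is the kind of saving that makes the join usable as a primitive on large data.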
Phylogenetic hidden Markov models (phylo-HMMs) have recently been proposed as a means for addressing a multispecies version of the ab initio gene prediction problem. These models ...
We present a new technique for efficiently computing Degree-of-Interest distributions to inform the visualization of graph-structured data. The technique is independent of the int...
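As background only, here is a sketch of the classic fisheye formulation of Degree-of-Interest (Furnas): DOI(x | focus) = API(x) − D(x, focus), where API is an a-priori interest score per node and D is the distance to the focus node. Here D is assumed to be the shortest-path hop count, computed by breadth-first search. This is a swapped-in baseline, not the new technique the truncated abstract describes, and `degree_of_interest` is a hypothetical name.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from source to every reachable node of an adjacency dict."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def degree_of_interest(adj, api, focus):
    """DOI(x) = API(x) - D(x, focus) for nodes reachable from focus."""
    dist = bfs_distances(adj, focus)
    return {x: api[x] - dist[x] for x in dist}
```

For a path graph a–b–c with API scores a=1, b=0, c=2 and focus a, the result is {a: 1, b: -1, c: 0}. A visualization would then show or enlarge only the nodes whose DOI exceeds some threshold.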
Support vector machines (SVMs) have been promising methods for classification and regression analysis because of their solid mathematical foundations which convey several salient ...
An important problem in data mining is detecting changes in large data sets. Although there are a variety of change detection algorithms that have been developed, in practice it c...
Chris Curry, Robert L. Grossman, David Locke, Stev...
We present a new method to estimate the intrinsic dimensionality of a submanifold M in R^d from random samples. The method is based on the convergence rates of a certain U-statisti...
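To illustrate how a U-statistic can reveal intrinsic dimension, here is a sketch of the related, well-known correlation-dimension estimate (Grassberger–Procaccia). It is not the estimator of the paper, whose details are truncated above. The fraction C(r) of sample pairs closer than r is a U-statistic, and for points on an m-dimensional manifold C(r) grows roughly like r^m at small r. The slope of log C(r) between two scales therefore approximates m. The scales r1, r2 and the helper samplers are illustrative assumptions.

```python
import math
import random


def correlation_integral(X, r):
    """U-statistic: fraction of unordered pairs (i < j) with distance < r."""
    n = len(X)
    close = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if math.dist(X[i], X[j]) < r
    )
    return close / (n * (n - 1) / 2)


def correlation_dimension(X, r1, r2):
    """Slope of log C(r) between scales r1 < r2 (needs C(r1) > 0)."""
    c1 = correlation_integral(X, r1)
    c2 = correlation_integral(X, r2)
    return math.log(c2 / c1) / math.log(r2 / r1)


def sample_plane(n, seed):
    """Uniform samples on a 2-dimensional square embedded in R^3."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random(), 0.0) for _ in range(n)]


def sample_line(n, seed):
    """Uniform samples on a 1-dimensional segment embedded in R^3."""
    rng = random.Random(seed)
    return [(t, t, t) for t in (rng.random() for _ in range(n))]
```

With a few hundred samples and scales 0.05 and 0.1, the plane gives an estimate close to 2 and the segment one close to 1. Boundary effects bias the values slightly downward, and the estimate depends on the chosen scales.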
In EM and related algorithms, E-step computations distribute easily, because data items are independent given parameters. For very large data sets, however, even storing all of th...
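The point that E-step computations distribute easily can be made concrete with a sketch. It assumes a one-dimensional Gaussian mixture, which is our illustrative choice. Because data items are independent given the parameters, each chunk can compute its own sums of responsibilities independently; the chunk results merge by plain addition, and the M-step needs only the merged sums. The function names are hypothetical, and the sketch does not address how the paper handles data sets too large to store.

```python
import math


def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def e_step_stats(chunk, weights, means, variances):
    """Per-chunk sufficient statistics for each component k:
    sum of r_k, sum of r_k * x, sum of r_k * x^2 (r_k = responsibility)."""
    K = len(weights)
    s0, s1, s2 = [0.0] * K, [0.0] * K, [0.0] * K
    for x in chunk:
        p = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(p)
        for k in range(K):
            r = p[k] / total  # depends only on x and the current parameters
            s0[k] += r
            s1[k] += r * x
            s2[k] += r * x * x
    return s0, s1, s2


def merge(stats_list):
    """Chunk statistics are sums, so they combine by element-wise addition."""
    K = len(stats_list[0][0])
    return tuple([sum(st[a][k] for st in stats_list) for k in range(K)] for a in range(3))


def m_step(s0, s1, s2):
    n = sum(s0)
    weights = [a / n for a in s0]
    means = [b / a for a, b in zip(s0, s1)]
    variances = [c / a - m * m for a, c, m in zip(s0, s2, means)]
    return weights, means, variances


def em_iteration(data, params, chunk_size):
    """One EM step with the E-step split into independent chunks."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    stats = merge([e_step_stats(c, *params) for c in chunks])
    return m_step(*stats)
```

Any chunk size yields the same parameters up to floating-point rounding. In a distributed setting each chunk's `e_step_stats` call could run on a different machine, with only the three K-length lists sent back.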