We present a practical approach to nonparametric cluster analysis of large data sets. The number of clusters and the cluster centres are automatically derived by mode seeking wit...
Star schema has been a typical model for both online transaction processing in traditional databases and online analytical processing in large data warehouses. In the star schema,...
We propose to use AdaBoost to efficiently learn classifiers over very large and possibly distributed data sets that cannot fit into main memory, as well as on-line learning wher...
Data-warehousing applications cope with enormous data sets in the range of gigabytes and terabytes. Queries usually either select a very small set of this data or perform aggregat...
We present a new algorithm for material boundary interface reconstruction from data sets containing volume fractions. We transform the reconstruction problem to a problem that ana...
Kathleen S. Bonnell, Kenneth I. Joy, Bernd Hamann,...
In this paper we introduce a novel way of modeling distributions with a low latent dimensionality. Our method allows for a strict control of the properties of the mapping between ...
Many data mining approaches focus on the discovery of similar (and frequent) data values in large data sets. We present an alternative, but complementary approach in whic...
Jeff Edmonds, Jarek Gryz, Dongming Liang, René...
The goal of this paper is to further investigate the extreme behaviour of the proportional membership model (FCPM) in contrast to the central tendency of fuzzy c-means (FCM). A dat...
Susana Nascimento, Boris Mirkin, Fernando Moura-Pi...
Labeled data for classification could often be obtained by sampling that restricts or favors choice of certain classes. A classifier trained using such data will be biased, resulti...
We consider geometric conditions on a labeled data set which guarantee that boosting algorithms work well when linear classifiers are used as weak learners. We start by providing ...