This paper is about non-approximate acceleration of high-dimensional nonparametric operations such as k nearest neighbor classifiers. We attempt to exploit the fact that even if w...
This paper addresses the sparse data problem in the linear regression model, namely that the number of variables is significantly larger than the number of data points for regress...
Motivated by the poor performance (linear complexity) of the EM algorithm in clustering large data sets, and inspired by the successful accelerated versions of related algorithms l...
We present a new iterative method for probabilistic clustering of data. Given clusters, their centers and the distances of data points from these centers, the probability of clus...
A classic problem in geometric modelling is curve interpolation to data points. Some of the existing interpolation schemes only require point data, whereas others require higher ...
In this paper we propose a novel clustering algorithm based on maximizing the mutual information between data points and clusters. Unlike previous methods, we neither assume the d...
Large-scale text datasets have long eluded a family of particularly elegant and effective clustering methods that exploits the power of pair-wise similarities between data points ...
We present a novel method for clustering using the support vector machine approach. Data points are mapped to a high dimensional feature space, where support vectors are used to d...
Asa Ben-Hur, David Horn, Hava T. Siegelmann, Vladi...
Recently, web-based educational systems collect vast amounts of data on user patterns, and data mining methods can be applied to these databases to discover interesting associations...
Behrouz Minaei-Bidgoli, Gerd Kortemeyer, William F...
Learning application-specific distance metrics from labeled data is critical for both statistical classification and information retrieval. Most of the earlier work in this area h...