We continue the study of approximating the number of distinct elements in a data stream of length n to within a (1? ) factor. It is known that if the stream may consist of arbitra...
Data mining tasks such as supervised classification can often benefit from a large training dataset. However, in many application domains, privacy concerns can hinder the construc...
Motivated by numerous applications in which the data may be modeled by a variable subscripted by three or more indices, we develop a tensor-based extension of the matrix CUR decom...
Michael W. Mahoney, Mauro Maggioni, Petros Drineas
Calculation of object similarity, for example through a distance function, is a common part of data mining and machine learning algorithms. This calculation is crucial for efficie...
We consider the wavelet synopsis construction problem for data streams where given n numbers we wish to estimate the data by constructing a synopsis, whose size, say B is much sma...