Background: Large biological data sets, such as expression profiles, benefit from reduction of random noise. Principal component (PC) analysis has been used for this purpose, but ...
The problem of assessing the significance of data mining results on high-dimensional 0–1 data sets has been studied extensively in the literature. For problems such as mining freq...
Aristides Gionis, Heikki Mannila, Panayiotis Tsapa...
Randomization has emerged as a useful technique for data disguising in privacy-preserving data mining. Its privacy properties have been studied in a number of papers. Kargupta et ...
We present a (non-standard) probabilistic analysis of dynamic data structures whose sizes are considered as dynamic random walks. The basic operations (insertion, deletion, positi...
We investigate how random projection can best be used for clustering high dimensional data. Random projection has been shown to have promising theoretical properties. In practice,...