Random data perturbation (RDP) has been in use for several years in statistical databases and public surveys as a means of providing privacy to individuals while collecting informa...
In many data mining projects the data to be analysed contains personal information, like names and addresses. Cleaning and preprocessing of such data likely involves dedu...
We introduce the Hierarchically Growing Hyperbolic Self-Organizing Map (H²SOM) featuring two extensions of the HSOM (hyperbolic SOM): (i) a hierarchically growing variant that al...
Data compression techniques such as null suppression and dictionary compression are commonly used in today’s database systems. In order to effectively leverage compression, it...
Stratos Idreos, Raghav Kaushik, Vivek R. Narasayya...
Sampling is a widely used technique to increase efficiency in database and data mining applications operating on large datasets. In this paper we present a scalable sampling imple...