This paper conducts experiments with three skewed data sets, seeking to demonstrate problems when skewed data is used, and identifying counter problems when data is balanced. The b...
Traditional Data Mining and Knowledge Discovery algorithms assume free access to data, either at a centralized location or in federated form. Increasingly, privacy and security co...
Published data is prone to privacy attacks. Sanitization methods aim to prevent these attacks while maintaining usefulness of the data for legitimate users. Quantifying the trade-...
A data stream is a massive unbounded sequence of data elements continuously generated at a rapid rate. Consequently, the knowledge embedded in a data stream is more likely to be c...
As the result of interactions between visitors and a web site, an http log file contains very rich knowledge about users on-site behaviors, which, if fully exploited, can better c...