Learning from noisy data is a challenging and practical issue for real-world data mining applications. Common practices include data cleansing, error detection and classifier ensemb...
Yan Zhang, Xingquan Zhu, Xindong Wu, Jeffrey P. Bo...
We propose and test an objective criterion for evaluation of clustering performance: how well does a clustering algorithm, run on unlabeled data, aid a classification algorithm? The...
Eigenvalue analysis is an important aspect in many data modeling methods. Unfortunately, the eigenvalues of the sample covariance matrix (sample eigenvalues) are biased estimate...
Anne Hendrikse, Luuk J. Spreeuwers, Raymond N. J. ...
Image spam is a new trend in email spam. New image spam employs a variety of image processing technologies to create random noise. In this paper, we propose a s...
Sufficiently high data quality is crucial for almost every application. Nonetheless, data quality issues are nearly omnipresent. The reasons for poor quality cannot simply be bla...