Learning from noisy data is a challenging and practical problem in real-world data mining applications. Common practices include data cleansing, error detection, and classifier ensembling. The essential goal is to reduce the impact of noise and improve the learners built from noise-corrupted data, so as to benefit subsequent data mining procedures. In this paper, we present a novel framework that unifies error detection, correction, and data cleansing to build an aggressive classifier ensemble for effective learning from noisy data. The ensemble is aggressive in that it is built from data that has already been preprocessed by the data cleansing and correcting techniques. Experimental comparisons demonstrate that such an aggressive classifier ensemble is superior to the model built from the original noisy data, and that it is more reliable in enhancing the learning theory extracted from noisy data sources than simple data correction or cleansing efforts.
Yan Zhang, Xingquan Zhu, Xindong Wu, Jeffrey P. Bo
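The pipeline outlined in the abstract (detect suspicious labels, produce a cleansed copy and a corrected copy of the data, then train ensemble members on those preprocessed copies) might be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not name the detector or the base learner, so a leave-one-out k-nearest-neighbour check on one-dimensional toy data stands in for both, and every function name here (`knn_predict`, `detect_suspects`, `build_aggressive_ensemble`, `ensemble_predict`) is hypothetical rather than taken from the paper.

```python
from collections import Counter

def knn_predict(train, x, k=3):
    """Majority label among the k training points nearest to x (1-D features)."""
    neighbours = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

def detect_suspects(data, k=3):
    """Error detection (assumed scheme): flag each instance whose label
    disagrees with a leave-one-out k-NN prediction.
    Returns {index: predicted_label} for the flagged instances."""
    suspects = {}
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        guess = knn_predict(rest, x, k)
        if guess != y:
            suspects[i] = guess
    return suspects

def build_aggressive_ensemble(data, k=3):
    """Build one training set per preprocessing strategy.
    Cleansing drops the suspects; correction relabels them."""
    suspects = detect_suspects(data, k)
    cleansed = [p for i, p in enumerate(data) if i not in suspects]
    corrected = [(x, suspects.get(i, y)) for i, (x, y) in enumerate(data)]
    return suspects, [cleansed, corrected]

def ensemble_predict(members, x, k=3):
    """Majority vote over the members; a tie goes to the member listed first."""
    votes = Counter(knn_predict(m, x, k) for m in members)
    return votes.most_common(1)[0][0]

# Toy data: class 0 at x = 0..9, class 1 at x = 20..29, with two flipped labels.
data = [(float(x), 0) for x in range(10)] + [(float(x), 1) for x in range(20, 30)]
data[4] = (4.0, 1)    # injected label noise
data[15] = (25.0, 0)  # injected label noise

suspects, members = build_aggressive_ensemble(data)
print(sorted(suspects))                 # indices flagged as noisy
print(ensemble_predict(members, 4.2))   # vote for a point inside class 0's range
```

With only two members the vote can tie, which this sketch resolves in favour of the cleansed member; a fuller version would presumably train several base learners on each preprocessed copy.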