Using an ensemble of classifiers instead of a single classifier has been shown to improve generalization performance in many machine learning problems [4, 16]. However, the extent of such improvement depends greatly on the amount of correlation among the errors of the base classifiers [1, 14]. As such, reducing those correlations while keeping the base classifiers’ performance levels high is a promising research topic. In this paper, we describe input decimation, a method that decouples the base classifiers by training them with different subsets of the input features. In past work [15], we showed the theoretical benefits of input decimation and presented its application to a handful of real data sets. In this paper, we provide a systematic study of input decimation on synthetic data sets and analyze how the interaction between correlation and performance in base classifiers affects ensemble performance.
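To make the idea concrete, the sketch below illustrates the general flavor of input decimation: each base classifier is trained on a different subset of the input features and the ensemble combines their votes. The feature-selection criterion (ranking by absolute correlation with the class label), the decision-tree base learner, and all parameter values here are illustrative assumptions, not the specific choices studied in the paper.

```python
# Illustrative sketch of an input-decimated ensemble.
# Assumptions (not from the paper): features are ranked by absolute
# correlation with the binary label, each member sees a different
# window of that ranking, and decision trees serve as base learners.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def correlation_ranking(X, y):
    """Rank features by absolute correlation with the (binary) label."""
    corrs = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    return np.argsort(corrs)[::-1]  # most correlated features first

def train_decimated_ensemble(X, y, n_classifiers=5, n_features=10):
    """Train each base classifier on a different subset of the features."""
    ranking = correlation_ranking(X, y)
    models = []
    for i in range(n_classifiers):
        # Overlapping but distinct feature windows, so the members see
        # partially different inputs and make less correlated errors.
        start = i * n_features // 2
        subset = ranking[start : start + n_features]
        clf = DecisionTreeClassifier().fit(X[:, subset], y)
        models.append((clf, subset))
    return models

def predict_ensemble(models, X):
    """Combine the base classifiers by majority vote."""
    votes = np.stack([clf.predict(X[:, subset]) for clf, subset in models])
    return (votes.mean(axis=0) > 0.5).astype(int)

if __name__ == "__main__":
    X, y = make_classification(n_samples=2000, n_features=40,
                               n_informative=15, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    models = train_decimated_ensemble(X_tr, y_tr)
    acc = (predict_ensemble(models, X_te) == y_te).mean()
    print(f"decimated-ensemble accuracy: {acc:.3f}")
```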