The use of multiple features by a classifier often leads to a reduced probability of error, but the design of an optimal Bayesian classifier for multiple features depends on the estimation of multidimensional joint probability density functions and therefore requires a design sample size that, in general, increases exponentially with the number of dimensions. The classification method described in this paper makes decisions by combining the decisions made by multiple Bayesian classifiers using an additional classifier that estimates the joint probability densities of the decision space rather than the joint probability densities of the feature space. A proof is presented for the restricted case of two classes and two features, showing that the method always achieves a probability of error that is less than or equal to the probability of error of the marginal classifier with the lowest probability of error.
Mark D. Happel, Peter Bock
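For concreteness, the decision-combination scheme described in the abstract might look like the following minimal sketch for the two-class, two-feature case: one Bayesian classifier per feature, followed by a second-stage classifier that estimates the joint distribution over the decision space of their outputs. The Gaussian class-conditional densities, the counting-based density estimate, the function names, and the synthetic data are all illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Hypothetical sketch of the decision-combination scheme: one Bayesian
# classifier per feature, then a second-stage Bayesian classifier over
# the (discrete) decision space formed by their outputs.

def fit_marginal(x, y, n_classes=2):
    """Per-class Gaussian density estimate for a single feature (assumption)."""
    params = []
    for c in range(n_classes):
        xc = x[y == c]
        params.append((xc.mean(), xc.std() + 1e-9, len(xc) / len(x)))
    return params

def predict_marginal(params, x):
    """MAP decision of a one-feature Gaussian Bayes classifier."""
    post = np.stack([
        prior * np.exp(-0.5 * ((x - mu) / sd) ** 2) / sd
        for mu, sd, prior in params
    ])
    return post.argmax(axis=0)

def fit_decision_combiner(d1, d2, y, n_classes=2):
    """Estimate the joint density of (decision1, decision2, class) by counting."""
    table = np.zeros((n_classes, n_classes, n_classes))  # [d1, d2, class]
    for a, b, c in zip(d1, d2, y):
        table[a, b, c] += 1
    return table

def predict_combined(table, d1, d2):
    """MAP class given the pair of marginal decisions."""
    return table[d1, d2].argmax(axis=-1)

# Usage on synthetic two-class, two-feature data (illustrative only):
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 1000)
X = rng.normal(loc=y[:, None] * 1.5, scale=1.0, size=(1000, 2))

p1, p2 = fit_marginal(X[:, 0], y), fit_marginal(X[:, 1], y)
d1, d2 = predict_marginal(p1, X[:, 0]), predict_marginal(p2, X[:, 1])
combiner = fit_decision_combiner(d1, d2, y)
y_hat = predict_combined(combiner, d1, d2)
```

Because the second-stage classifier is fit over a two-by-two decision space rather than the continuous feature space, its density estimate needs far fewer samples, which is the sample-size advantage the abstract describes.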