Due to the curse of dimensionality, high-dimensional data is often pre-processed with some form of dimensionality reduction before classification. Many common methods of supervised dimensionality reduction focus on separating the classes and collapsing the data near the class centroids. These methods often make assumptions on the class distributions, namely Gaussianity, which can lead to ad hoc and sub-optimal implementations. In this paper we present a method of supervised dimensionality reduction that takes an information-geometric approach, maximizing the between-class information distances. This objective is shown to be directly related to the Chernoff and Bhattacharyya bounds on classification error. We illustrate our method on real data and compare it to several existing methods.
Kevin M. Carter, Raviv Raich, Alfred O. Hero
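For concreteness, the standard statement of the Bhattacharyya bound on the two-class Bayes error is sketched below; the symbols $p_1$, $p_2$, $\pi_1$, $\pi_2$ are illustrative and not taken from the paper's own notation.

% Bhattacharyya bound on the two-class Bayes error (standard result;
% notation here is illustrative, not the paper's).
\begin{align}
  \rho &= \int \sqrt{p_1(x)\, p_2(x)}\, dx
       && \text{(Bhattacharyya coefficient)} \\
  B(p_1, p_2) &= -\ln \rho
       && \text{(Bhattacharyya distance)} \\
  P_e &\le \sqrt{\pi_1 \pi_2}\; e^{-B(p_1, p_2)}
       && \text{(bound on the Bayes error)}
\end{align}
% The Chernoff bound tightens this by optimizing over the exponent s:
%   P_e \le \pi_1^{s} \pi_2^{1-s} \int p_1(x)^{s}\, p_2(x)^{1-s}\, dx,
%   for 0 < s < 1 (the Bhattacharyya bound is the case s = 1/2).

Under this view, increasing the between-class information distance decreases these upper bounds on classification error, which is the connection the abstract refers to.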