The number of features that can be computed over an image is, for practical purposes, limitless. Unfortunately, the number of features that most computer vision systems can compute and exploit is considerably smaller. It is therefore important to develop techniques for selecting features from very large data sets that include many irrelevant or redundant features. This work addresses the feature selection problem with a three-step algorithm. The first step uses a variation of the well-known Relief algorithm [11] to remove irrelevance; the second step clusters features using K-means to remove redundancy; and the third step applies a standard combinatorial feature selection algorithm. This three-step combination is shown to be more effective than standard feature selection algorithms on large data sets with many irrelevant and redundant features, and no worse than standard techniques on data sets that do not have these properties. Finally, we show a thi...
José Bins, Bruce A. Draper
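The abstract specifies the pipeline only at this level of detail, so the following is a minimal Python sketch of the three steps, not the authors' implementation: plain Relief (Kira and Rendell) stands in for the paper's unspecified Relief variant, scikit-learn's KMeans clusters features treated as vectors over the samples, and greedy forward selection with a cross-validated logistic-regression score stands in for the combinatorial third step. The function names, the parameters n_iters, relevance_keep, and n_clusters, and the choice of classifier are all assumptions for illustration.

```python
# Sketch of the three-step pipeline: (1) Relief to drop irrelevant features,
# (2) K-means over features to drop redundant ones, (3) combinatorial search
# over the small surviving set. Stand-in choices are noted above.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def relief_scores(X, y, n_iters=100, rng=None):
    """Step 1: plain Relief weights (binary labels assumed, not the
    paper's variant). Higher weight = more relevant feature."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        i = rng.integers(n)
        dists = np.abs(X - X[i]).sum(axis=1)  # L1 distance to sample i
        dists[i] = np.inf                     # exclude the sample itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dists, np.inf))   # nearest same-class
        miss = np.argmin(np.where(~same, dists, np.inf)) # nearest other-class
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n_iters

def select_features(X, y, relevance_keep=50, n_clusters=10):
    # Step 1: keep only the top Relief-scored features (threshold assumed).
    scores = relief_scores(X, y)
    relevant = np.argsort(scores)[-relevance_keep:]
    # Step 2: cluster the surviving features (each feature is a vector of
    # its values over all samples) and keep one representative per cluster.
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(X[:, relevant].T)
    reps = []
    for c in range(n_clusters):
        members = relevant[km.labels_ == c]
        reps.append(members[np.argmax(scores[members])])  # best-scored member
    # Step 3: combinatorial selection (greedy forward selection here) over
    # the small remaining set, scored by cross-validated accuracy.
    chosen, best = [], 0.0
    improved = True
    while improved:
        improved = False
        for f in reps:
            if f in chosen:
                continue
            acc = cross_val_score(LogisticRegression(max_iter=1000),
                                  X[:, chosen + [f]], y, cv=3).mean()
            if acc > best:
                best, add = acc, f
                improved = True
        if improved:
            chosen.append(add)
    return chosen
```

On a (samples × features) array X with binary labels y, select_features(X, y) returns a short list of column indices. The point of the first two steps is to shrink the candidate set cheaply so that the expensive wrapper search in step 3 only ever sees a handful of non-redundant, relevant features; the cutoff values used here are arbitrary placeholders.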