Many data mining applications involve the task of building a model for predictive classification. The goal of such a model is to classify examples (records or data instances) into classes or categories of the same type. The use of variables (attributes) not related to the classes can reduce the accuracy and reliability of a classification or prediction model. Superfluous variables can also increase the costs of building a model - particularly on large data sets. We propose a discrete Particle Swarm Optimization (PSO) algorithm designed for attribute selection. The proposed algorithm deals with discrete variables, and its population of candidate solutions contains particles of different sizes. The performance of this algorithm is compared with the performance of a standard binary PSO algorithm on the task of selecting attributes in a bioinformatics data set. The criteria used for comparison are: (1) maximizing predictive accuracy; and (2) finding the smallest subset of attributes. Cate...
Elon S. Correa, Alex Alves Freitas, Colin G. Johns