Abstract. Feature extraction based on evolutionary search offers new possibilities for improving classification accuracy and reducing measurement complexity in many data mining and machine learning applications. We present a family of genetic algorithms (GAs) for feature synthesis through clustering of discrete attribute values. The approach uses a new compact graph-based encoding for cluster representation, in which the size of the GA search space is reduced exponentially with respect to the number of items in the partitioning, compared with the original idea of Park and Song. We apply the developed algorithms and study their effectiveness for DNA fingerprinting in population genetics and for text categorization.
Alexander P. Topchy, William F. Punch