We describe a Markov chain Bayesian classification tool, SCS, that can perform data-driven classification of proteins and protein segments. Training data for interesting classification problems is often limited; thus, SCS uses string transformation functions to change the encoding of proteins to reduce problem perplexity and improve classification. A wrapper-based genetic algorithm is used to search the space of possible string transformation functions to find functions that improve classification.

Categories and Subject Descriptors
I.5.1 [Computing Methodologies]: Pattern Recognition - Models - Statistical

Keywords
Bioinformatics, Classifier Systems

General Terms
Algorithms
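To make the approach concrete, the following is a minimal sketch of a Markov chain Bayesian classifier applied to re-encoded protein strings. It is not the SCS implementation. The three-group reduced alphabet, the names `transform` and `MarkovChainClassifier`, the first-order chain, the add-one smoothing, and the toy sequences and labels are all assumptions made for illustration. The genetic algorithm search is not shown; in this sketch it would propose alternative contents for `TRANSFORM` and score each by held-out accuracy.

```python
import math
from collections import defaultdict

# Hypothetical reduced alphabet (an assumption, not taken from the paper):
# hydrophobic -> "h", polar -> "p", charged -> "c".
GROUPS = {"h": "AVLIMFWC", "p": "STNQYGPH", "c": "DEKR"}
TRANSFORM = {aa: g for g, members in GROUPS.items() for aa in members}


def transform(seq, mapping=TRANSFORM):
    """Re-encode a protein string; residues missing from the mapping become 'x'."""
    return "".join(mapping.get(ch, "x") for ch in seq)


class MarkovChainClassifier:
    """First-order Markov chain per class, combined with class priors via Bayes' rule."""

    def __init__(self, alphabet, alpha=1.0):
        self.alphabet = sorted(set(alphabet))
        self.alpha = alpha          # additive smoothing constant
        self.counts = {}            # label -> {(prev, cur): count}
        self.prev_totals = {}       # label -> {prev: count}
        self.n_seqs = {}            # label -> number of training sequences

    def fit(self, sequences, labels):
        for seq, label in zip(sequences, labels):
            c = self.counts.setdefault(label, defaultdict(float))
            t = self.prev_totals.setdefault(label, defaultdict(float))
            self.n_seqs[label] = self.n_seqs.get(label, 0) + 1
            for prev, cur in zip(seq, seq[1:]):
                c[(prev, cur)] += 1
                t[prev] += 1
        return self

    def log_score(self, seq, label):
        """Log prior plus summed log transition probabilities (initial symbol ignored)."""
        k = len(self.alphabet)
        score = math.log(self.n_seqs[label] / sum(self.n_seqs.values()))
        c, t = self.counts[label], self.prev_totals[label]
        for prev, cur in zip(seq, seq[1:]):
            num = c.get((prev, cur), 0.0) + self.alpha
            den = t.get(prev, 0.0) + self.alpha * k
            score += math.log(num / den)
        return score

    def predict(self, seq):
        return max(self.counts, key=lambda label: self.log_score(seq, label))


# Toy training data, invented for this sketch only.
train = [
    ("AVLIAVLLIV", "ordered"),
    ("LLIVAVMFLA", "ordered"),
    ("DEKRSEEKDR", "disordered"),
    ("KKEDSEDRKE", "disordered"),
]
clf = MarkovChainClassifier(alphabet="hpcx")
clf.fit([transform(s) for s, _ in train], [y for _, y in train])

print(clf.predict(transform("VVLLAIVL")))
print(clf.predict(transform("EEKKDDRE")))
```

With the reduced alphabet, each class model has 4 x 4 = 16 transition probabilities to estimate rather than the 20 x 20 = 400 of the raw amino acid alphabet, which is the sense in which re-encoding can help when training data is limited.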
Timothy Meekhof, Gary W. Daughdrill, Robert B. Hec