A Bayesian procedure for the simultaneous alignment and classification of sequences into subclasses is described. This Gibbs sampling algorithm iterates between an alignment step and a classification step. It employs Bayesian inference to identify the number of conserved columns, the number of motifs in each class, their size, and the size of the classes. Using Bayesian prediction, inter-class differences in all these variables are brought to bear on the classification. Application to a superfamily of cyclic nucleotide-binding proteins identifies both similarities and differences in the sequence characteristics of the five subclasses identified by the procedure: 1) cNMP-dependent kinases, 2) prokaryotic cAMP-dependent regulatory proteins, CRP-type, 3) prokaryotic regulatory proteins, FNR-type, 4) cAMP-gated ion channel proteins of animals, and 5) cAMP-gated ion channels of plants.
Kunbin Qu, Lee Ann McCue, Charles E. Lawrence