We have recently described a method based on artificial neural networks to cluster protein sequences into families. The network was trained with Kohonen's unsupervised learning algorithm, using as inputs matrix patterns derived from the bipeptide composition of the proteins. Here we show the application of that method to the classification of 1758 protein sequences, using as inputs a limited number of principal components of the bipeptide matrices. As a result of training, the network self-organized the activation of its neurons into a topologically ordered map, in which proteins belonging to a known family (immunoglobulins, actins, interferons, myosins, HLA histocompatibility antigens, hemoglobins, etc.) were usually associated with the same neuron or with neighboring ones. Once the topological map has been obtained, new sequences can be classified very quickly, since each one only needs to be compared with the weight vectors of the trained neurons.
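The pipeline summarized above (bipeptide composition matrix, projection onto a few principal components, Kohonen self-organizing map, then lookup of the best-matching neuron for a new sequence) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 20-letter alphabet that skips nonstandard residues, the normalization of bipeptide counts to frequencies, PCA computed by SVD, the Gaussian neighborhood, the linear decay schedules, the grid size, and all function names (`bipeptide_matrix`, `principal_components`, `train_som`, `best_matching_unit`, `classify`) are hypothetical choices made for the example.

```python
import numpy as np

AMINO = "ACDEFGHIKLMNPQRSTVWY"  # assumed 20 standard residues
INDEX = {a: i for i, a in enumerate(AMINO)}


def bipeptide_matrix(seq):
    """20x20 matrix of consecutive-residue (bipeptide) frequencies.
    Assumption: counts are normalized to sum to 1; unknown residues are skipped."""
    m = np.zeros((20, 20))
    for a, b in zip(seq, seq[1:]):
        if a in INDEX and b in INDEX:
            m[INDEX[a], INDEX[b]] += 1
    total = m.sum()
    return m / total if total else m


def principal_components(X, k):
    """Project the rows of X onto their first k principal components (via SVD).
    Returns the projections plus the mean and components needed for new data."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    comps = vt[:k]
    return Xc @ comps.T, mean, comps


def best_matching_unit(weights, x):
    """(row, col) of the neuron whose weight vector is closest to x."""
    dists = ((weights - x) ** 2).sum(axis=-1)
    return tuple(int(i) for i in np.unravel_index(np.argmin(dists), dists.shape))


def train_som(data, rows, cols, epochs=100, lr0=0.5, sigma0=None, seed=0):
    """Kohonen SOM on a rows x cols grid with a Gaussian neighborhood.
    Learning rate and neighborhood radius decay linearly (assumed schedule)."""
    rng = np.random.default_rng(seed)
    dim = data.shape[1]
    # Initialize neuron weights from randomly chosen input vectors.
    weights = data[rng.integers(0, len(data), size=rows * cols)].reshape(rows, cols, dim).astype(float)
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    sigma0 = sigma0 if sigma0 is not None else max(rows, cols) / 2
    total, t = epochs * len(data), 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = t / total
            lr = lr0 * (1 - frac)
            sigma = max(sigma0 * (1 - frac), 0.5)
            bmu = np.array(best_matching_unit(weights, x))
            d2 = ((grid - bmu) ** 2).sum(axis=-1)  # squared grid distance to the winner
            h = np.exp(-d2 / (2 * sigma ** 2))     # neighborhood strength per neuron
            weights += lr * h[..., None] * (x - weights)
            t += 1
    return weights


def classify(seq, mean, comps, weights):
    """Map a new sequence onto the trained map: project, then find its winning neuron."""
    x = (bipeptide_matrix(seq).ravel() - mean) @ comps.T
    return best_matching_unit(weights, x)


if __name__ == "__main__":
    # Hypothetical toy sequences forming two artificial "families".
    seqs = ["ACACACGAGA", "ACGACACAGA", "CACAGAGACA",
            "KEKEKLKELE", "EKLEKEKKLE", "LEKEKEKLEK"]
    X = np.array([bipeptide_matrix(s).ravel() for s in seqs])
    Y, mean, comps = principal_components(X, k=3)
    som = train_som(Y, rows=3, cols=3, epochs=200)
    for s, y in zip(seqs, Y):
        print(s, best_matching_unit(som, y))
    print("new:", classify("ACAGACACGA", mean, comps, som))
```

The `classify` step illustrates why classifying new sequences is cheap once the map exists: it costs one bipeptide count, one small matrix product, and a distance computation against each neuron, with no retraining.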
Edgardo A. Ferrán, Pascual Ferrara, Bernard