Background: The relationship between divergence of amino-acid sequence and divergence of function among homologous proteins is complex. The assumption that homologs share function, which is the basis of annotation transfer in databases, must therefore be regarded with caution. Here, we present a quantitative study of sequence and function divergence, based on the Gene Ontology classification of function. We determined the relationship between sequence divergence and function divergence in 6828 protein families from the PFAM database. Within families, sequence similarity spans a broad range, from very closely related proteins (for instance, orthologs in different mammals) to very distantly related proteins at the limit of reliable recognition of homology.
Results: We correlated the divergence in sequence, determined from pairwise alignments, with the divergence in function, determined by path lengths in the Gene Ontology graph, taking into account the fact that many proteins h...
Vineet Sangar, Daniel J. Blankenberg, Naomi Altman
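The path-length measure of function divergence mentioned in the Results can be illustrated with a minimal sketch. The excerpt does not give the exact definition the authors used, so the code below assumes the simplest reading: the number of edges on the shortest path between two terms when the ontology graph is treated as undirected. The toy ontology, its term names, and the helper functions `build_undirected` and `path_length` are hypothetical and are not taken from the Gene Ontology or from the paper.

```python
from collections import deque

# Hypothetical toy ontology: each term maps to its parent terms (is_a edges).
# These names are placeholders, not real Gene Ontology identifiers.
TOY_PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "ion_binding": ["binding"],
    "metal_binding": ["ion_binding"],
    "kinase": ["catalysis"],
}


def build_undirected(parents):
    """Turn a child -> parents mapping into an undirected adjacency map."""
    adj = {term: set() for term in parents}
    for child, parent_terms in parents.items():
        for parent in parent_terms:
            adj[child].add(parent)
            adj.setdefault(parent, set()).add(child)
    return adj


def path_length(adj, a, b):
    """Shortest number of edges between terms a and b (breadth-first search).

    Returns None if the two terms are not connected.
    """
    if a not in adj or b not in adj:
        raise KeyError("unknown term")
    if a == b:
        return 0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in adj[node]:
            if neighbour == b:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


ADJ = build_undirected(TOY_PARENTS)

# Assumed functional distance between two annotated terms in the toy graph.
print(path_length(ADJ, "metal_binding", "kinase"))
```

Under this assumption, two proteins annotated with the same term have distance 0, and the distance grows as their terms sit further apart in the graph. Such distances could then be compared against a sequence-divergence measure from pairwise alignments, as the abstract describes.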