Many bioinformatics applications would benefit from comparing proteins based on their biological role rather than their sequence. In most biological databases, proteins are already annotated with ontology terms. Previous studies identified a correlation between the sequence similarity of proteins and their semantic similarity, computed from their annotated Gene Ontology (GO) terms. However, proteins sharing a biological role do not necessarily have a similar sequence. This paper presents our study of the correlation between GO similarity and family similarity. Because family similarity overcomes some of the limitations of sequence similarity, we obtained a strong correlation between GO and family similarity. Additionally, this paper introduces GraSM, a novel method that uses all the information in the graph structure of the GO instead of treating it as a hierarchical tree. When calculating the semantic similarity of two concepts, GraSM selects the disjunctive common an...
Francisco M. Couto, Mário J. Silva, Pedro C