The problem of annotating protein function often arises for partially complete genomes, yet it has not been adequately addressed. We present an annotation method that extracts ortholog clusters from incomplete genomes that are evolutionarily closely related to the genome of interest. To construct clusters, our method focuses on sequence similarities across genomes rather than on similarities between sequences within a genome. We use quasi-concave set function optimization to extract the ortholog clusters as extreme groups of sequences, that is, groups in which the similarity of the least similar sequence is maximal. A protein sequence is annotated with the ortholog cluster to which its average similarity is highest. We have applied this method to annotate the rice proteome using clusters constructed from four partially complete cereal proteomes and the complete proteome of Arabidopsis.
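The two steps named in the abstract (extracting a group whose least similar member is as similar as possible, then annotating a query by highest average similarity) can be illustrated with a minimal sketch. It is not the authors' implementation. The linkage function used here (a sequence's summed similarity to group members from other genomes), the greedy peeling loop, the treatment of missing hits as zero similarity, and all names and toy scores are assumptions made for illustration only.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Symmetric similarity scores keyed by an unordered pair of sequence ids.
Sim = Dict[FrozenSet[str], float]


def linkage(i: str, H: Set[str], sim: Sim, genome: Dict[str, str]) -> float:
    """Assumed linkage: total similarity of i to members of H in other genomes."""
    return sum(sim.get(frozenset((i, j)), 0.0) for j in H if genome[j] != genome[i])


def extract_cluster(items: Iterable[str], sim: Sim,
                    genome: Dict[str, str]) -> Tuple[Set[str], float]:
    """Greedy peeling for F(H) = min over i in H of linkage(i, H).

    Repeatedly drop the weakest member and remember the subset whose
    weakest member scored highest along the way.
    """
    H = set(items)
    best_set, best_value = set(), float("-inf")
    while H:
        scores = {i: linkage(i, H, sim, genome) for i in sorted(H)}
        weakest = min(scores, key=scores.get)  # ties broken by sorted order
        if scores[weakest] > best_value:
            best_set, best_value = set(H), scores[weakest]
        H.remove(weakest)
    return best_set, best_value


def annotate(query_sim: Dict[str, float], clusters: Dict[str, Set[str]]) -> str:
    """Return the name of the (non-empty) cluster with the highest average
    similarity to the query; members without a hit count as 0 (assumption)."""
    def average(members: Set[str]) -> float:
        return sum(query_sim.get(m, 0.0) for m in members) / len(members)
    return max(sorted(clusters), key=lambda name: average(clusters[name]))


# Toy data (hypothetical): five sequences from three genomes A, B, C.
genome = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}
sim = {
    frozenset(("a1", "b1")): 0.9,
    frozenset(("a1", "c1")): 0.8,
    frozenset(("b1", "c1")): 0.85,
    frozenset(("a2", "b2")): 0.7,
    frozenset(("a2", "b1")): 0.1,
}
cluster, value = extract_cluster(genome, sim, genome)

# Hypothetical query with similarity hits against three known sequences.
clusters = {"C1": {"a1", "b1", "c1"}, "C2": {"a2", "b2"}}
label = annotate({"a1": 0.6, "b1": 0.9, "a2": 0.8}, clusters)
```

On this toy input the peeling loop discards the weakly linked sequences `b2` and `a2` first and keeps the mutually similar triple, and the query is assigned to `C1` because its average similarity to that cluster (0.5) exceeds its average similarity to `C2` (0.4). Only cross-genome pairs contribute to the linkage, which mirrors the abstract's emphasis on similarities across genomes rather than within a genome.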
Akshay Vashist, Casimir A. Kulikowski, Ilya B. Muc