Sciweavers

ISMB
1993

Computationally Efficient Cluster Representation in Molecular Sequence Megaclassification

14 years 23 days ago
Computationally Efficient Cluster Representation in Molecular Sequence Megaclassification
Molecular sequence megaclassification is a technique for automated protein sequence analysis and annotation. Implementation of the method has been limited by the need to store and randomly access a database of all the sequence pair similarities. More than 80,000 protein sequences are now present in the public databases, and the pair similarity data table for the full protein sequence database requires over 1 gigabyte of storage. In this paper we present a computationally efficient representation of groups based on a graph theory approach where sequence clusters are described by a minimal spanning tree of highest scoring similarity pairs. This representation allows a classification of N proteins to be stored in order(N) memory. The use of this minimal spanning tree representation simplifies analysis of groups, the description of group characteristics and the manual correction of artifacts resulting from false hits. The new tree representation also introduces new possibilities for artif...
David J. States, Nomi L. Harris, Lawrence Hunter
Added 02 Nov 2010
Updated 02 Nov 2010
Type Conference
Year 1993
Where ISMB
Authors David J. States, Nomi L. Harris, Lawrence Hunter
Comments (0)