A hybrid clustering approach to recognition of protein families in 114 microbial genomes

Background: Grouping proteins into sequence-based clusters is a fundamental step in many bioinformatic analyses (e.g., homology-based prediction of structure or function). Standard clustering methods such as single-linkage clustering capture a history of cluster topologies as a function of threshold, but in practice their usefulness is limited because unrelated sequences join clusters before biologically meaningful families are fully constituted, e.g., as the result of matches to so-called promiscuous domains. Use of the Markov Cluster algorithm avoids this non-specificity, but does not preserve topological or threshold information about protein families.

Results: We describe a hybrid approach to sequence-based clustering of proteins that combines the advantages of standard and Markov clustering. We have implemented this hybrid approach over a relational database environment, and describe its application to clustering a large subset of PDB, and to 328577 proteins from 114 fully sequenc...
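The limitation of single-linkage clustering described in the Background can be illustrated with a minimal sketch. This is not the authors' implementation (which the abstract says runs over a relational database); the protein names, the similarity scores, and the function `single_linkage` are all hypothetical, chosen only to show how one spurious match can merge unrelated sequences before a true family is complete.

```python
from collections import defaultdict

# Hypothetical pairwise similarity scores (higher = more similar).
# a1, a2, a3 stand for one family; b1, b2 for another.
# The a3-b1 score imitates a match to a shared "promiscuous" domain.
SCORES = {
    ("a1", "a2"): 90,
    ("a2", "a3"): 40,   # a distant but genuine family member
    ("b1", "b2"): 85,
    ("a3", "b1"): 50,   # spurious cross-family match
}

def single_linkage(scores, threshold):
    """Link every pair scoring >= threshold; return the connected components."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), score in scores.items():
        root_a, root_b = find(a), find(b)  # registers both items, linked or not
        if score >= threshold and root_a != root_b:
            parent[root_a] = root_b

    clusters = defaultdict(set)
    for item in list(parent):
        clusters[find(item)].add(item)
    return sorted(sorted(members) for members in clusters.values())

if __name__ == "__main__":
    # Lowering the threshold replays the "history" of cluster topologies.
    for threshold in (80, 45, 40):
        print(threshold, single_linkage(SCORES, threshold))
```

In this toy data, lowering the threshold from 80 to 45 attaches `a3` to the `b` cluster while it is still separate from `a1` and `a2`; only at 40 does family `a` come together, and by then everything has merged into one cluster.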

Timothy J. Harlow, J. Peter Gogarten, Mark A. Ragan

BMCBI 2004 | Hybrid Clustering | Markov Cluster Algorithm | Standard Clustering Methods |

Added	16 Dec 2010
Updated	16 Dec 2010
Type	Journal
Year	2004
Where	BMCBI
Authors	Timothy J. Harlow, J. Peter Gogarten, Mark A. Ragan
