A grammar-based distance metric enables fast and accurate clustering of large sets of 16S sequences

Background: We propose a sequence clustering algorithm and compare its partition quality and execution time with those of a popular existing algorithm. The proposed algorithm uses a grammar-based distance metric to partition a set of biological sequences. New sequences are compared with cluster-representative sequences to determine membership; if the comparison fails to identify a suitable cluster, a new cluster is created.

Results: The proposed algorithm is validated by comparison with the popular DNA/RNA sequence clustering approach CD-HIT-EST and with the recently developed algorithm UCLUST, using two different sets of 16S rDNA sequences from 2,255 genera. Its CPU execution time is comparable to that of CD-HIT-EST, which is much slower than UCLUST, and it has generated clusters with higher statistical accuracy than both CD-HIT-EST an...
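
The representative-based procedure described in the Background can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not define the grammar-based metric, so `stand_in_distance` below is a hypothetical placeholder built on a simple LZ78-style phrase count. The names `greedy_cluster`, `lz78_phrase_count`, and `threshold`, and the choice of the first sequence in a cluster as its representative, are likewise assumptions made for illustration.

```python
from typing import Callable, List


def lz78_phrase_count(seq: str) -> int:
    """Count the phrases in a simple LZ78-style parse of seq.

    Assumption: a stand-in for a grammar/dictionary complexity measure,
    not the metric used in the paper.
    """
    phrases = set()
    current = ""
    count = 0
    for ch in seq:
        current += ch
        if current not in phrases:
            phrases.add(current)
            count += 1
            current = ""
    if current:
        count += 1  # trailing phrase that was already in the dictionary
    return count


def stand_in_distance(a: str, b: str) -> float:
    """Hypothetical distance: extra phrases needed to describe both sequences
    together, relative to the more complex of the two. Crude on short inputs."""
    ca, cb = lz78_phrase_count(a), lz78_phrase_count(b)
    if max(ca, cb) == 0:
        return 0.0  # both sequences are empty
    joint = min(lz78_phrase_count(a + b), lz78_phrase_count(b + a))
    return (joint - min(ca, cb)) / max(ca, cb)


def greedy_cluster(
    sequences: List[str],
    distance: Callable[[str, str], float],
    threshold: float,
) -> List[List[str]]:
    """Assign each sequence to the nearest representative within threshold,
    or start a new cluster with the sequence as its representative."""
    representatives: List[str] = []
    clusters: List[List[str]] = []
    for seq in sequences:
        best_idx = None
        best_d = None
        for i, rep in enumerate(representatives):
            d = distance(seq, rep)
            if d <= threshold and (best_d is None or d < best_d):
                best_idx, best_d = i, d
        if best_idx is None:
            # No suitable cluster found: create a new one.
            representatives.append(seq)
            clusters.append([seq])
        else:
            clusters[best_idx].append(seq)
    return clusters
```

Passing the distance as a parameter keeps the clustering loop independent of the metric, so the placeholder can be swapped for any other pairwise distance. For example, `greedy_cluster(seqs, stand_in_distance, threshold=0.6)` would cluster a list of strings `seqs`; the threshold value here is arbitrary and would need tuning for real data.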
Added 28 Feb 2011
Updated 28 Feb 2011
Type Journal
Year 2010
Where BMCBI
Authors David J. Russell, Samuel F. Way, Andrew K. Benson, Khalid Sayood