Gene identification and protein classification in microbial metagenomic sequence data via incremental clustering

Background: The identification and study of proteins from metagenomic datasets can shed light on the roles and interactions of the source organisms in their communities. However, metagenomic datasets contain organisms with varying GC composition, codon usage biases, and so on, which makes gene identification challenging. The vast amount of sequence data also requires faster protein family classification tools.

Results: We present a computational improvement to a sequence clustering approach that we developed previously to identify and classify protein coding genes in large microbial metagenomic datasets. The clustering approach can be used to identify protein coding genes in prokaryotes, viruses, and intron-less eukaryotes. The computational improvement is based on an incremental clustering method that does not require the expensive all-against-all computation that was required by the original approach, while still preserving the remote homology detection ca...
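
To illustrate why incremental clustering avoids an all-against-all comparison, here is a minimal sketch, not the authors' implementation. It assumes a toy similarity measure (Jaccard overlap of 3-mers) and a hypothetical threshold of 0.5; the function names kmers, jaccard, and incremental_cluster are illustrative only. Each incoming sequence is compared against the current cluster representatives rather than against every other sequence. A real pipeline would use an alignment-based or profile-based score to retain remote homology detection.

```python
def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (toy stand-in for a homology score)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def incremental_cluster(seqs, threshold=0.5, k=3):
    """Greedy incremental clustering.

    Sequences are processed longest first. Each one joins the first
    cluster whose representative is similar enough; otherwise it
    becomes the representative of a new cluster. Comparisons are made
    only against representatives, never all-against-all.
    """
    reps = []      # k-mer sets of the cluster representatives (assumed design)
    clusters = []  # clusters[i] holds the members of cluster i
    for s in sorted(seqs, key=len, reverse=True):
        ks = kmers(s, k)
        for idx, rep_kmers in enumerate(reps):
            if jaccard(ks, rep_kmers) >= threshold:
                clusters[idx].append(s)
                break
        else:
            # No representative matched: start a new cluster.
            reps.append(ks)
            clusters.append([s])
    return clusters


if __name__ == "__main__":
    toy = ["MKTAYIAKQR", "MKTAYIAKQ", "GGGGSGGGGS", "MKTAYIAKQRQ"]
    for members in incremental_cluster(toy):
        print(members)
```

With N sequences and C clusters, this sketch performs on the order of N × C comparisons instead of N × N, which is the saving that matters when C is much smaller than N.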

Shibu Yooseph, Weizhong Li, Granger G. Sutton

BMCBI 2008 | Incremental Clustering Method | Metagenomic Dataset | Protein

Added	08 Dec 2010
Updated	08 Dec 2010
Type	Journal
Year	2008
Where	BMCBI
Authors	Shibu Yooseph, Weizhong Li, Granger G. Sutton
