Sciweavers

BMCBI
2008

Gene identification and protein classification in microbial metagenomic sequence data via incremental clustering

Background: The identification and study of proteins from metagenomic datasets can shed light on the roles and interactions of the source organisms in their communities. However, metagenomic datasets contain organisms with varying GC composition, codon usage biases, and so on, which makes gene identification challenging. The vast amount of sequence data also requires faster protein family classification tools.

Results: We present a computational improvement to a sequence clustering approach that we developed previously to identify and classify protein-coding genes in large microbial metagenomic datasets. The clustering approach can be used to identify protein-coding genes in prokaryotes, viruses, and intron-less eukaryotes. The computational improvement is based on an incremental clustering method that does not require the expensive all-against-all computation required by the original approach, while still preserving the remote homology detection ca...
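To illustrate why an incremental scheme avoids all-against-all comparison, here is a minimal sketch of generic greedy incremental clustering. It is not the authors' implementation: the function names are hypothetical, and Jaccard similarity over k-mer sets is an assumed, crude stand-in for real sequence alignment. Each new sequence is compared only against the current cluster representatives, not against every other sequence.

```python
from typing import List, Set


def kmers(seq: str, k: int = 3) -> Set[str]:
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two k-mer sets (assumed stand-in for alignment)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def incremental_cluster(
    seqs: List[str], threshold: float = 0.5, k: int = 3
) -> List[List[str]]:
    """Greedy incremental clustering against cluster representatives only."""
    reps: List[Set[str]] = []        # k-mer profile of each representative
    clusters: List[List[str]] = []   # members, representative listed first
    # Longest first, so each representative is the longest cluster member.
    for seq in sorted(seqs, key=len, reverse=True):
        profile = kmers(seq, k)
        for idx, rep in enumerate(reps):
            if similarity(profile, rep) >= threshold:
                clusters[idx].append(seq)  # join the first matching cluster
                break
        else:
            # No representative is similar enough: start a new cluster.
            reps.append(profile)
            clusters.append([seq])
    return clusters


if __name__ == "__main__":
    toy = ["MKTAYIAKQR", "MKTAYIAKQ", "GGGSSSPPPLLL"]
    print(incremental_cluster(toy))
```

With n sequences and c clusters, this performs on the order of n × c comparisons instead of n², which is the general reason incremental methods scale better when c is much smaller than n.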
Shibu Yooseph, Weizhong Li, Granger G. Sutton
Added 08 Dec 2010
Updated 08 Dec 2010
Type Journal
Year 2008
Where BMCBI