BMCBI
2010

Clustering metagenomic sequences with interpolated Markov models

Background: Sequencing of environmental DNA (often called metagenomics) has shown tremendous potential to uncover the vast number of unknown microbes that cannot be cultured and sequenced by traditional methods. Because the output of metagenomic sequencing is a large set of reads of unknown origin, clustering together reads that were sequenced from the same species is a crucial analysis step. Many effective approaches to this task rely on sequenced genomes in public databases, but these genomes are a highly biased sample that is not necessarily representative of the environments of interest to many metagenomics projects.

Results: We present SCIMM (Sequence Clustering with Interpolated Markov Models), an unsupervised sequence clustering method. SCIMM achieves greater clustering accuracy than previous unsupervised approaches. We examine the limitations of unsupervised learning on complex datasets, and suggest a hybrid of SCIMM and the supervised learning method Phymm, called PHYSCIMM, that perf...
David R. Kelley, Steven L. Salzberg
Added 08 Dec 2010
Updated 08 Dec 2010
Type Journal
Year 2010
Where BMCBI
Authors David R. Kelley, Steven L. Salzberg