Motivation: Next-generation sequencing methods are generating increasingly massive datasets, yet they still do not fully capture the genetic diversity of the richest environments. Understanding such complex and elusive systems requires effective tools for delineating the differences within and between community datasets.
Results: The Small Subunit Markov Modeler (SSuMMo) was developed to probabilistically assign SSU rRNA gene fragments from any sequence dataset to recognized taxonomic clades, producing consistent, comparable cladograms. In accuracy tests on sequences downloaded from public reference databases, more than 90% of genera were predicted correctly. To demonstrate parallel visualization of comparable datasets, sequences from a next-generation sequencing dataset sampled from lean, overweight and obese individuals were analysed. SSuMMo also shows potential as a valuable curatorial tool, as it identified numerous incorrect and outdated taxonomic entries and annotations in public data...
Alex L. B. Leach, James P. J. Chong, Kelly R. Rede