Motivation: As next-generation sequencing rapidly adds new genomes, their correct placement in the taxonomy needs verification. However, current methods for confirming the classification of a taxon, or for suggesting a revision when a taxon may be misplaced, rely on computationally intensive multiple sequence alignment followed by iterative adjustment of the distance matrix. Because of intra-heterogeneity in the 16S rRNA marker, no classifier is available at the sub-genus level that could readily suggest a classification for a novel 16S rRNA sequence. Metagenomics further complicates the issue by generating fragmented 16S rRNA sequences. This paper proposes a novel alignment-free method for representing microbial profiles using Extensible Markov Models (EMM), with an extended Karlin-Altschul statistical framework similar to the classic alignment paradigm. We propose a Log Odds (LOD) score classifier based on the Gumbel difference distribution that confirms correct classifications with st...
Rao M. Kotamarti, Michael Hahsler, Douglas Raiford