Background: Existing hidden Markov model decoding algorithms do not focus on approximately identifying the sequence feature boundaries.

Results: We give a set of algorithms to compute the conditional probability of all labellings "near" a reference labelling l for a sequence y for a variety of definitions of "near". In addition, we give optimization algorithms to find the best labelling for a sequence in the robust sense of having all of its feature boundaries nearly correct. Natural problems in this domain are NP-hard to optimize. For membrane proteins, our algorithms find the approximate topology of such proteins with comparable success to existing programs, while being substantially more accurate in estimating the positions of transmembrane helix boundaries.

Conclusion: More robust HMM decoding may allow for better analysis of sequence features, in reasonable runtimes.

Background

Decoding hidden Markov models (HMMs) continues to be a central problem in bioinform...
Daniel G. Brown, Jakub Truszkowski
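
To make the abstract's idea of "the conditional probability of all labellings near a reference labelling l for a sequence y" concrete, the following is a minimal sketch under stated assumptions, not the authors' algorithm. It assumes that "near" means Hamming distance at most k from the reference, which is only one possible definition and may not be among those the paper uses. It also assumes that each HMM state is its own label. The function name `prob_near_reference` and the toy parameters are hypothetical. The sketch extends the standard forward recursion with a mismatch counter and divides by the unrestricted forward probability.

```python
def prob_near_reference(init, trans, emit, obs, ref, k):
    """Return P(Hamming(labelling, ref) <= k | obs).

    Assumptions (not from the paper): states coincide with labels, and
    "near" means at most k positions differ from the reference labelling.
    init[s], trans[r][s], emit[s][symbol] are plain probability tables.
    """
    n = len(init)
    # near[s][m]: joint probability of the observations so far, ending in
    # state s, with exactly m mismatches against the reference (m <= k).
    near = [[0.0] * (k + 1) for _ in range(n)]
    # total[s]: ordinary forward probability, with no restriction.
    total = [0.0] * n
    for s in range(n):
        p = init[s] * emit[s][obs[0]]
        total[s] = p
        miss = 0 if s == ref[0] else 1
        if miss <= k:
            near[s][miss] = p
    for t in range(1, len(obs)):
        new_near = [[0.0] * (k + 1) for _ in range(n)]
        new_total = [0.0] * n
        for s in range(n):
            e = emit[s][obs[t]]
            miss = 0 if s == ref[t] else 1
            new_total[s] = e * sum(total[r] * trans[r][s] for r in range(n))
            # Entering state s at position t adds `miss` to the counter.
            for m in range(miss, k + 1):
                new_near[s][m] = e * sum(
                    near[r][m - miss] * trans[r][s] for r in range(n)
                )
        near, total = new_near, new_total
    return sum(sum(row) for row in near) / sum(total)


# Toy two-state model (hypothetical numbers, for illustration only).
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.2, 0.8]]
emit = [[0.9, 0.1], [0.3, 0.7]]
obs = [0, 1, 1]
ref = [0, 1, 1]
for k in range(len(obs) + 1):
    print(k, prob_near_reference(init, trans, emit, obs, ref, k))
```

With k = 0 the result is the posterior probability of the reference labelling itself, and with k equal to the sequence length it covers every labelling. The sketch omits scaling or log-space arithmetic, so it would underflow on long sequences.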