A weighted sequence is a string in which a set of characters may appear at each position, each with a respective probability of occurrence. A common task is to locate a given motif in a weighted sequence in exact, approximate, or bounded-gap form, with presence probability not less than a given threshold. The motif may be an ordinary non-weighted string or a string with don’t care symbols. We give an algorithmic framework capable of tackling the above motif discovery problems. Utilizing the notion of maximal factors, the framework provides an approach for reducing each problem to an equivalent problem in non-weighted strings without any time degradation.
Hui Zhang, Qing Guo, Costas S. Iliopoulos
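
To make the problem statement concrete, the following is a minimal sketch of exact motif matching in a weighted sequence under a probability threshold. It is a naive baseline for illustration only, not the maximal-factor framework described in the abstract. It assumes that positions are independent, so the presence probability of a motif is the product of the per-position probabilities, and that a don’t care symbol matches any character with probability 1. The names `DONT_CARE`, `occurrence_probability`, and `find_occurrences` are hypothetical and chosen for this example.

```python
from typing import Dict, List

# A weighted sequence: one {character: probability} map per position.
WeightedSequence = List[Dict[str, float]]

# Hypothetical don't care symbol (assumption: it matches any character).
DONT_CARE = "*"


def occurrence_probability(ws: WeightedSequence, motif: str, start: int) -> float:
    """Probability that `motif` occurs in `ws` beginning at index `start`.

    Assumes positions are independent, so the result is the product of
    the per-position probabilities of the motif's characters.
    """
    prob = 1.0
    for offset, ch in enumerate(motif):
        if ch == DONT_CARE:
            continue  # a don't care symbol contributes probability 1
        prob *= ws[start + offset].get(ch, 0.0)
        if prob == 0.0:
            break  # no later position can raise a zero product
    return prob


def find_occurrences(ws: WeightedSequence, motif: str, threshold: float) -> List[int]:
    """Start indices where the motif's presence probability is >= threshold."""
    hits = []
    for start in range(len(ws) - len(motif) + 1):
        if occurrence_probability(ws, motif, start) >= threshold:
            hits.append(start)
    return hits


# Example weighted sequence of length 4 (illustrative values only).
example: WeightedSequence = [
    {"A": 1.0},
    {"A": 0.5, "C": 0.5},
    {"G": 1.0},
    {"A": 0.25, "T": 0.75},
]
```

With the example sequence, `find_occurrences(example, "AG", 0.5)` reports position 1 only, since the motif has probability 0.5 there and 0 elsewhere. This baseline checks every start position against every motif character; the framework in the abstract instead reduces the task to an equivalent problem on non-weighted strings.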