Signal finding (pattern discovery in unaligned DNA sequences) is a fundamental problem in both computer science and molecular biology with important applications in locating regulatory sites and drug target identification. Despite many studies, this problem is far from being resolved: most signals in DNA sequences are so complicated that we don't yet have good models or reliable algorithms for their recognition. We complement existing statistical and machine learning approaches to this problem by a combinatorial approach that proved to be successful in identifying very subtle signals.
Pavel A. Pevzner, Sing-Hoi Sze