Background: Several problems exist with current methods used to align DNA sequences for comparative sequence analysis. Most dynamic programming algorithms assume that conserved sequence elements are collinear. This assumption appears valid when comparing orthologous protein coding sequences. Functional constraints on proteins provide strong selective pressure against sequence inversions, and minimize sequence duplications and feature shuffling. For noncoding sequences this collinearity assumption is often invalid. For example, enhancers contain clusters of transcription factor binding sites that change in number, orientation, and spacing during evolution yet the enhancer retains its activity. Dot plot analysis is often used to estimate noncoding sequence relatedness. Yet dot plots do not actually align sequences and thus cannot account well for base insertions or deletions. Moreover, they lack an adequate statistical framework for comparing sequence relatedness and are limited to pair...
David A. Nix, Michael B. Eisen