Multiple Sequence Alignment (MSA) is one of the most fundamental problems in computational molecular biology. The running time of the best known scheme for finding an optimal alignment, based on dynamic programming, increases exponentially with the number of input sequences. Hence, several heuristics have been suggested for the problem. We present several techniques for making the dynamic programming algorithm more efficient, while still finding an optimal solution. We solve the following version of the MSA problem: in a preprocessing stage, pairwise alignments are found for every pair of sequences; the goal is then to find an optimal alignment in which matches are restricted to positions that were matched at the preprocessing stage. We prove that it suffices to find an optimal alignment of sequence segments, rather than single letters, thereby reducing the input size and thus improving the running time. We also identify “shortcuts” that expedite the dynamic programming scheme. Under s...
Pankaj K. Agarwal, Yonatan Bilu, Rachel Kolodny
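To make the abstract's two dynamic programming claims concrete, here is a minimal sketch in Python. It is not the authors' algorithm. The first function counts the cells of a full dynamic programming table, which grows as the product of the sequence lengths and hence exponentially in the number of sequences. The second computes an optimal pairwise global alignment score in the textbook Needleman-Wunsch style, as one plausible form the preprocessing stage could take. The function names and the scoring values (match = 1, mismatch = -1, gap = -1) are assumptions chosen for illustration.

```python
from math import prod


def dp_table_cells(lengths):
    """Cells in a full DP table: one axis of size n + 1 per sequence.

    Hypothetical helper, used only to show how the table grows.
    """
    return prod(n + 1 for n in lengths)


def pairwise_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of two sequences.

    Textbook Needleman-Wunsch recurrence with a linear gap penalty.
    The scoring parameters are assumed values, not taken from the paper.
    """
    # prev[j] holds the best score for aligning the current prefix of a
    # with b[:j]; the first row aligns the empty prefix of a against gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]  # a[:i] aligned against an empty prefix of b
        for j in range(1, len(b) + 1):
            # Either pair a[i-1] with b[j-1], or put a gap in one sequence.
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j - 1] + pair,
                           prev[j] + gap,
                           cur[j - 1] + gap))
        prev = cur
    return prev[-1]
```

For example, `dp_table_cells([100, 100])` gives 10201 cells, while adding a third sequence of the same length raises this to 1030301. Shrinking each axis, as the segment-based formulation in the abstract aims to do, therefore reduces the table size multiplicatively.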