Multiple sequence alignment is a central problem in Bioinformatics. A known integer programming approach is to apply branch-and-cut to exponentially large graph-theoretic models. This paper describes a new integer program formulation that generates models small enough to be passed to generic solvers. The formulation is a hybrid relating the sparse alignment graph with a compact encoding of the alignment matrix via channelling constraints. Alignments obtained with a SAT-based local search algorithm are competitive with those of state-of-the-art algorithms, though execution times are much longer. 1 Background Multiple sequence alignment (MSA) is a central problem in Bioinformatics and is known to be NP-complete [3]. Given a number of sequences of symbols from an alphabet, the aim is to align them while maximizing some function. Gaps may be introduced between symbols, and in some MSA formulations the objective function includes a measure of the number and length of gaps. A common data str...
Steven David Prestwich, Desmond G. Higgins, Orla O