A genetic algorithm for the longest common subsequence problem encodes candidate sequences as binary strings that indicate subsequences of the shortest or first string. Its fitness function penalizes sequences not found in all the strings. In tests on 84 sets of three strings, a dynamic programming algorithm returns optimum solutions quickly on smaller instances and increasingly slowly on larger instances. Repeated trials of the GA always identify optimum subsequences, and it runs in reasonable times even on the largest instances.

Categories and Subject Descriptors
G.2.1 [Mathematics of Computing]: Discrete Mathematics--Combinatorics; I.2.8 [Problem Solving, Control Methods, and Search]: Heuristic Methods

General Terms
Algorithms

Keywords
Strings, longest common subsequence, genetic algorithm
Brenda Hinkemeyer, Bryant A. Julstrom
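
The encoding and fitness idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`decode`, `is_subsequence`, `fitness`), the choice of the shortest string as the base, and the penalty scheme (subtract the base length once for every string that does not contain the candidate) are all assumptions made for illustration, since the abstract does not specify the penalty.

```python
from typing import Optional, Sequence


def decode(mask: Sequence[int], base: str) -> str:
    """Keep the characters of `base` whose mask bit is 1.

    Assumes len(mask) == len(base); zip silently truncates otherwise.
    """
    return "".join(ch for bit, ch in zip(mask, base) if bit)


def is_subsequence(candidate: str, text: str) -> bool:
    """Return True if `candidate` appears in `text` in order (not necessarily contiguously)."""
    remaining = iter(text)
    # `ch in remaining` consumes the iterator up to and including the match,
    # so each later character must be found further to the right.
    return all(ch in remaining for ch in candidate)


def fitness(mask: Sequence[int], strings: Sequence[str],
            penalty: Optional[int] = None) -> int:
    """Length of the decoded candidate, minus a penalty per string that lacks it.

    The penalty value is a hypothetical choice: by default the length of the
    base string, so any infeasible candidate scores at most zero.
    """
    base = min(strings, key=len)  # assumption: encode against the shortest string
    candidate = decode(mask, base)
    if penalty is None:
        penalty = len(base)
    misses = sum(1 for s in strings if not is_subsequence(candidate, s))
    return len(candidate) - penalty * misses
```

For the three strings `"ACBD"`, `"ABCDE"`, and `"AXBXD"`, the mask `[1, 0, 1, 1]` decodes to `"ABD"`, which is common to all three and scores 3. The all-ones mask decodes to `"ACBD"`, which is missing from two of the strings and is penalized under this scheme.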