Abstract. This paper proposes a novel algorithm for complete exact pattern matching that targets the specificities of protein sequences (an alphabet of 20 symbols) but is also highly efficient on larger alphabets. The search strategy uses large search windows, allowing multiple alignments per iteration. A new filtering heuristic, named the compatibility rule, contributes decisively to the efficiency improvement. On average, the new algorithm outperforms its best-rated competitors.
Sérgio A. D. Deusdado, Paulo M. M. Carvalho