Optimized spaced seeds improve sensitivity and specificity in local homology search. Several authors have shown that multiple seeds can achieve better sensitivity and specificity than a single seed. We describe a linear programming-based algorithm for optimizing a set of seeds. Theoretically, our algorithm offers a performance guarantee: in most reasonable models of homologous sequences, the sensitivity of the chosen seed set is at least 70% of the best achievable sensitivity. In practice, our algorithm generates solutions whose sensitivity is at least 90% of the optimal. Our method not only achieves performance better than or comparable to that of a greedy algorithm, but also gives this area a mathematical foundation.
Jinbo Xu, Daniel G. Brown, Ming Li, Bin Ma
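
To make the notion of seed-set sensitivity concrete, the following is a minimal Python sketch of how one might estimate it empirically. It is not the linear programming algorithm described in the abstract. It assumes an alignment is encoded as a string of `1` (match) and `0` (mismatch), a spaced seed as a string of `1` (required match) and `*` (don't-care), and a simple independent-column model of homologous sequences with a fixed identity rate. The function names, the example seeds, and the model parameters are all hypothetical choices made for illustration.

```python
import random


def hits(seed: str, alignment: str) -> bool:
    """Return True if the spaced seed matches the alignment at some offset.

    seed: string over {'1', '*'}; '1' requires a match, '*' is a don't-care.
    alignment: string over {'1', '0'}; '1' is a match column, '0' a mismatch.
    """
    required = [i for i, c in enumerate(seed) if c == "1"]
    # Slide the seed across the alignment; the range is empty (no hit)
    # when the seed is longer than the alignment.
    for offset in range(len(alignment) - len(seed) + 1):
        if all(alignment[offset + i] == "1" for i in required):
            return True
    return False


def seed_set_sensitivity(seeds, alignments) -> float:
    """Fraction of alignments hit by at least one seed in the set."""
    if not alignments:
        return 0.0
    covered = sum(1 for a in alignments if any(hits(s, a) for s in seeds))
    return covered / len(alignments)


def sample_alignments(n, length=64, identity=0.7, rng_seed=0):
    """Sample n alignments from an assumed independent-column model.

    Each column is a match with probability `identity` (an illustrative
    assumption, not a model taken from the paper).
    """
    rng = random.Random(rng_seed)
    return [
        "".join("1" if rng.random() < identity else "0" for _ in range(length))
        for _ in range(n)
    ]


if __name__ == "__main__":
    sample = sample_alignments(2000)
    single = ["111*1*11*111"]                  # hypothetical seed
    double = ["111*1*11*111", "11*11**1*1111"]  # hypothetical seed set
    print("single seed:", seed_set_sensitivity(single, sample))
    print("two seeds:  ", seed_set_sensitivity(double, sample))
```

Under this encoding, adding a seed to a set can never lower the estimated sensitivity, because an alignment counts as covered once any seed hits it. Choosing which seeds to add is the optimization problem the abstract addresses.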