
RECOMB
2002
Springer

Provably Sensitive Indexing Strategies for Biosequence Similarity Search

The field of algorithms for pairwise biosequence similarity search is dominated by heuristic methods of high efficiency but uncertain sensitivity. One reason that more formal string-matching algorithms with sensitivity guarantees have not been applied to biosequences is that they cannot directly find similarities that score highly under substitution score functions such as the DNA PAM-TT [20], PAM [9], or BLOSUM [12] families of matrices. We describe a general technique, score simulation, to map ungapped similarity search problems using these score functions into the problem of finding pairs of strings that are close in Hamming space. Score simulation leads to indexing schemes for biosequences that permit efficient ungapped similarity searches with formal guarantees of sensitivity using arbitrary score functions. In particular, we introduce the LSH-ALL-PAIRS-SIM algorithm for finding local similarities in large biosequence collections and show that it is both computationally feasible a...
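The abstract reduces similarity search to finding pairs of strings that are close in Hamming space. The sketch below shows one generic way to do that last step: bit-sampling locality-sensitive hashing (LSH) over equal-length windows. It is not the paper's LSH-ALL-PAIRS-SIM algorithm. It also omits the score-simulation step that maps substitution scores into Hamming space. All function names and parameters (`k` sampled positions, `iterations` hash rounds, `max_dist`) are hypothetical.

The sketch also computes the detection probability for a pair at a given distance. This illustrates how LSH-style indexing can come with a sensitivity guarantee rather than an uncertain one.

```python
import random
from collections import defaultdict
from itertools import combinations
from math import comb


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def lsh_candidate_pairs(windows, k, iterations, seed=0):
    """Generic bit-sampling LSH (hypothetical helper, not the paper's method).

    Each iteration picks k random positions and buckets every window by its
    characters at those positions. Windows sharing a bucket become candidates.
    """
    rng = random.Random(seed)
    length = len(windows[0])
    candidates = set()
    for _ in range(iterations):
        positions = rng.sample(range(length), k)
        buckets = defaultdict(list)
        for idx, w in enumerate(windows):
            key = "".join(w[p] for p in positions)
            buckets[key].append(idx)  # indices are appended in increasing order
        for members in buckets.values():
            for i, j in combinations(members, 2):  # so i < j always holds
                candidates.add((i, j))
    return candidates


def close_pairs(windows, max_dist, k, iterations, seed=0):
    """Check each LSH candidate exactly and keep pairs within max_dist mismatches."""
    return sorted(
        (i, j)
        for i, j in lsh_candidate_pairs(windows, k, iterations, seed)
        if hamming(windows[i], windows[j]) <= max_dist
    )


def detection_probability(length, dist, k, iterations):
    """Chance that a pair at Hamming distance dist collides at least once.

    In one iteration, the pair collides only if all k sampled positions avoid
    its dist mismatches: p = C(length - dist, k) / C(length, k).
    """
    p = comb(length - dist, k) / comb(length, k)
    return 1 - (1 - p) ** iterations
```

For example, `close_pairs(["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC"], max_dist=1, k=4, iterations=20)` returns only pairs whose exact Hamming distance is at most 1. Missing such a pair is possible, but its probability is bounded by `1 - detection_probability(10, 1, 4, 20)`. Raising `iterations` raises sensitivity at the cost of more hashing work. Raising `k` shrinks buckets and so reduces the number of false candidates to check.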
Jeremy Buhler
Added 03 Dec 2009
Updated 03 Dec 2009
Type Conference
Year 2002
Where RECOMB
Authors Jeremy Buhler