Highly Scalable and Accurate Seeds for Subsequence Alignment

15 years 7 months ago

Source: www.cise.ufl.edu

We propose a method for finding seeds for the local alignment of two nucleotide sequences. Our method uses randomized algorithms to find approximate seeds. We present a dynamic index to store the fingerprints of k-grams and a highly scalable and accurate (HSA) algorithm to incorporate randomization into the process of seed generation. Experimental results show that our method produces better-quality seeds, with improved running time and memory usage, compared to traditional non-spaced and spaced seeds. The presented algorithm scales very well to longer seed lengths while maintaining its quality and performance.

1 Motivation

Locating similar subsequences between a query sequence and the sequences in a database is one of the most fundamental problems in bioinformatics; it is also known as the local alignment problem. Local alignment matches pairs of letters between two subsequences, and a score is assigned to each match. Every mismatch and gap is penalized with appropriate mismatch,...
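The abstract compares HSA seeds against traditional non-spaced and spaced seeds found through an index of k-gram fingerprints. The sketch below illustrates only that baseline, not the HSA algorithm itself, whose details are not given here. It assumes a DNA alphabet of A, C, G, T and uses an exact base-4 encoding as the "fingerprint". The mask string and all function names (`fingerprint`, `build_index`, `find_seeds`) are hypothetical. A mask of all 1s gives a contiguous (non-spaced) k-gram seed, while 0s mark "don't care" positions of a spaced seed.

```python
from collections import defaultdict

# Assumption: DNA alphabet only; any other character raises KeyError.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def fingerprint(window: str, mask: str) -> int:
    """Encode the letters at the '1' positions of `mask` as a base-4 integer.

    For a fixed mask this encoding is exact (collision-free), so equal
    fingerprints mean the windows agree on every position the mask checks.
    """
    value = 0
    for ch, m in zip(window, mask):
        if m == "1":
            value = value * 4 + CODE[ch]
    return value


def build_index(database: str, mask: str) -> dict[int, list[int]]:
    """Map each fingerprint to the database offsets where it occurs.

    The dict can be extended in place as new sequences arrive, which is one
    simple way to keep such an index dynamic.
    """
    span = len(mask)
    index: dict[int, list[int]] = defaultdict(list)
    for i in range(len(database) - span + 1):
        index[fingerprint(database[i:i + span], mask)].append(i)
    return index


def find_seeds(query: str, database: str, mask: str) -> list[tuple[int, int]]:
    """Return (query_offset, database_offset) pairs whose windows match under `mask`."""
    index = build_index(database, mask)
    span = len(mask)
    hits = []
    for j in range(len(query) - span + 1):
        # .get() does not insert missing keys into the defaultdict.
        for i in index.get(fingerprint(query[j:j + span], mask), []):
            hits.append((j, i))
    return hits
```

For example, `find_seeds("ACGTAC", "TTACCTACGG", "1111")` finds no contiguous 4-letter seed, because the only near match, `ACGT` against `ACCT`, differs at the third letter. The spaced mask `"1101"` ignores that position and reports the seed `(0, 2)`. This shows why spaced seeds can be more sensitive than non-spaced seeds of the same weight.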

Abhijit Pol, Tamer Kahveci
