
CSB
2004
IEEE

Compressed Pattern Matching in DNA Sequences

We propose derivative Boyer-Moore (d-BM), a new compressed pattern matching algorithm for DNA sequences. The algorithm is based on the Boyer-Moore method, one of the most popular string matching algorithms. In this approach, we compress both DNA sequences and patterns by representing each of the characters A, T, C, and G with two bits. Experiments indicate that this compressed pattern matching algorithm searches long DNA patterns (length > 50) more than 10 times faster than the exact match routine of the software package Agrep, which is known as the fastest pattern matching tool. Moreover, compressing DNA sequences this way gives a guaranteed space saving of 75% compared with storing each character in one byte. The speedup comes in part from the greater efficiency of the Boyer-Moore method once four bases are packed into each byte, which increases the alphabet size from 4 to 256.
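The two ideas in the abstract, 2-bit packing and Boyer-Moore scanning over the resulting byte alphabet, can be sketched in Python as below. This is not the authors' d-BM algorithm, whose details the abstract does not give; it swaps in the simpler Boyer-Moore-Horspool variant. The base-to-bits mapping (`CODE`), the hypothetical helpers `pack`, `base_at`, `horspool` and `search`, and the strategy of trying all four byte alignments of the pattern and then checking the leftover bases one by one are assumptions made for illustration.

```python
from typing import List

# Assumed 2-bit code; the abstract does not say which bits map to which base.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack(seq: str) -> bytes:
    """Pack a DNA string four bases per byte; the last byte is zero-padded."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        b = 0
        for ch in chunk:
            b = (b << 2) | CODE[ch]
        b <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(b)
    return bytes(out)


def base_at(packed: bytes, i: int) -> int:
    """Return the 2-bit code of base i without unpacking the whole sequence."""
    return (packed[i // 4] >> (6 - 2 * (i % 4))) & 3


def horspool(text: bytes, pat: bytes) -> List[int]:
    """Boyer-Moore-Horspool over a 256-symbol (byte) alphabet."""
    n, m = len(text), len(pat)
    shift = [m] * 256  # bytes absent from pat allow a full-length jump
    for j in range(m - 1):
        shift[pat[j]] = m - 1 - j
    hits, pos = [], 0
    while pos <= n - m:
        if text[pos:pos + m] == pat:
            hits.append(pos)
        pos += shift[text[pos + m - 1]]
    return hits


def search(text: str, pattern: str) -> List[int]:
    """Return every base offset of pattern in text, scanning packed bytes."""
    n, m = len(text), len(pattern)
    if m < 7:  # too short to guarantee a whole byte in every alignment
        return [p for p in range(n - m + 1) if text.startswith(pattern, p)]
    packed = pack(text)
    codes = [CODE[c] for c in pattern]
    hits = []
    for head in range(4):  # pattern bases before the first byte boundary
        body = (m - head) // 4 * 4  # pattern bases that fill whole bytes
        edges = list(range(head)) + list(range(head + body, m))
        for b in horspool(packed, pack(pattern[head:head + body])):
            p = 4 * b - head  # base offset where the pattern would start
            if p < 0 or p + m > n:
                continue
            # confirm the leftover head and tail bases individually
            if all(base_at(packed, p + k) == codes[k] for k in edges):
                hits.append(p)
    return sorted(hits)
```

For example, `search("AACGTACG", "ACGTACG")` returns `[1]`, and `pack("ACGT" * 25)` occupies 25 bytes for 100 bases, which is the 75% saving over one byte per base. Because the Horspool shift table is indexed by bytes, a mismatch on a byte that does not occur in the packed pattern lets the window jump by the full pattern length in bytes, which illustrates why the larger alphabet can help.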
Lei Chen, Shiyong Lu, Jeffrey L. Ram
Added 20 Aug 2010
Updated 20 Aug 2010
Type Conference
Year 2004
Where CSB