Sciweavers

DASFAA
2003
IEEE

Approximate String Matching in DNA Sequences

Approximate string matching on large DNA sequence data is very important in bioinformatics. Some studies have shown that the suffix tree is an efficient data structure for approximate string matching, and that it performs better than the suffix array if the data structure can be stored entirely in memory. However, our study finds that the suffix array is much better than the suffix tree for indexing DNA sequences, since the data structure has to be created and stored on disk due to its size. We propose a novel auxiliary data structure which greatly improves the efficiency of the suffix array for approximate string matching in the external memory model. The second problem we tackle is parallel approximate matching in DNA sequences. We propose two novel parallel algorithms for this problem and implement them on a PC cluster. The results show that when the error allowed is small, a direct partitioning of the array over the machines in the cluster is the more efficient approach. On...
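As a rough illustration of suffix-array-based approximate matching in general, the sketch below runs a small in-memory search that allows up to k mismatches. It is not the authors' algorithm and does not model the auxiliary data structure, the external memory setting, or the parallel variants described in the abstract. It assumes that "error" means Hamming distance (substitutions only), uses the standard pigeonhole seed-and-verify idea as a stand-in technique, and all function names are hypothetical.

```python
def build_suffix_array(text):
    # Naive construction by sorting suffixes; adequate only for short
    # illustrative inputs, not for genome-scale data.
    return sorted(range(len(text)), key=lambda i: text[i:])


def exact_occurrences(text, sa, pattern):
    # Two binary searches over the suffix array locate the block of
    # suffixes whose first len(pattern) characters equal the pattern.
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def hamming_matches(text, sa, pattern, k):
    # Assumption: errors are substitutions only (Hamming distance).
    # Pigeonhole: if the pattern is cut into k + 1 pieces, any match with
    # at most k mismatches contains at least one piece exactly.
    m = len(pattern)
    if m < k + 1:
        raise ValueError("pattern too short for k + 1 non-empty pieces")
    piece_len = m // (k + 1)
    hits = set()
    for j in range(k + 1):
        off = j * piece_len
        end = m if j == k else off + piece_len
        for pos in exact_occurrences(text, sa, pattern[off:end]):
            start = pos - off
            if start < 0 or start + m > len(text):
                continue
            # Verify the full candidate window against the pattern.
            window = text[start:start + m]
            if sum(a != b for a, b in zip(window, pattern)) <= k:
                hits.add(start)
    return sorted(hits)


if __name__ == "__main__":
    genome = "ACGTACGTTACG"
    sa = build_suffix_array(genome)
    print(hamming_matches(genome, sa, "ACGA", 1))
```

With a small k the pieces stay long, so few candidate windows need verification; as k grows the pieces shrink and the exact-match step returns many more candidates to check.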
Lok-Lam Cheng, David Wai-Lok Cheung, Siu-Ming Yiu
Added 04 Jul 2010
Updated 04 Jul 2010
Type Conference
Year 2003
Where DASFAA
Authors Lok-Lam Cheng, David Wai-Lok Cheung, Siu-Ming Yiu