Genome-scale disk-based suffix tree indexing

With the exponential growth of biological sequence databases, it has become critical to develop effective techniques for storing, querying, and analyzing these massive datasets. Suffix trees are widely used to solve many sequence-based problems, and they can be built in linear time and space, provided the resulting tree fits in main memory. To index larger sequences, several external suffix tree algorithms have been proposed in recent years. However, they suffer from several problems, such as susceptibility to data skew, inability to scale to genome-scale sequences, and the absence of suffix links, which are crucial in various suffix-tree-based algorithms. In this paper, we target DNA sequences and propose a novel disk-based suffix tree algorithm called Trellis, which effectively scales up to genome-scale sequences. Specifically, it can index the entire human genome using 2 GB of memory in about 4 hours, and can recover all its suffix links within 2 hours. Trellis was compared to various stat...
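As a rough illustration of what a suffix-based index provides, the sketch below builds a naive, uncompressed suffix trie in memory and uses it for substring queries. It is not Trellis and not a linear-time construction: it takes quadratic time and space, stores nothing on disk, and has no suffix links. The class name `SuffixTrie`, its methods, and the `$` terminator are assumptions made for this example only.

```python
class SuffixTrie:
    """Naive in-memory suffix trie (illustration only, O(n^2) time and space)."""

    def __init__(self, text):
        # Assumption: "$" does not occur in the input (true for DNA strings).
        self.text = text + "$"
        self.root = {}
        for start in range(len(self.text)):
            node = self.root
            for ch in self.text[start:]:
                node = node.setdefault(ch, {})
            # The key None marks a leaf and stores the suffix start offset.
            node[None] = start

    def _walk(self, pattern):
        """Follow pattern from the root; return the node reached, or None."""
        node = self.root
        for ch in pattern:
            if ch not in node:
                return None
            node = node[ch]
        return node

    def contains(self, pattern):
        """True if pattern occurs somewhere in the indexed text."""
        return self._walk(pattern) is not None

    def occurrences(self, pattern):
        """Sorted start offsets of every occurrence of pattern."""
        node = self._walk(pattern)
        if node is None:
            return []
        found, stack = [], [node]
        while stack:
            current = stack.pop()
            for key, child in current.items():
                if key is None:
                    found.append(child)  # leaf marker holds an offset
                else:
                    stack.append(child)
        return sorted(found)


index = SuffixTrie("GATTACA")
print(index.contains("TTA"))     # substring query
print(index.occurrences("A"))    # all start offsets of "A"
```

Each query walks at most one edge per pattern character, which is why suffix-based indexes answer substring searches in time proportional to the pattern length rather than the text length. A real suffix tree compresses single-child paths so that the whole structure stays linear in size.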

Benjarath Phoophakdee, Mohammed J. Zaki

Database | Disk-based Suffix Tree | Persistent Disk-based Suffix | SIGMOD 2007 | Suffix Tree Algorithms

Post Info

Added	08 Dec 2009
Updated	08 Dec 2009
Type	Conference
Year	2007
Where	SIGMOD
Authors	Benjarath Phoophakdee, Mohammed J. Zaki
