Compressed q-Gram Indexing for Highly Repetitive Biological Sequences

The study of compressed storage schemes for highly repetitive sequence collections has recently been boosted by the availability of cheaper sequencing technologies and the flood of data they promise to generate. Such a storage scheme may range from the simple goal of retrieving whole individual sequences to the more advanced one of providing fast searches in the collection. In this paper we study alternative implementations of a particularly popular index: one capable of finding all the positions in the collection of substrings of fixed length (q-grams). We introduce two novel techniques and show that they are practical alternatives for this scenario. They excel particularly in two cases: when q is small (up to 6), and when the collection is extremely repetitive (less than 0.01% mutations).
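
To make the indexed operation concrete, here is a minimal sketch of a plain, uncompressed q-gram index: a table from each length-q substring to the positions where it occurs in the collection. It is only a baseline for illustration and is not either of the two compressed techniques the paper introduces. The function names `build_qgram_index` and `locate`, and the toy two-sequence collection, are assumptions made for this example.

```python
from collections import defaultdict


def build_qgram_index(sequences, q):
    """Map every q-gram to the list of (sequence_id, offset) pairs where it occurs.

    Hypothetical baseline: stores every occurrence explicitly, with no compression.
    """
    index = defaultdict(list)
    for seq_id, seq in enumerate(sequences):
        # Slide a window of length q over the sequence; sequences shorter
        # than q contribute nothing because the range is empty.
        for offset in range(len(seq) - q + 1):
            index[seq[offset:offset + q]].append((seq_id, offset))
    return dict(index)


def locate(index, qgram):
    """Return all occurrences of a q-gram, or an empty list if it never occurs."""
    return index.get(qgram, [])


# Toy collection: two near-identical sequences, differing in the last base.
collection = ["ACGTACGT", "ACGTACGA"]
index = build_qgram_index(collection, q=4)
occurrences = locate(index, "ACGT")
print(occurrences)
```

In a highly repetitive collection, most occurrence lists in such a table repeat the same offsets across sequences, which is the redundancy a compressed index would aim to exploit.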

Francisco Claude, Antonio Fariña, Miguel A.

BIBE 2010 | Bioinformatics | Cheaper Sequencing Technologies | Repetitive Sequence Collections | Storage Scheme

Related Content

» Indexing DNA Sequences Using qGrams

» LZ77Like Compression with Fast Random Access

» Storage and Retrieval of Individual Genomes

Post Info

Added	12 May 2011
Updated	12 May 2011
Type	Journal
Year	2010
Where	BIBE
Authors	Francisco Claude, Antonio Fariña, Miguel A. Martínez-Prieto, Gonzalo Navarro
