The increasing throughput of sequencing raises growing needs for methods of sequence analysis and comparison on a genomic scale, notably, in connection with phylogenetic tree reco...
We present a lossless compression algorithm, GenCompress, for genetic sequences, based on searching for approximate repeats. Our algorithm achieves the best compression ratios for...
Long-range correlations in genomic base composition are a ubiquitous statistical feature among many eukaryotic genomes. In this article, these correlations are shown to substantia...
Philipp W. Messer, Ralf Bundschuh, Martin Vingron,...
Given a sequence S of n symbols over some alphabet Σ, we develop a new compression method that is (i) very simple to implement; (ii) provides O(1) time random access to any symbol...
Background: It has been suggested previously that genome and proteome sequences show characteristics typical of natural-language texts such as “signature-style” word usage ind...