
BMCBI
2008

Efficient computation of absent words in genomic sequences

Background: Analysis of sequence composition is a routine task in genome research. Organisms are characterized by their base composition, dinucleotide relative abundance, codon usage, and so on. Unique subsequences are markers of special interest in genome comparison, expression profiling, and genetic engineering. Relative to a random sequence of the same length, unique subsequences are overrepresented in real genomes. Shortest words absent from a genome have been addressed in two recent studies.

Results: We describe a new algorithm and software for the computation of absent words. It is more efficient than previous algorithms and easier to use. It directly computes unwords without the need to specify a length estimate. Moreover, it avoids the space requirements of index structures such as suffix trees and suffix arrays. Our implementation is available as an open source package. We compute unwords of human and mouse as well as some other organisms, covering a genome size range from 10...
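To make the notion of an unword concrete, the following is a minimal brute-force sketch in Python. It is not the algorithm described in the paper, which is designed to be far more time- and space-efficient; it only illustrates the definition (the shortest words over the alphabet that do not occur in the sequence) and the idea of finding the length without a prior estimate by trying k = 1, 2, ... in turn. The function name `shortest_absent_words` and the default alphabet `ACGT` are assumptions made for this illustration, and reverse complements are not considered.

```python
from itertools import product


def shortest_absent_words(sequence, alphabet="ACGT"):
    """Return (k, words): the smallest length k at which some word over
    `alphabet` is missing from `sequence`, and the sorted missing words.

    Naive illustration only: it enumerates all |alphabet|**k candidate
    words, so it is practical only for short sequences and small k.
    """
    seq = sequence.upper()
    k = 1
    while True:
        # Collect every substring of length k that occurs in the sequence.
        present = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        # Enumerate all candidate words of length k and keep the missing ones.
        absent = sorted(
            word
            for word in ("".join(p) for p in product(alphabet, repeat=k))
            if word not in present
        )
        if absent:
            return k, absent
        # Every word of length k occurs, so try the next length.
        k += 1


if __name__ == "__main__":
    k, words = shortest_absent_words("AACAGATCCGCTGGTTA")
    print(k, len(words))
```

The loop always terminates: once k exceeds the sequence length no substring of that length exists, so every candidate word is absent. Substrings containing symbols outside the alphabet (for example `N`) never match a candidate word and therefore do not affect the result.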
Julia Herold, Stefan Kurtz, Robert Giegerich
Added 08 Dec 2010
Updated 08 Dec 2010
Type Journal
Year 2008
Where BMCBI