
ISMB
1993

Discovering Sequence Similarity by the Algorithmic Significance Method

The minimal-length encoding approach is applied to define the concept of sequence similarity. A sequence is defined to be similar to another sequence or to a set of keywords if it can be encoded in a small number of bits by taking advantage of common subwords. The minimal-length encoding of a sequence is computed in linear time, using a data compression algorithm that is based on a dynamic programming strategy and the directed acyclic word graph data structure. No assumptions about common word ("k-tuple") length are made in advance, and common words of any length are considered. The newly proposed algorithmic significance method provides an exact upper bound on the probability that sequence similarity has occurred by chance, thus eliminating the need for any arbitrary choice of similarity thresholds. Preliminary experiments indicate that a small number of keywords can positively identify a DNA sequence, which is extremely relevant in the context of partial sequencing by hybridization.
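To make the idea concrete, here is a minimal Python sketch of encoding a target sequence with the help of subwords it shares with a source sequence, then turning the bits saved into a chance bound. It is an illustration under stated assumptions, not the paper's implementation: the token format (one flag bit, 2-bit literals, fixed-width pointer fields), the uniform 2-bits-per-nucleotide null model, and the names `field_bits`, `encoding_length`, and `significance_bound` are all hypothetical. The search for common subwords is a naive substring test, so this runs in polynomial rather than linear time; the directed acyclic word graph mentioned in the abstract is not used. The bound applied is the usual form of the algorithmic significance result: a sequence is encoded d bits shorter than under the null model with probability at most 2^-d.

```python
def field_bits(n):
    """Bits in a fixed-width field that can hold one of n values (at least 1)."""
    return max(1, (n - 1).bit_length())


def encoding_length(target, source):
    """Fewest bits needed to encode target as literals and pointers into source.

    Assumed token format (hypothetical, chosen to be decodable):
      literal = 1 flag bit + 2 bits for the nucleotide
      pointer = 1 flag bit + start position in source + match length
    """
    n = len(target)
    literal = 1 + 2
    pointer = 1 + field_bits(len(source)) + field_bits(n)

    # cost[i] is the cheapest encoding of the suffix target[i:].
    cost = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        best = literal + cost[i + 1]
        length = 1
        # Common words of any length are tried; no k-tuple size is fixed.
        while i + length <= n and target[i:i + length] in source:
            best = min(best, pointer + cost[i + length])
            length += 1
        cost[i] = best
    return cost[0]


def significance_bound(target, source):
    """Return (bits saved, upper bound on the chance probability)."""
    null_bits = 2 * len(target)  # assumed null model: uniform over A, C, G, T
    saved = null_bits - encoding_length(target, source)
    return saved, min(1.0, 2.0 ** -saved)


if __name__ == "__main__":
    source = "ACGTTGCAAGGCTTAC"
    saved, bound = significance_bound(source, source)
    print(f"identical sequences: {saved} bits saved, p <= {bound:.3g}")
    saved, bound = significance_bound("AAAA", "CCCC")
    print(f"unrelated sequences: {saved} bits saved, p <= {bound:.3g}")
```

In this sketch a negative number of saved bits simply means the common-subword encoding is worse than the null encoding, so the bound is capped at 1 and no similarity is claimed. No threshold has to be chosen in advance: the number of bits saved translates directly into the probability bound.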
Aleksandar Milosavljevic
Added 02 Nov 2010
Updated 02 Nov 2010
Type Conference