Sciweavers


A sum-over-paths extension of edit distances accounting for all sequence alignments

This paper introduces a simple Sum-over-Paths (SoP) formulation of string edit distances accounting for all possible alignments between two sequences, and extends related previous work from bioinformatics to the case of graphs with cycles. Each alignment ℘, with a total cost C(℘), is assigned a probability of occurrence P(℘) = exp[−θC(℘)]/Z where Z is a normalization factor. Therefore, good alignments (having a low cost) are favoured over bad alignments (having a high cost). The expected cost, Σ_{℘∈P} C(℘) exp[−θC(℘)]/Z, computed over all possible alignments ℘ ∈ P, defines the SoP edit distance. When θ → ∞, only the best alignments matter and the measure reduces to the standard edit distance. The rationale behind this definition is the following: for some applications, two sequences sharing many good alignments should be considered as more similar than two sequences having only one single good, optimal, alignment in common. In other words, sub-optimal al...
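The expected-cost definition above can be illustrated with a small dynamic program. The following is a minimal sketch, not the paper's algorithm: it assumes the plain acyclic alignment lattice (no cycles), unit insertion, deletion and substitution costs, and zero cost for a match. The function name `sop_edit_distance` and its parameters are hypothetical. For each lattice cell it accumulates the partition sum Z and the cost-weighted sum over all partial alignments reaching that cell.

```python
import math

def sop_edit_distance(s, t, theta, c_ins=1.0, c_del=1.0, c_sub=1.0):
    """Expected alignment cost under P(path) = exp(-theta * C(path)) / Z.

    Illustrative assumptions: acyclic edit lattice, matches cost 0,
    and the default unit costs for insertion, deletion, substitution.
    """
    m, n = len(s), len(t)
    # Z[i][j]: sum of exp(-theta * C) over all partial alignments of s[:i], t[:j]
    # E[i][j]: sum of C * exp(-theta * C) over the same partial alignments
    Z = [[0.0] * (n + 1) for _ in range(m + 1)]
    E = [[0.0] * (n + 1) for _ in range(m + 1)]
    Z[0][0] = 1.0  # the empty alignment has cost 0 and weight 1

    for i in range(m + 1):
        for j in range(n + 1):
            moves = []  # (predecessor row, predecessor column, edge cost)
            if i > 0:
                moves.append((i - 1, j, c_del))
            if j > 0:
                moves.append((i, j - 1, c_ins))
            if i > 0 and j > 0:
                match = s[i - 1] == t[j - 1]
                moves.append((i - 1, j - 1, 0.0 if match else c_sub))
            for pi, pj, cost in moves:
                w = math.exp(-theta * cost)
                Z[i][j] += Z[pi][pj] * w
                # Extending a path by an edge adds `cost` to its total cost.
                E[i][j] += (E[pi][pj] + cost * Z[pi][pj]) * w

    return E[m][n] / Z[m][n]

if __name__ == "__main__":
    for theta in (0.0, 1.0, 5.0, 50.0):
        print(theta, sop_edit_distance("kitten", "sitting", theta))
```

With θ = 0 every alignment is weighted equally, so the result is the mean cost over all alignments. As θ grows, the value approaches the standard edit distance, which is 3 for "kitten" and "sitting" under these unit costs. For long sequences or very large θ the raw exponentials can underflow, so a log-domain version would be the safer choice.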
Added 14 May 2011
Updated 14 May 2011
Type Journal
Year 2011
Where PR
Authors Silvia García-Díez, François Fouss, Masashi Shimbo, Marco Saerens