Compression-based classification of biological sequences and structures via the Universal Similarity Metric: experimental assess



Background: Similarity of sequences is a key mathematical notion for classification and phylogenetic studies in biology. It is currently handled primarily through alignments. However, alignment methods seem inadequate for post-genomic studies, since they do not scale well with data set size and seem to be confined to genomic and proteomic sequences. Therefore, alignment-free similarity measures are actively pursued. Among those, USM (Universal Similarity Metric) has gained prominence. It is based on the deep theory of Kolmogorov Complexity, and universality is its most novel and striking feature. Since it can only be approximated via data compression, USM is a methodology rather than a formula quantifying the similarity of two strings. Three approximations of USM are available, namely UCD (Universal Compression Dissimilarity), NCD (Normalized Compression Dissimilarity) and CD (Compression Dissimilarity). Their applicability and robustness are tested on various data sets yielding ...
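To make the idea of approximating USM via data compression more concrete, here is a minimal sketch of two of the dissimilarities named in the abstract. The abstract itself gives no formulas, so the definitions used here are assumptions taken from how NCD and CD are commonly stated in the compression-distance literature: NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)) and CD(x, y) = C(xy) / (C(x) + C(y)), where C(s) is the compressed size of s. The choice of zlib as the compressor and the helper names `compressed_size`, `ncd`, `cd` and `random_dna` are illustrative only and not taken from the paper.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Return the size in bytes of data after zlib compression.

    zlib is only a stand-in compressor chosen for illustration.
    """
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Dissimilarity, as commonly defined (assumption)."""
    cx, cy = compressed_size(x), compressed_size(y)
    return (compressed_size(x + y) - min(cx, cy)) / max(cx, cy)


def cd(x: bytes, y: bytes) -> float:
    """Compression Dissimilarity, as commonly defined (assumption)."""
    return compressed_size(x + y) / (compressed_size(x) + compressed_size(y))


def random_dna(length: int, seed: int) -> bytes:
    """Hypothetical helper: a reproducible pseudo-random DNA-like string."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length)).encode("ascii")


if __name__ == "__main__":
    a = random_dna(800, seed=1)
    b = random_dna(800, seed=2)
    # A sequence compared with itself should score lower (more similar)
    # than two unrelated sequences under both measures.
    print("NCD(a, a) =", ncd(a, a), " NCD(a, b) =", ncd(a, b))
    print("CD(a, a)  =", cd(a, a), " CD(a, b)  =", cd(a, b))
```

Because the result depends entirely on the compressor, swapping zlib for another compressor can change the values, which is consistent with the abstract's point that USM is a methodology rather than a single formula.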

Paolo Ferragina, Raffaele Giancarlo, Valentina Gre


BMCBI 2007 | Compression Dissimilarity | Key Mathematical Notion | USM Methodology |


Post Info

Added	09 Dec 2010
Updated	09 Dec 2010
Type	Journal
Year	2007
Where	BMCBI
Authors	Paolo Ferragina, Raffaele Giancarlo, Valentina Greco, Giovanni Manzini, Gabriel Valiente


Sciweavers
