Sciweavers

CSB
2003
IEEE

An Optimal DNA Segmentation Based on the MDL Principle

The biological world is highly stochastic and inhomogeneous in its behaviour. There are regions in DNA with a high concentration of G or C bases; stretches of sequence with an abundance of CG dinucleotides (CpG islands); coding regions with a strong periodicity-of-three pattern, and so forth. Transitions between these regions of DNA, also known as change points, carry important biological information. Computational methods used to identify these homogeneous regions are called segmentations. Viewing a DNA sequence as a non-stationary process, we apply recent techniques of universal source coding to discover stationary (possibly recurrent) segments. In particular, the Stein-Ziv lemma is adopted to find an asymptotically optimal discriminant function that determines whether two DNA segments are generated by the same source while assuring an exponentially small probability of false positives. Next, we use the Minimum Description Length (MDL) principle to select parameters that lead to a linear-time segmentat...
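To make the idea of MDL-based segmentation concrete, here is a minimal sketch of a single change-point search. It is not the authors' algorithm: it assumes a simple i.i.d. model over the four bases, a two-part code (empirical-entropy data cost plus a parameter cost of half a log per free parameter), and a quadratic-time scan rather than the linear-time method the abstract mentions. The names `description_length`, `best_split`, and `min_len` are hypothetical and chosen for illustration only.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumed alphabet; ambiguous bases are not handled


def description_length(seq: str) -> float:
    """Two-part code length in bits for seq under an i.i.d. base model.

    Illustrative assumption: data cost is n times the empirical entropy,
    model cost is (k - 1) / 2 * log2(n) for k = 4 symbol frequencies.
    """
    n = len(seq)
    if n == 0:
        return 0.0
    counts = Counter(seq)
    data_bits = -sum(c * math.log2(c / n) for c in counts.values())
    model_bits = 0.5 * (len(ALPHABET) - 1) * math.log2(n)
    return data_bits + model_bits


def best_split(seq: str, min_len: int = 10):
    """Return the split index that shortens the description, or None.

    A split is accepted only if coding the two halves separately, plus
    log2(n) bits to state where the split is, beats coding seq as one piece.
    """
    n = len(seq)
    best_pos, best_cost = None, description_length(seq)
    for pos in range(min_len, n - min_len + 1):
        cost = (
            description_length(seq[:pos])
            + description_length(seq[pos:])
            + math.log2(n)  # cost of encoding the change-point position
        )
        if cost < best_cost:
            best_pos, best_cost = pos, cost
    return best_pos


if __name__ == "__main__":
    at_rich_then_gc_rich = "AT" * 50 + "GC" * 50
    print(best_split(at_rich_then_gc_rich))
    print(best_split("ACGT" * 50))
```

Applying `best_split` recursively to each resulting piece would give a crude multi-segment segmentation. The penalty terms are what keep a homogeneous sequence from being split on small fluctuations in base composition.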
Added 04 Jul 2010
Updated 04 Jul 2010
Type Conference
Year 2003
Where CSB
Authors Wojciech Szpankowski, Wenhui Ren, Lukasz Szpankowski