Sciweavers

RECOMB
2003
Springer

Finding recurrent sources in sequences

Many genomic sequences and, more generally, (multivariate) time series display tremendous variability. However, it is often reasonable to assume that the sequence is actually generated by, or assembled from, a small number of sources, each of which might contribute several segments to the sequence. That is, there are h hidden sources such that the sequence can be written as a concatenation of k > h pieces, each of which stems from one of the h sources. We define this (k, h)-segmentation problem and show that it is NP-hard in the general case. We give approximation algorithms achieving approximation ratios of 3 for the L1 error measure and 5 for the L2 error measure, and generalize the results to higher dimensions. We give empirical results on real (chromosome 22) and artificial data showing that the methods work well in practice.

Categories and Subject Descriptors: F.2.2 [Analysis of algorithms and problem complexity]: Nonnumerical algorithms and problems--Computations on discrete str...
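
To make the (k, h)-segmentation objective from the abstract concrete, the following minimal sketch scores one candidate solution for a one-dimensional sequence. It is an illustration only and not the authors' algorithm: the function name `kh_segmentation_error`, the boundary and label encoding, and the exact form of the error (sum of absolute deviations for L1, sum of squared deviations for L2) are assumptions made here. Each source is represented by a single level, chosen as the median (which minimizes the sum of absolute deviations) or the mean (which minimizes the sum of squared deviations) of all points assigned to it.

```python
from statistics import mean, median


def kh_segmentation_error(seq, boundaries, labels, p=2):
    """Score a candidate (k, h)-segmentation of a 1-D sequence.

    seq        -- list of numbers
    boundaries -- exclusive end index of each of the k segments,
                  increasing, with the last one equal to len(seq)
    labels     -- source id (one of h distinct values) for each segment
    p          -- 1 for absolute error, 2 for squared error (assumed forms)

    Returns (total_error, levels), where levels maps source id -> level.
    """
    if p not in (1, 2):
        raise ValueError("p must be 1 or 2")
    if len(boundaries) != len(labels) or not boundaries:
        raise ValueError("need exactly one label per segment")
    if boundaries[-1] != len(seq):
        raise ValueError("last boundary must equal len(seq)")

    # Pool the points of all segments that share a source.
    points_by_source = {}
    start = 0
    for end, label in zip(boundaries, labels):
        if end <= start:
            raise ValueError("boundaries must be strictly increasing")
        points_by_source.setdefault(label, []).extend(seq[start:end])
        start = end

    # One level per source: median for p = 1, mean for p = 2.
    levels = {}
    total = 0.0
    for label, values in points_by_source.items():
        level = median(values) if p == 1 else mean(values)
        levels[label] = level
        total += sum(abs(v - level) ** p for v in values)
    return total, levels


# Usage: k = 4 segments drawn from h = 2 sources that alternate.
seq = [1, 1, 1, 5, 5, 1, 1, 5, 5, 5]
error, levels = kh_segmentation_error(seq, [3, 5, 7, 10], [0, 1, 0, 1])
```

In this usage example the segments labelled 0 contain only the value 1 and those labelled 1 contain only the value 5, so the candidate fits the sequence exactly. The hard part of the problem, which this sketch does not attempt, is choosing the boundaries and labels that minimize the error.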
Aristides Gionis, Heikki Mannila
Added 03 Dec 2009
Updated 03 Dec 2009
Type Conference
Year 2003
Where RECOMB
Authors Aristides Gionis, Heikki Mannila