We describe a model for strings of characters that is loosely based on the Lempel Ziv model with the addition that a repeated substring can be an approximate match to the original substring; this is close to the situation of DNA, for example. Typically there are many explanations for a given string under the model, some optimal and many suboptimal. Rather than commit to one optimal explanation, we sum the probabilities over all explanations under the model because this gives the probability of the data under the model. The model has a small number of parameters and these can be estimated from the given string by an expectation-maximization (EM) algorithm. Each iteration of the EM algorithm takes
Lloyd Allison, Timothy Edgoose, Trevor I. Dix