It is well known that the base composition along eukaryotic genomes is long-range correlated. Here, we investigate the effect of such long-range correlations on alignment score statistics. We model the correlated score landscape by means of a Gaussian approximation. Within this framework, we calculate the corrections to the scale parameter of the extreme value distribution of alignment scores. To evaluate our approximate analytic results, we perform a detailed numerical study based on a simple algorithm that efficiently generates long-range correlated random sequences. We find that both the mean and the exponential tail of the score distribution are indeed affected by the correlations along the sequences. Therefore, the significance of measured alignment scores in biological sequences changes once the correlations are incorporated into the null model.
Philipp W. Messer, Ralf Bundschuh, Martin Vingron,
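
To illustrate why a correction to the scale parameter matters for significance, the following minimal sketch evaluates the standard extreme value (Gumbel) form of the alignment score tail, P(S >= s) ≈ 1 - exp(-K m n e^{-λ s}), for two values of the scale parameter λ. The function name `gumbel_pvalue` and all numerical values (λ, K, the sequence lengths m and n, and the score) are hypothetical and chosen for illustration only; they are not results of this work.

```python
import math


def gumbel_pvalue(score, lam, K, m, n):
    """Tail probability P(S >= score) under the Gumbel form
    1 - exp(-K * m * n * exp(-lam * score)).

    lam is the scale parameter, K the prefactor, m and n the sequence lengths.
    """
    expected_hits = K * m * n * math.exp(-lam * score)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-expected_hits)


# Hypothetical parameters: the same score judged under two null models
# that differ only in the scale parameter lam.
p_lam_a = gumbel_pvalue(score=40, lam=1.10, K=0.3, m=1000, n=1000)
p_lam_b = gumbel_pvalue(score=40, lam=1.00, K=0.3, m=1000, n=1000)

print(f"p-value with lam = 1.10: {p_lam_a:.3e}")
print(f"p-value with lam = 1.00: {p_lam_b:.3e}")
```

Because λ enters through the exponent, even a modest change in the scale parameter shifts the p-value of a fixed score by orders of magnitude in this example.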