Gene structure prediction is one of the most important problems in computational molecular biology. It involves two steps: the first is finding the evidence (e.g., predicting splice sites), and the second is interpreting the evidence, that is, determining the whole gene structure by assembling its pieces. In this paper we suggest a combinatorial solution to the second step, which is also referred to as the "Exon Assembly Problem". We use a similarity-based approach, which aims to produce a single gene structure based on similarities to a known homologous sequence. We target the sparse case, where filtering has been applied to the data, resulting in a set of O(n) candidate exon blocks. Our algorithm yields an O(n²√n) solution.
Carmel Kent, Gad M. Landau, Michal Ziv-Ukelson