In this paper, we apply three pattern recognition methods (support vector machines, cluster analysis, and principal component analysis) to distinguish regulatory regions from coding and from non-coding, non-regulatory DNA sequences. Using a new feature representation (the degree to which motifs are over- or under-represented), we demonstrate the remarkable power of this methodology in identifying regulatory regions of Drosophila melanogaster.
Rene te Boekhorst, Irina I. Abnizova, Lorenz Werni
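
To make the feature representation mentioned in the abstract more concrete, the following is a minimal sketch of one way a motif's over- or under-representation could be scored. The abstract does not specify the statistic used, so everything here is an assumption for illustration: the function name `motif_zscores`, the choice of short k-mers as motifs, and the background model (independent bases with the sequence's own composition, binomial variance, overlaps between occurrences ignored).

```python
from collections import Counter
from itertools import product
from math import sqrt


def motif_zscores(seq, k=2):
    """Return a z-score per k-mer: positive = over-represented, negative = under-represented.

    Assumption (not from the paper): the background model treats bases as
    independent draws with the sequence's own base frequencies.
    """
    seq = seq.upper()
    n = len(seq) - k + 1  # number of positions where a k-mer can start
    if n <= 0:
        raise ValueError("sequence is shorter than k")

    # Empirical base composition of the sequence itself.
    base_freq = {b: seq.count(b) / len(seq) for b in "ACGT"}

    # Observed counts of every k-mer (overlapping occurrences included).
    observed = Counter(seq[i:i + k] for i in range(n))

    scores = {}
    for kmer in map("".join, product("ACGT", repeat=k)):
        # Probability of the k-mer under the independent-bases model.
        p = 1.0
        for b in kmer:
            p *= base_freq[b]
        expected = n * p
        variance = n * p * (1.0 - p)  # binomial approximation, ignores overlap
        if variance > 0:
            scores[kmer] = (observed[kmer] - expected) / sqrt(variance)
        else:
            scores[kmer] = 0.0  # k-mer cannot occur under the model
    return scores


if __name__ == "__main__":
    for kmer, z in sorted(motif_zscores("ATATATATAT").items()):
        print(kmer, round(z, 2))
```

Under these assumptions, each sequence maps to a fixed-length vector of scores (16 values for k = 2), which is the kind of input that a support vector machine, a clustering procedure, or principal component analysis could operate on.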