Using the Genetic Code Wisdom for Recognizing Protein Coding Sequences

15 years 3 months ago

Download www.smorfland.uni.wroc.pl

We have developed a new method for recognizing protein coding sequences in genomic sequences. The method exploits the specific pattern of degeneracy in the genetic code and the relations between mutational pressure and selection pressure that shape amino acid usage in proteomes. It is based on analyses of correlations in nucleotide occurrence, considered separately for the first, second and third putative codon positions, and uses only six 4x4 matrices. Because the matrices are small, only a few coding sequences are needed to train the algorithm. We compared the results of the new method with those of the Markov chain methods used in GeneMark for different genomes, including discrimination between DNA strands (leading/lagging). The method has no arbitrary cutoff separating coding from noncoding sequences; instead, putative coding sequences can be ranked according to their coding probability, which is especially important when searching for small coding ORFs.
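
The idea of training small position-specific matrices and then ranking ORFs instead of thresholding them can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not say what the six matrices contain, so the sketch assumes three 4x4 conditional-probability matrices, one per pair of codon positions, plus a uniform background and a pseudocount. The names `POSITION_PAIRS`, `train`, `score` and `rank_orfs` are hypothetical.

```python
import math

NUCS = "ACGT"
# Assumption: correlate nucleotides between pairs of positions inside a codon.
# (0, 1) = first vs second, (1, 2) = second vs third, (0, 2) = first vs third.
POSITION_PAIRS = [(0, 1), (1, 2), (0, 2)]


def codons(seq):
    """Split a sequence into complete, in-frame codons."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def train(sequences, pseudocount=1.0):
    """Return one 4x4 matrix P(nucleotide at b | nucleotide at a) per position pair."""
    matrices = []
    for a, b in POSITION_PAIRS:
        counts = {x: {y: pseudocount for y in NUCS} for x in NUCS}
        for seq in sequences:
            for codon in codons(seq):
                if all(ch in NUCS for ch in codon):  # skip ambiguous bases
                    counts[codon[a]][codon[b]] += 1
        matrix = {}
        for x in NUCS:
            row_total = sum(counts[x].values())
            matrix[x] = {y: counts[x][y] / row_total for y in NUCS}
        matrices.append(matrix)
    return matrices


def score(seq, matrices):
    """Mean log-odds per codon against a uniform background (assumed 0.25)."""
    total, n = 0.0, 0
    for codon in codons(seq):
        if not all(ch in NUCS for ch in codon):
            continue
        for (a, b), matrix in zip(POSITION_PAIRS, matrices):
            total += math.log(matrix[codon[a]][codon[b]] / 0.25)
        n += 1
    return total / n if n else 0.0


def rank_orfs(orfs, matrices):
    """Order candidate ORFs from most to least coding-like; no cutoff is applied."""
    return sorted(orfs, key=lambda s: score(s, matrices), reverse=True)


# Toy training set; real use would take a few known coding sequences.
training = ["ATGGCTGCTGCTGCTTAA"] * 3
matrices = train(training)
ranked = rank_orfs(["TTTTTTTTT", "GCTGCTGCT"], matrices)
```

Because each matrix has only 16 entries, a handful of training sequences already populates most cells, which matches the abstract's point about small training sets. The ranking step returns every candidate in order, leaving the choice of how far down the list to go to the user.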

Pawel Blazej, Pawel Mackiewicz, Stanislaw Cebrat
