We present an algorithm for identifying putative non-coding RNAs (ncRNAs) using an RNA Common-Structural Grammar (RCSG) and demonstrate its effectiveness. The algorithm consists of two steps, a structure learning step and a sequence learning step, both based on genetic programming. Genetic programming has previously been applied to automatic program learning, network reconstruction, and protein secondary structure prediction. In this study, we use it to optimize structural grammars. A structural grammar can be formulated as a set of tree-structured rules that include function variables, which makes it learnable by genetic programming. We define rules for encoding structure definition grammars into function trees. We demonstrate the performance of the algorithm experimentally using RCSGs for tRNA and 5S small RNA.
Jin-Wu Nam, Je-Gun Joung, Y. S. Ahn, Byoung-Tak Zh
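
The idea of encoding a structural grammar as a function tree and improving it by evolutionary search can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the node types `loop`, `stem`, and `cat`, the fitness measure, and the mutation-only search loop are hypothetical and are not the RCSG encoding or the genetic programming operators used in this work (in particular, no subtree crossover is performed).

```python
import random

# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

# Hypothetical function-tree encoding (NOT the paper's RCSG syntax):
#   ("loop", n)         n unpaired bases
#   ("stem", n, child)  n base pairs enclosing child
#   ("cat", a, b)       structure a followed by structure b

def to_dot_bracket(tree):
    """Expand a function tree into a dot-bracket structure string."""
    kind = tree[0]
    if kind == "loop":
        return "." * tree[1]
    if kind == "stem":
        return "(" * tree[1] + to_dot_bracket(tree[2]) + ")" * tree[1]
    if kind == "cat":
        return to_dot_bracket(tree[1]) + to_dot_bracket(tree[2])
    raise ValueError(f"unknown node type: {kind}")

def paired_positions(db):
    """Return the (i, j) index pairs of matching brackets."""
    stack, pairs = [], []
    for i, c in enumerate(db):
        if c == "(":
            stack.append(i)
        elif c == ")":
            pairs.append((stack.pop(), i))
    return pairs

def score(tree, seq):
    """Fraction of the tree's base pairs that seq can form (0 if lengths differ)."""
    db = to_dot_bracket(tree)
    if len(db) != len(seq):
        return 0.0
    pairs = paired_positions(db)
    if not pairs:
        return 0.0
    return sum((seq[i], seq[j]) in PAIRS for i, j in pairs) / len(pairs)

def fitness(tree, seqs):
    """Mean score of a tree over a set of training sequences."""
    return sum(score(tree, s) for s in seqs) / len(seqs)

def mutate(tree, rng):
    """Return a copy with one node's length shifted by +/-1 (never below 1)."""
    kind = tree[0]
    if kind == "loop":
        return ("loop", max(1, tree[1] + rng.choice((-1, 1))))
    if kind == "stem":
        if rng.random() < 0.5:
            return ("stem", max(1, tree[1] + rng.choice((-1, 1))), tree[2])
        return ("stem", tree[1], mutate(tree[2], rng))
    if rng.random() < 0.5:
        return ("cat", mutate(tree[1], rng), tree[2])
    return ("cat", tree[1], mutate(tree[2], rng))

def evolve(seqs, start, generations=50, pop_size=20, seed=0):
    """Mutation-only search that keeps parents, so the best fitness never drops."""
    rng = random.Random(seed)
    pop = [start] * pop_size
    for _ in range(generations):
        children = [mutate(t, rng) for t in pop]
        ranked = sorted(pop + children, key=lambda t: fitness(t, seqs), reverse=True)
        pop = ranked[:pop_size]
    return pop[0]
```

For example, `to_dot_bracket(("stem", 3, ("loop", 4)))` yields the hairpin `(((....)))`, and `evolve(["GGGAAAACCC", "GCGUUUUCGC"], ("stem", 2, ("loop", 4)))` searches for stem and loop lengths that fit both toy sequences. A fuller treatment would add subtree crossover and a separate sequence learning step, which this sketch omits.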