Predicting and properly ranking splice sites (SSs) is a challenging problem in bioinformatics and machine learning. The proposed method for predicting donor and acceptor SSs is based on counting oligonucleotide frequencies for splice and splice-like signals. SS sensors were built on the Bayesian principle. We demonstrate the advantage of the proposed sensor design over existing sensors and tools. In particular, our donor sensor outperforms the Maximum Entropy Sensor on several representative test sets of genes when compared by Receiver Operating Characteristic (ROC) curve. We represent the combinatorial interactions of SSs and related factors with Logarithm Of oDds (LOD) weight matrices. Using these factor interactions, we were able to substantially improve the quality of splice signal prediction and rank SSs better than the SpliceView, GeneSplicer, NNSplice, and Genio tools. The proposed method is used in our new splicing simulator, SpliceScan.
Alexander G. Churbanov, Hesham H. Ali
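
As a rough illustration of the idea of scoring candidate sites from oligonucleotide counts, the following minimal Python sketch builds position-specific log-odds (LOD) tables from two sets of aligned windows, one around true splice signals and one around splice-like decoys, and sums the per-position LOD values for a candidate window. It is not the SpliceScan implementation: the function names (`kmer_counts`, `lod_tables`, `lod_score`), the window layout, the oligonucleotide length `k`, the pseudocount, and the toy sequences are all assumptions made for illustration.

```python
import math
from collections import Counter


def kmer_counts(windows, k):
    """Count k-mers at each offset across equal-length, aligned windows."""
    n_offsets = len(windows[0]) - k + 1
    counts = [Counter() for _ in range(n_offsets)]
    for w in windows:
        for i in range(n_offsets):
            counts[i][w[i:i + k]] += 1
    return counts


def lod_tables(true_windows, decoy_windows, k=2, pseudo=1.0):
    """Build one LOD table per offset: log2 P(kmer | true) / P(kmer | decoy).

    A pseudocount (an assumption of this sketch) keeps unseen k-mers finite.
    """
    n_kmers = 4 ** k  # size of the DNA k-mer alphabet
    tables = []
    for tc, dc in zip(kmer_counts(true_windows, k),
                      kmer_counts(decoy_windows, k)):
        t_total = sum(tc.values()) + pseudo * n_kmers
        d_total = sum(dc.values()) + pseudo * n_kmers
        table = {
            km: math.log2(((tc[km] + pseudo) / t_total)
                          / ((dc[km] + pseudo) / d_total))
            for km in set(tc) | set(dc)
        }
        # LOD for a k-mer seen in neither training set at this offset.
        default = math.log2((pseudo / t_total) / (pseudo / d_total))
        tables.append((table, default))
    return tables


def lod_score(window, tables, k=2):
    """Sum the per-offset LOD values for a candidate window."""
    return sum(table.get(window[i:i + k], default)
               for i, (table, default) in enumerate(tables))


if __name__ == "__main__":
    # Toy, hand-made windows with GT at offsets 2-3 (hypothetical data).
    true_sites = ["AGGTAAGT", "AGGTGAGT", "CGGTAAGT"]
    decoys = ["TTGTCCCA", "ACGTTTGC", "GAGTCTAA"]
    tables = lod_tables(true_sites, decoys, k=2)
    for candidate in ("AGGTAAGT", "TTGTCCCA"):
        print(candidate, round(lod_score(candidate, tables), 3))
```

Candidates can then be ranked by score, and sweeping a threshold over the scores yields the points of a ROC curve. Summing per-position values treats offsets as independent, a simplification chosen here to keep the sketch short.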