Background: Sequence-derived structural and physicochemical descriptors have frequently been used in machine learning prediction of protein functional families. There is therefore a need to comparatively evaluate the effectiveness of these descriptor-sets using the same method and parameter optimization algorithm, and to examine whether the combined use of these descriptor-sets helps to improve predictive performance. Six individual descriptor-sets and four combination-sets were evaluated in support vector machine (SVM) prediction of six protein functional families.

Results: The performance of these descriptor-sets was ranked by Matthews correlation coefficient (MCC), and the descriptor-sets were categorized into two groups based on their performance. While there is no overwhelmingly favourable choice of descriptor-sets, certain trends were found. The combination-sets tend to give slightly but consistently higher MCC values and thus the best overall performance, such that three out of four combination-sets show slight...
Serene A. K. Ong, Hong Huang Lin, Yu Zong Chen, Ze
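
For reference, the MCC used for ranking in the abstract is conventionally computed from the four confusion-matrix counts of a binary classifier. The following is a minimal sketch of that standard definition, not the authors' implementation; the function name `mcc` and the convention of returning 0 when the denominator vanishes are assumptions made here for illustration.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    tp, tn, fp, fn: true positives, true negatives, false positives,
    false negatives. Returns a value in [-1, 1].
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: MCC is undefined when any marginal sum is
        # zero; 0.0 (no correlation) is a common choice in that case.
        return 0.0
    return numerator / denominator
```

A value of 1 indicates perfect prediction, 0 indicates performance no better than chance, and -1 indicates total disagreement, which is why MCC is often preferred over plain accuracy when the positive and negative classes are unbalanced.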