Understanding the sequence-to-structure relationship is a central task in bioinformatics research. Adequate knowledge of this relationship can improve the accuracy of local protein structure prediction. One approach to local protein structure prediction uses conventional clustering algorithms to capture the sequence-to-structure relationship. However, the cluster membership function defined by conventional clustering algorithms may not adequately reveal this complex, nonlinear relationship. Compared with conventional clustering algorithms, the Support Vector Machine (SVM) can capture the nonlinear sequence-to-structure relationship by mapping the input space into a higher-dimensional feature space. However, SVM training does not scale well to huge datasets containing millions of samples. Therefore, we propose a novel computational model called Clustering Support Vector Machines (CSVMs). Taking advantage of both the theory of granular computing and advanced statistical learning metho...
Wei Zhong, Jieyue He, Robert W. Harrison, Phang C.
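The abstract credits SVMs with capturing nonlinear relationships by mapping inputs into a higher-dimensional feature space. The sketch below illustrates only that general kernel idea, not the authors' CSVM model: a degree-2 polynomial kernel computed in the original 2-D space equals an ordinary dot product in an explicit 6-D feature space, where XOR-style data that no straight line can separate becomes linearly separable. The names `poly_kernel`, `phi`, `dot`, the toy points, and the weight vector `w` are hypothetical choices made for illustration.

```python
import math
from typing import Sequence, Tuple

Vec = Sequence[float]


def poly_kernel(x: Vec, z: Vec, degree: int = 2, c: float = 1.0) -> float:
    """Polynomial kernel k(x, z) = (x . z + c) ** degree, evaluated in input space."""
    return (sum(a * b for a, b in zip(x, z)) + c) ** degree


def phi(x: Vec) -> Tuple[float, ...]:
    """Explicit feature map matching the degree-2, c = 1 polynomial kernel on 2-D inputs."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return (x1 * x1, x2 * x2, r2 * x1 * x2, r2 * x1, r2 * x2, 1.0)


def dot(u: Vec, v: Vec) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical XOR-style toy data: no line in 2-D separates the two labels.
points = [(1.0, 1.0), (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0)]
labels = [1, 1, -1, -1]

# In the 6-D feature space, a hyperplane that weights only the x1*x2
# coordinate separates the classes (hand-picked here, not learned).
w = (0.0, 0.0, 1.0, 0.0, 0.0, 0.0)
predictions = [1 if dot(w, phi(p)) > 0 else -1 for p in points]

if __name__ == "__main__":
    # The kernel trick: k(x, z) equals phi(x) . phi(z) without building phi explicitly.
    for x in points:
        for z in points:
            assert math.isclose(poly_kernel(x, z), dot(phi(x), phi(z)), abs_tol=1e-12)
    print(predictions)  # [1, 1, -1, -1]
```

A real SVM would learn `w` (implicitly, through kernel evaluations between training samples) rather than use a hand-picked vector; the cost of those pairwise kernel evaluations grows quickly with the number of samples, which is the scalability concern the abstract raises for datasets with millions of samples.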