Sciweavers

BMCBI
2008

A discriminative method for protein remote homology detection and fold recognition combining Top-n-grams and latent semantic ana

Background: Protein remote homology detection and fold recognition are central problems in bioinformatics. Currently, discriminative methods based on support vector machines (SVMs) are the most effective and accurate methods for solving these problems. A key step in improving the performance of SVM-based methods is finding a suitable representation of protein sequences. Results: In this paper, a novel building block of proteins called the Top-n-gram is presented, which contains the evolutionary information extracted from protein sequence frequency profiles. The frequency profiles are calculated from the multiple sequence alignments output by PSI-BLAST and converted into Top-n-grams. The protein sequences are then transformed into fixed-dimension feature vectors by counting the occurrences of each Top-n-gram. The training vectors are used to train SVM classifiers, which are then used to classify the test protein sequences. We demonstrate that the prediction performan...
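The feature-extraction step described in the abstract (frequency profile to Top-n-grams to a fixed-dimension count vector) can be illustrated with a minimal Python sketch. The abstract does not define a Top-n-gram precisely, so this sketch assumes that the Top-n-gram at a position is the ordered string of the n most frequent amino acids in that profile column, with ties broken alphabetically. The function names `top_n_gram` and `feature_vector`, the dict-based profile layout, and the toy profile are all hypothetical; real profiles would be derived from PSI-BLAST alignments, and the resulting vectors would be passed to an SVM implementation of your choice.

```python
from itertools import permutations

# The 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def top_n_gram(column, n):
    """Return the n most frequent amino acids of one profile column.

    `column` maps amino-acid letters to frequencies (missing letters count
    as 0). Assumption: ties are broken alphabetically.
    """
    ranked = sorted(AMINO_ACIDS, key=lambda aa: (-column.get(aa, 0.0), aa))
    return "".join(ranked[:n])


def feature_vector(profile, n):
    """Count how often each possible Top-n-gram occurs along a profile.

    Assumption: the vocabulary is every ordered selection of n distinct
    amino acids, so the vector length is fixed regardless of sequence length.
    """
    vocab = ["".join(p) for p in permutations(AMINO_ACIDS, n)]
    index = {gram: i for i, gram in enumerate(vocab)}
    vector = [0] * len(vocab)
    for column in profile:
        vector[index[top_n_gram(column, n)]] += 1
    return vector


# Toy three-position frequency profile (illustrative values only).
profile = [
    {"A": 0.6, "G": 0.3, "S": 0.1},
    {"L": 0.5, "I": 0.4, "V": 0.1},
    {"A": 0.7, "G": 0.2, "T": 0.1},
]

vec1 = feature_vector(profile, 1)  # one count per amino acid
vec2 = feature_vector(profile, 2)  # one count per ordered pair
```

Because the vocabulary depends only on n, sequences of different lengths map to vectors of the same dimension, which is what allows them to be used as SVM training and test inputs.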
Added 09 Dec 2010
Updated 09 Dec 2010
Type Journal
Year 2008
Where BMCBI
Authors Bin Liu, Xiaolong Wang, Lei Lin, Qiwen Dong, Xuan Wang