Background: The ability to design thermostable proteins is both theoretically important and practically useful. Robust and accurate design algorithms, however, remain elusive. One critical problem is the lack of reliable methods for estimating the relative thermostability of candidate mutants.

Results: We report a novel scoring function for discriminating between hyperthermophilic and mesophilic proteins, with application to predicting the relative thermostability of protein mutants. The scoring function was developed from a detailed analysis of features calculated or predicted from 540 pairs of orthologous hyperthermophilic and mesophilic protein sequences. It is a linear combination of ten important features, selected by a feature-ranking procedure based on the random forest classification algorithm. The weights of these features in the scoring function were fitted with a hill-climbing algorithm. This scoring function has shown an excellent ability to discriminate hyperthermophili...
Yunqi Li, C. Russell Middaugh, Jianwen Fang
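As an illustration of the approach summarized in the abstract (a linear score whose weights are fitted by hill climbing), the following is a minimal sketch and not the authors' implementation. The objective used here (the fraction of ortholog pairs in which the hyperthermophilic member scores higher), the step size, the acceptance rule, and the three toy features are all assumptions made for the example; the abstract does not specify them, and the data are synthetic.

```python
import random


def score(features, weights):
    """Linear scoring function: weighted sum of the feature values."""
    return sum(w * x for w, x in zip(weights, features))


def pair_accuracy(pairs, weights):
    """Fraction of (hyper, meso) pairs where the hyperthermophilic
    member receives the strictly higher score (assumed objective)."""
    correct = sum(
        1 for hyper, meso in pairs
        if score(hyper, weights) > score(meso, weights)
    )
    return correct / len(pairs)


def hill_climb(pairs, n_features, steps=2000, step_size=0.1, seed=0):
    """Fit weights by simple hill climbing: perturb one weight at a time
    and keep the change only if the objective does not get worse."""
    rng = random.Random(seed)
    weights = [0.0] * n_features
    best = pair_accuracy(pairs, weights)
    for _ in range(steps):
        i = rng.randrange(n_features)
        candidate = list(weights)
        candidate[i] += rng.choice((-step_size, step_size))
        acc = pair_accuracy(pairs, candidate)
        if acc >= best:  # accepting ties lets the search cross plateaus
            weights, best = candidate, acc
    return weights, best


# Synthetic toy data (hypothetical, NOT from the paper): 50 pairs with
# 3 features each, where the "hyper" member is shifted upward on average.
data_rng = random.Random(1)
pairs = []
for _ in range(50):
    meso = [data_rng.gauss(0, 1) for _ in range(3)]
    hyper = [m + data_rng.gauss(0.5, 1) for m in meso]
    pairs.append((hyper, meso))

weights, acc = hill_climb(pairs, n_features=3)
print("fitted weights:", weights)
print("pairwise accuracy:", acc)
```

Because a candidate is accepted only when the objective does not decrease, the returned accuracy is never worse than that of the starting weights. A real application would replace the toy features with the ten ranked features and evaluate on held-out pairs.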