Radiation pneumonitis (RP), a radiation-induced lung injury, is a potentially fatal side effect of thoracic radiation therapy. In this work, we use an ensemble of support vector machines (SVMs) to build a binary RP risk model from clinical and dosimetric parameters. Patient and treatment data are partitioned into balanced subsets to prevent model bias. Forward feature selection, which maximizes the area under the curve (AUC) of a cross-validated receiver operating characteristic (ROC) curve, is performed on each subset. Model parameter selection and model construction occur concurrently via alternating SVM and gradient descent steps that minimize the estimated generalization error. We show that an ensemble classifier with a mean fusion function, 5 component SVMs, and a limit of 5 features per classifier achieves a mean AUC of 0.818, an improvement over previous SVM models of RP risk.
Key words: support vector machine, radiation pneumonitis, feature selection, ensemble learning, unbalanced data
Todd W. Schiller, Yixin Chen, Issam El-Naqa, Josep
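The balanced partitioning and mean fusion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the balanced subsets are formed, so the sketch assumes that each subset pairs every minority-class case with a disjoint slice of the majority class. The names `balanced_subsets`, `mean_fusion`, and `ensemble_predict` are hypothetical, and each component SVM is stood in for by a plain callable that returns a decision value.

```python
import random
from statistics import mean


def balanced_subsets(labels, n_subsets, seed=0):
    """Return n_subsets index lists, each with all minority cases
    plus a disjoint slice of the majority cases.

    ASSUMPTION: this partitioning scheme is illustrative; the source
    only says the data are partitioned into balanced subsets.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y != 1]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)

    # Shuffle a copy of the majority indices reproducibly.
    shuffled = majority[:]
    random.Random(seed).shuffle(shuffled)

    # Strided slices are disjoint and together cover the majority class.
    chunks = [shuffled[k::n_subsets] for k in range(n_subsets)]
    return [sorted(minority + chunk) for chunk in chunks]


def mean_fusion(scores):
    """Fuse component decision values by taking their arithmetic mean."""
    return mean(scores)


def ensemble_predict(classifiers, x, threshold=0.0):
    """Classify x from the fused score of hypothetical component classifiers.

    Each classifier is a callable returning a real-valued decision value
    (for an SVM, the signed distance to the separating hyperplane).
    """
    fused = mean_fusion([clf(x) for clf in classifiers])
    return (1 if fused > threshold else 0), fused
```

In use, one classifier would be trained on each index list returned by `balanced_subsets`, and `ensemble_predict` would combine their decision values for a new patient. The zero threshold is an assumption of the sketch; an ROC analysis such as the one in the abstract would sweep this threshold rather than fix it.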