Software defect detection aims to automatically identify defective software modules to support efficient software testing and thereby improve the quality of a software system. Although many machine learning methods have been successfully applied to this task, most of them fail to consider two practical yet important issues. First, it is difficult to collect a large amount of labeled training data from which to learn a well-performing model. Second, a software system usually contains far fewer defective modules than defect-free modules, so learning must be conducted over an imbalanced data set. In this paper, we address these two issues simultaneously by proposing a novel semisupervised learning approach named Rocus. This method exploits the abundant unlabeled examples to improve detection accuracy and employs under-sampling to tackle the class-imbalance problem in the learning process. Experimental results on real-world software defect dete...