Graph-based methods for semi-supervised learning have recently been shown to be promising for combining labeled and unlabeled data in classification problems. However, inference for graphbased methods often does not scale well to very large data sets, since it requires inversion of a large matrix or solution of a large linear program. Moreover, such approaches are inherently transductive, giving predictions for only those points in the unlabeled set, and not for an arbitrary test point. In this paper a new approach is presented that preserves the strengths of graph-based semisupervised learning while overcoming the limitations of scalability and non-inductive inference, through a combination of generative mixture models and discriminative regularization using the graph Laplacian. Experimental results show that this approach preserves the accuracy of purely graph-based transductive methods when the data has "manifold structure," and at the same time achieves inductive learnin...
Xiaojin Zhu, John D. Lafferty