We consider a setting for discriminative semi-supervised learning in which unlabeled data are used with a generative model to learn effective feature representations for discriminative training. Within this framework, we revisit the two-view feature generation model of co-training and prove that the optimal predictor can be expressed as a linear combination of a few features constructed from unlabeled data. From this analysis, we derive methods that employ two views but are very different from co-training. Experiments show that our approach is more robust than co-training and EM under various data generation conditions.
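To make the high-level recipe concrete, here is a minimal sketch of the general pattern the abstract describes: an unsupervised step that constructs a small set of features from the two views of the unlabeled data, followed by a supervised step that fits a linear predictor over those features. This is an illustrative CCA-style stand-in, not the paper's exact algorithm; all data names and dimensions below are hypothetical placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical data: two views of each example.
# X1_u, X2_u: unlabeled view-1 / view-2 features; X1_l, y_l: a small labeled set.
n_unlab, n_lab, d1, d2, k = 5000, 50, 100, 80, 5
X1_u = rng.normal(size=(n_unlab, d1))
X2_u = rng.normal(size=(n_unlab, d2))
X1_l = rng.normal(size=(n_lab, d1))
y_l = rng.integers(0, 2, size=n_lab)

# Step 1 (unsupervised): estimate the cross-covariance between the two views
# on unlabeled data and keep its top-k left singular vectors. Directions in
# view 1 that correlate with view 2 capture structure shared across views.
C = X1_u.T @ X2_u / n_unlab           # (d1, d2) cross-covariance estimate
U, _, _ = np.linalg.svd(C, full_matrices=False)
proj = U[:, :k]                       # (d1, k) feature map built from unlabeled data

# Step 2 (supervised): train a linear classifier on the small labeled set,
# using only the k features constructed in step 1 -- a linear combination
# of a few unlabeled-data-derived features, as the analysis suggests.
Z_l = X1_l @ proj
clf = LogisticRegression().fit(Z_l, y_l)
```

The point of the sketch is the division of labor: the expensive, structure-learning step consumes only unlabeled data, and the discriminative step stays a plain linear model over the resulting low-dimensional features.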