This paper presents a novel multi-view learning framework based on variational inference. We formulate the framework as a graph representation in the form of a graph factorization: the graph comprises factor graphs, which describe the internal states of the views. Each view is modeled with a Gaussian mixture model. The proposed framework has three main advantages: 1) fewer assumptions imposed on the data, 2) effective utilization of unlabeled data, and 3) automatic inference of data structure, in that a proper data structure can be inferred in a single round. Experiments on image segmentation demonstrate its effectiveness.
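
As a rough illustration of one ingredient named above, the per-view Gaussian mixture model, the following minimal sketch fits an independent 1-D mixture to each of two synthetic views. It makes several assumptions that are not taken from the paper: it uses plain EM rather than variational inference, it fixes the number of components instead of inferring it, it does not couple the views through factor graphs, and every name in it (`fit_gmm_1d`, `views`, `fitted`) as well as the synthetic data is hypothetical.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, n_components=2, n_iter=50):
    """Fit a 1-D Gaussian mixture with plain EM.

    This is a stand-in for illustration only; it is not the variational
    scheme described in the abstract.
    """
    n = len(data)
    lo, hi = min(data), max(data)
    # Spread the initial means evenly over the data range.
    means = [lo + (k + 0.5) * (hi - lo) / n_components for k in range(n_components)]
    mean_all = sum(data) / n
    var_all = sum((x - mean_all) ** 2 for x in data) / n
    variances = [var_all] * n_components
    weights = [1.0 / n_components] * n_components

    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means, and variances.
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)  # floor to avoid a collapsed component
    return weights, means, variances


# Two hypothetical views of the same 200 samples, each with two clusters.
rng = random.Random(0)
views = {
    "view_a": [rng.gauss(0.0, 1.0) for _ in range(100)]
    + [rng.gauss(8.0, 1.0) for _ in range(100)],
    "view_b": [rng.gauss(-5.0, 0.5) for _ in range(100)]
    + [rng.gauss(5.0, 0.5) for _ in range(100)],
}

# One mixture per view, fitted independently.
fitted = {name: fit_gmm_1d(data) for name, data in views.items()}
for name, (weights, means, variances) in fitted.items():
    print(name, [round(m, 2) for m in means])
```

In this sketch the views are fitted separately; a multi-view framework such as the one summarized above would additionally tie the per-view mixtures together, which is outside the scope of the example.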