In this paper, we present a patch-based variational Bayesian framework for image processing using the language of factor graphs (FGs). The variable and factor nodes of an FG represent image patches and their clustering relationships, respectively. Unlike previous probabilistic graphical models, we model the structure of the FG itself as a latent variable, hence the name "stochastic factor graphs" (SFGs). Imposing a sparsity-based prior on the local distribution functions at the factor nodes leads to a class of variational expectation-maximization (VEM) algorithms on SFGs. VEM algorithms allow us to infer the graph structure jointly with the target of inference from the observed data. This new framework can systematically exploit nonlocal dependencies in natural images, as supported by experimental results in image denoising and inpainting applications.
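To make the idea of inferring graph structure jointly with the signal more concrete, the following is a minimal toy sketch, not the authors' algorithm. It makes several assumptions. Patches are 1-D vectors. Each factor node k is modeled by a single center `mu_k` with Gaussian noise of known `sigma`. The latent structure is the patch-to-factor assignment, with variational posterior `q`. The sparsity prior is a stand-in: a Laplacian prior on the orthonormal DCT coefficients of each center, which makes the M-step an exact soft-thresholding step. All function and parameter names (`vem_patch_denoise`, `lam`, `dct_matrix`) are hypothetical.

```python
import numpy as np

def dct_matrix(d):
    """Orthonormal DCT-II matrix D (rows are basis vectors), so D @ D.T = I."""
    n = np.arange(d)
    k = n[:, None]
    D = np.sqrt(2.0 / d) * np.cos(np.pi * (2 * n[None, :] + 1) * k / (2 * d))
    D[0, :] = np.sqrt(1.0 / d)
    return D

def soft_threshold(c, t):
    """Proximal operator of t * |c|, i.e. MAP shrinkage under a Laplacian prior."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def vem_patch_denoise(Y, K, sigma, lam, n_iter=20):
    """Toy VEM: E-step infers soft patch-to-factor links q (the 'graph structure'),
    M-step re-estimates each factor's center under a DCT-domain sparsity prior."""
    N, d = Y.shape
    D = dct_matrix(d)
    # Deterministic farthest-point initialization of the K factor centers.
    idx = [0]
    for _ in range(1, K):
        dist = np.min(((Y[:, None, :] - Y[idx][None, :, :]) ** 2).sum(-1), axis=1)
        idx.append(int(np.argmax(dist)))
    mu = Y[idx].copy()
    log_pi = np.full(K, -np.log(K))
    for _ in range(n_iter):
        # E-step: q[i, k] proportional to pi_k * N(y_i; mu_k, sigma^2 I), computed stably.
        sq = ((Y[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        logits = log_pi - sq / (2 * sigma**2)
        logits -= logits.max(axis=1, keepdims=True)
        q = np.exp(logits)
        q /= q.sum(axis=1, keepdims=True)
        # M-step: the weighted data term equals W_k * ||mu_k - ybar_k||^2 + const,
        # so with orthonormal D the MAP DCT coefficients are soft-thresholded.
        W = q.sum(axis=0) + 1e-12
        ybar = (q.T @ Y) / W[:, None]
        coeffs = soft_threshold(ybar @ D.T, (lam * sigma**2 / W)[:, None])
        mu = coeffs @ D
        log_pi = np.log(W / W.sum())
    # Estimate each patch as the q-weighted combination of the factor centers.
    return q @ mu, q

# Usage on synthetic data: two patch "textures" corrupted by Gaussian noise.
rng = np.random.default_rng(0)
d, n_per, sigma = 8, 100, 0.5
clean = np.vstack([np.ones((n_per, d)), -np.ones((n_per, d))])
noisy = clean + sigma * rng.standard_normal(clean.shape)
X_hat, q = vem_patch_denoise(noisy, K=2, sigma=sigma, lam=1.0)
print(np.mean((noisy - clean) ** 2), np.mean((X_hat - clean) ** 2))
```

In this sketch, pooling a patch with the other patches linked to the same factor node is what exploits nonlocal similarity. The sparsity prior only shrinks each center's weak DCT coefficients. A real image pipeline would extract overlapping 2-D patches and aggregate the estimates back into the image.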