Human vision routinely perceives full 3D shapes and scenes from a single 2D image, with the occluded parts “filled in” by prior visual knowledge. In this paper we represent prior knowledge of 3D shapes and scenes by probabilistic models at two levels, both defined on graphs. The first-level model is built on a graph representation of single objects; it is a mixture model covering both man-made block objects and natural objects such as trees and grass. It assumes surface and boundary smoothness, 3D angle symmetry, and similar regularities. The second-level model is built on the relation graph of all objects in a scene. It assumes that objects are supported for maximum stability by global bounding surfaces, such as the ground, sky, and walls. Given an input image, we extract the geometric and photometric structures through image segmentation and sketching, and represent them in a large graph. We then partition the graph into subgraphs, each corresponding to one object, and infer the 3D shape...
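The partitioning step described above can be illustrated with a minimal sketch. This is a hypothetical simplification, not the paper's inference algorithm: the node names, edge list, and the use of plain connected components (rather than edges weighted by the smoothness and symmetry priors) are assumptions for illustration only.

```python
from collections import defaultdict

def partition_scene_graph(nodes, edges):
    """Partition a scene graph into connected subgraphs, each a
    candidate object. (Illustrative only: the actual model would
    score partitions under the two-level probabilistic priors.)"""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, parts = set(), []
    for n in nodes:
        if n in seen:
            continue
        stack, comp = [n], set()
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(adj[v] - comp)
        seen |= comp
        parts.append(sorted(comp))
    return parts

# Toy scene: three primitives of a block object (p1-p3) and
# two primitives of a tree (t1-t2), with no edges between the groups.
nodes = ["p1", "p2", "p3", "t1", "t2"]
edges = [("p1", "p2"), ("p2", "p3"), ("t1", "t2")]
print(partition_scene_graph(nodes, edges))  # [['p1', 'p2', 'p3'], ['t1', 't2']]
```

Each returned subgraph would then be passed to the object-level model for 3D shape inference.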