In this paper, we present an attribute graph grammar for image parsing on scenes with man-made objects, such as buildings, hallways, kitchens, and living rooms. We choose one class of primitives ? 3D planar rectangles projected on images, and six graph grammar production rules. Each production rule not only expands a node into its components, but also includes a number of equations that constrain the attributes of a parent node and those of its children. Thus our graph grammar is context sensitive. The grammar rules are used recursively to produce a large number of objects and patterns in images and thus the whole graph grammar is a type of generative model. The inference algorithm integrates bottom-up rectangle detection which activates topdown prediction using the grammar rules. The final results are validated in a Bayesian framework. The output of the inference is a hierarchical parsing graph with objects, surfaces, rectangles, and their spatial relations. In the inference, the acc...