In this paper, we present a two-layer generative model that incorporates generic middle-level visual knowledge for dense stereo reconstruction. The visual knowledge is represented by a dictionary of surface primitives, including various categories of boundary discontinuities and junctions, in parametric form. Given a stereo pair, we first compute a primal sketch representation, which decomposes the image into a structural part, covering object boundaries and regions of high intensity contrast and represented by a 2D sketch graph, and a structureless part, represented by a Markov random field (MRF) on pixels. We then label the sketch graph and compute the 3D sketch (akin to a wireframe) by fitting the primitive dictionary to the sketch graph. The surfaces between the 3D sketches are filled in by computing the depth of the MRF model on the structureless part. These two levels interact closely, since the MRF is used to propagate information between the primitives, and at the same time, the primitives act as boundary condit...
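As a minimal illustration of the fill-in step described above, the following sketch propagates depth across a structureless region while holding depth fixed at pixels covered by the 3D sketch. It is not the paper's model: it assumes a purely quadratic smoothness prior with no stereo data term, solved by Jacobi iteration, and the names `fill_depth`, `depth_init`, `fixed_mask`, and `n_iters` are hypothetical.

```python
import numpy as np


def fill_depth(depth_init, fixed_mask, n_iters=500):
    """Fill in depth on free pixels by Jacobi iteration.

    Assumption: a quadratic (membrane) smoothness prior on a 4-connected
    grid, with no data term. Pixels where fixed_mask is True keep their
    value from depth_init and act as boundary conditions.
    """
    depth_init = np.asarray(depth_init, dtype=float)
    fixed_mask = np.asarray(fixed_mask, dtype=bool)
    depth = depth_init.copy()
    for _ in range(n_iters):
        # Replicate border pixels so every pixel has four neighbours.
        padded = np.pad(depth, 1, mode="edge")
        neighbour_avg = (
            padded[:-2, 1:-1]    # neighbour above
            + padded[2:, 1:-1]   # neighbour below
            + padded[1:-1, :-2]  # neighbour to the left
            + padded[1:-1, 2:]   # neighbour to the right
        ) / 4.0
        # Free pixels move to the neighbour average; fixed pixels are reset.
        depth = np.where(fixed_mask, depth_init, neighbour_avg)
    return depth


# Toy example: two vertical "sketch" curves at known depths 0 and 4.
depth_init = np.zeros((5, 5))
depth_init[:, -1] = 4.0
fixed_mask = np.zeros((5, 5), dtype=bool)
fixed_mask[:, 0] = True
fixed_mask[:, -1] = True

filled = fill_depth(depth_init, fixed_mask)
```

In this toy case the free pixels settle onto a surface that interpolates smoothly between the two fixed curves. A fuller treatment would add a stereo matching term for the free pixels and would let the MRF estimate feed back into the primitive fitting, as the text describes.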