Cloth is a complex visual pattern with a flexible 3D shape and illumination variations. Computing the 3D shape of cloth from a single image is of great interest to both computer graphics and computer vision research. However, acquiring 3D cloth shape by Shape from Shading (SFS) remains a challenge. In this paper, we present a two-layer generative model that represents both the 2D cloth image and the 3D cloth surface. The first layer represents all the folds on the cloth, which are called "shading primitives" in [4], and thus captures the overall "skeleton structure" of the cloth. We learn a number of typical 3D fold primitives from training images obtained through photometric stereo. These 3D fold primitives yield a dictionary of 2D shading primitives for cloth images. The second layer represents the non-fold parts, whose surfaces (and hence shading) are very smooth and often flat; it interpolates between the primitives of the first layer using a smoothness prior, as in conventional SFS. Then we...
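To make the two-layer idea concrete, the following is a minimal sketch, not the paper's actual formulation, of an energy that combines the two layers: fold regions are tied to depth predicted by fitted 3D fold primitives, while non-fold regions are regularized by a smoothness prior as in conventional SFS. All names (`two_layer_energy`, `fold_mask`, `fold_prior_depth`, the weight `lam`) are hypothetical placeholders introduced here for illustration.

```python
import numpy as np

def two_layer_energy(depth, fold_mask, fold_prior_depth, lam=1.0):
    """Illustrative two-layer cloth energy (hypothetical, not from the paper).

    depth            : H x W array, the 3D surface (depth map) being solved for
    fold_mask        : H x W boolean array, True where a fold primitive from
                       the learned dictionary has been placed (layer 1)
    fold_prior_depth : H x W array, depth predicted by the fitted 3D fold
                       primitives (only meaningful inside fold_mask)
    lam              : weight of the smoothness prior on non-fold regions
    """
    # Layer 1: fold regions should agree with the fitted 3D fold primitives.
    data_term = np.sum((depth[fold_mask] - fold_prior_depth[fold_mask]) ** 2)

    # Layer 2: non-fold regions are smooth (often flat), as in conventional SFS.
    gy, gx = np.gradient(depth)
    smooth_term = np.sum(gx[~fold_mask] ** 2 + gy[~fold_mask] ** 2)

    return data_term + lam * smooth_term
```

In such a formulation, minimizing the energy over `depth` would interpolate the surface between the fold primitives while keeping the non-fold regions smooth; the paper's actual model and inference procedure may differ.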