This paper presents a method for recovering 3D facial shape from a single image by learning the relationship between 2D intensity images and 3D facial shapes. Given a coupled training set, the intensity images and their corresponding facial shapes each span a vector space. Only the components that are correlated across the two spaces are useful for inference, so each space must contain an embedded hidden subspace that preserves the inter-space correlation information. By learning the projections onto these hidden subspaces under a Maximum Correlation Criterion and optimizing the linear transform between the hidden subspaces, the 3D facial shape is inferred from the intensity image. The effectiveness of the method is demonstrated on both synthesized and real-world data.
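
The pipeline outlined above can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the paper: classical canonical correlation analysis (CCA) with a small ridge term stands in for the Maximum Correlation Criterion, the transform between the hidden subspaces and the back-projection to shape space are both fitted by ordinary least squares, and random low-rank data replaces real face images and shapes. The names `fit_hidden_subspaces` and `infer_shape` are hypothetical.

```python
import numpy as np


def _inv_sqrt(C):
    """Inverse square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(C)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T


def fit_hidden_subspaces(X, Y, k, reg=1e-3):
    """Fit a CCA-style model (an assumed stand-in for the paper's criterion).

    X: (n, dx) vectorized intensity images, Y: (n, dy) vectorized 3D shapes,
    k: dimension of each hidden subspace, reg: ridge term for stability.
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    Sx, Sy = _inv_sqrt(Cxx), _inv_sqrt(Cyy)
    # Singular vectors of the whitened cross-covariance give the
    # maximally correlated direction pairs.
    U, _, Vt = np.linalg.svd(Sx @ Cxy @ Sy)
    Wx = Sx @ U[:, :k]       # projection onto the hidden image subspace
    Wy = Sy @ Vt[:k].T       # projection onto the hidden shape subspace
    A, B = Xc @ Wx, Yc @ Wy  # hidden coordinates of the training pairs
    # Linear transform between the hidden subspaces (least squares).
    T, *_ = np.linalg.lstsq(A, B, rcond=None)
    # Back-projection from hidden shape coordinates to shape space.
    R, *_ = np.linalg.lstsq(B, Yc, rcond=None)
    return {"mx": mx, "my": my, "Wx": Wx, "T": T, "R": R}


def infer_shape(model, x):
    """Infer shape vector(s) from image vector(s) x of shape (dx,) or (m, dx)."""
    hidden = (x - model["mx"]) @ model["Wx"] @ model["T"]
    return hidden @ model["R"] + model["my"]


# Synthetic demo: both modalities are driven by a shared 3-D latent factor.
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 3))
X = z @ rng.normal(size=(3, 20)) + 0.05 * rng.normal(size=(200, 20))
Y = z @ rng.normal(size=(3, 12)) + 0.05 * rng.normal(size=(200, 12))

model = fit_hidden_subspaces(X[:150], Y[:150], k=3)
pred = infer_shape(model, X[150:])
rel_err = np.linalg.norm(pred - Y[150:]) / np.linalg.norm(Y[150:] - model["my"])
print(f"held-out error relative to the mean-shape baseline: {rel_err:.3f}")
```

In this sketch the hidden dimension `k` and the ridge term `reg` are free parameters. With real image data, where the pixel dimension far exceeds the number of training pairs, a dimensionality reduction step before the covariance estimates would likely be needed.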