In this paper we propose a framework for learning a regression function form a set of local features in an image. The regression is learned from an embedded representation that reflects the local features and their spatial arrangement as well as enforces supervised manifold constraints on the data. We applied the approach for viewpoint estimation on a Multiview car dataset, a head pose dataset and arm posture dataset. The experimental results show that this approach has superior results (up to 67% improvement) to the state-of-the-art approaches in very challenging datasets .