We describe the construction of a compound object model for object recognition, which provides the framework for extracting object structure from images degraded by noise. Our vision system is inspired by cognitive principles. From a set of sample views we automatically generate a sparse, view-based object representation that contains enough information to represent the object in all poses. To verify this property, we apply the representation in a pose estimation task with noisy and unfamiliar test views of the object. With an appropriate number of views in the object representation, the proposed method shows good selectivity and is able to distinguish views that are only 3.6° apart, even when they are considerably degraded by Gaussian noise.

KEY WORDS
Computer Vision, Noise, Pose Estimation, Tracking, 3D Object Recognition