We extend the successful concept of robust 2D features into the third dimension by producing a descriptor for a reconstructed 3D surface region. The descriptor is perspectively invariant if the region can be locally well approximated by a plane. We exploit depth and texture information, which is nowadays available in real time from the video of a moving camera, from stereo systems, or from PMD cameras (photonic mixer devices [19]). By computing a normal view onto the surface, we retain the descriptiveness of similarity-invariant features such as SIFT [11] while gaining invariance against perspective distortion; with affine-invariant features, by contrast, descriptiveness typically suffers. Our approach can be exploited for structure from motion, for stereo and PMD cameras, for the alignment of large-scale reconstructions, and for improved video registration.
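The following is a minimal sketch of the normal-view idea, not the paper's actual implementation: given camera intrinsics K and a surface normal n estimated from depth (both assumed available here), a rotation about the camera centre that aligns the optical axis with n induces the homography H = K R K^{-1}, which rectifies the locally planar region to a fronto-parallel view before a standard SIFT descriptor is computed. The function name `normal_view_descriptor` and all parameters are hypothetical.

```python
import numpy as np
import cv2


def normal_view_descriptor(img, K, n, x_px, size=64):
    """Hypothetical sketch: rectify a locally planar surface region to a
    fronto-parallel ("normal") view, then compute a SIFT descriptor there.

    img  : grayscale input image
    K    : 3x3 camera intrinsics
    n    : unit surface normal in camera coordinates (pointing at the camera)
    x_px : (x, y) pixel location of the feature in the input image
    """
    # New viewing direction: look straight down the surface normal.
    z = -np.asarray(n, dtype=float)
    z /= np.linalg.norm(z)
    up = np.array([0.0, 1.0, 0.0])          # assumes n is not parallel to up
    x_axis = np.cross(up, z)
    x_axis /= np.linalg.norm(x_axis)
    y_axis = np.cross(z, x_axis)
    R = np.stack([x_axis, y_axis, z])       # rows are the new camera axes

    # A pure rotation about the camera centre maps image points by the
    # homography H = K R K^-1; for a plane with normal n this yields the
    # fronto-parallel view, removing the perspective part of the distortion.
    H = K @ R @ np.linalg.inv(K)
    warped = cv2.warpPerspective(img, H, (img.shape[1], img.shape[0]))

    # Transfer the feature location into the rectified image.
    p = H @ np.array([x_px[0], x_px[1], 1.0])
    p = p[:2] / p[2]

    # Standard SIFT on the rectified view keeps its similarity invariance;
    # what remains after the warp is exactly a similarity transformation.
    sift = cv2.SIFT_create()
    kp = [cv2.KeyPoint(float(p[0]), float(p[1]), float(size))]
    kp, desc = sift.compute(warped, kp)
    return desc
```

Under this reading, matching two views of the same locally planar region reduces to comparing descriptors of two similarity-related patches, which is precisely the regime SIFT was designed for.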