An object's 3D shape largely determines its physical properties. In this article, we introduce an autonomous learning system that categorizes the 3D shape of simulated objects from single views. The system extends an unsupervised bottom-up learning architecture based on the slowness principle with top-down information derived from the physical behavior of objects. The unsupervised bottom-up learning yields pose-invariant representations. Shape specificity is then added through top-down information derived from the objects' movement trajectories. As a result, the system can categorize 3D object shape from a single static view without supervised postprocessing.