In this paper, we present an approach that allows a robot to learn a generative model of its own physical body from scratch using self-perception with a single monocular camera. Our approach yields a compact Bayesian network for the robot’s kinematic structure, including the forward and inverse models relating action commands and body pose. We propose to simultaneously learn local action models for all pairs of perceivable body parts from data generated through random “motor babbling.” From this repertoire of local models, we construct a Bayesian network for the full system, using the pose prediction accuracy on a separate cross-validation data set as the criterion for model selection. The resulting model can be used to predict the body pose when no perception is available and allows for gradient-based posture control. In experiments with real and simulated manipulator arms, we show that our system is able to quickly learn compact and accurate models and to robustly deal with noisy observations.
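As a rough illustration of the gradient-based posture control mentioned above, the following sketch drives a joint configuration toward a target pose by descending the squared pose error of a learned forward model. It is not the paper’s implementation: the two-link stand-in for the forward model, the finite-difference gradient, and all parameter names are placeholder assumptions; in our approach the forward model would instead be the one learned from visual self-perception.

```python
import numpy as np

def forward_model(q, link_lengths=(0.3, 0.25)):
    """Stand-in forward model mapping joint angles q to an end-effector
    pose x (here: a planar two-link arm, purely for illustration)."""
    l1, l2 = link_lengths
    return np.array([
        l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1]),
        l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1]),
    ])

def posture_control(x_target, q0, step=0.5, eps=1e-5, iters=200):
    """Gradient descent on the squared pose error ||f(q) - x_target||^2,
    using finite differences since a learned forward model need not be
    analytically differentiable."""
    q = np.array(q0, dtype=float)
    for _ in range(iters):
        err = forward_model(q) - x_target
        if np.linalg.norm(err) < 1e-4:
            break
        # Numerical Jacobian of the forward model via central differences.
        J = np.zeros((err.size, q.size))
        for j in range(q.size):
            dq = np.zeros_like(q)
            dq[j] = eps
            J[:, j] = (forward_model(q + dq) - forward_model(q - dq)) / (2 * eps)
        q -= step * (2 * J.T @ err)  # gradient of the squared error
    return q

q = posture_control(x_target=np.array([0.4, 0.2]), q0=[0.1, 0.1])
print(q, forward_model(q))
```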