Abstract. This paper presents a novel approach to analyzing the appearance of human motions with a simple model, namely by mapping the motions onto a virtual marionette model. In this approach, a robot uses a monocular camera to recognize the person interacting with it and then tracks that person's head and hands. We reconstruct 3-D trajectories from the 2-D image space (IS) by calibrating and fusing the camera images with data from an inertial sensor, applying general anthropometric data, and restricting the motions to lie on a plane. Through a virtual marionette model, we map the 3-D trajectories to a feature vector in the marionette control space (MCS). Conversely, this implies that a certain set of 3-D motions can be performed by the (virtual) marionette system. A subset of these motions is considered to convey information (i.e., gestures). We therefore aim to build a database that stores the vocabulary of gestures, represented as signals in the MCS. The main contribution of this wor...