A novel approach is presented for estimating human body posture and motion from a video sequence. Human pose is defined as the instantaneous image-plane configuration of a single articulated body, given by the positions of a predetermined set of joints. First, the human bodies are statistically segmented from the background, and low-level visual features are computed from the segmented body shape. The goal is to map these visual features to body configurations. Given a set of body motion sequences for training, the body configurations are grouped into clusters, each containing statistically similar configurations. This unsupervised task is performed with the Expectation Maximization algorithm. Then, for each cluster, a neural network is trained to map visual features to body configurations. Clustering body configurations improves the mapping accuracy. Given new visual features, the mapping from each cluster is applied, providing a set of possible poses. From this set, the most likely pose is extracte...
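
The cluster-then-map pipeline described above can be illustrated with a minimal sketch on synthetic data, using NumPy only. Several parts are assumptions rather than the authors' implementation: the pose and feature vectors are randomly generated, the clusters are modeled as a diagonal-covariance Gaussian mixture fitted by EM, a linear least-squares map stands in for each per-cluster neural network, and the final selection rule (scoring each candidate pose under the Gaussian of the cluster that produced it) is a hypothetical stand-in, since the text here does not state how the most likely pose is chosen. All function names are illustrative.

```python
import numpy as np

def log_component_density(X, weights, means, variances):
    """Log of (mixture weight * diagonal Gaussian density), shape (n_samples, n_clusters)."""
    diff = X[:, None, :] - means[None, :, :]
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
    log_exp = -0.5 * np.sum(diff ** 2 / variances[None, :, :], axis=2)
    return np.log(weights)[None, :] + log_norm[None, :] + log_exp

def fit_gmm_em(X, k, n_iter=50, seed=0):
    """Cluster body configurations with EM (diagonal-covariance Gaussian mixture)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    means = X[rng.choice(n, size=k, replace=False)].copy()
    variances = np.tile(X.var(axis=0) + 1e-6, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility of each cluster for each configuration.
        log_p = log_component_density(X, weights, means, variances)
        resp = np.exp(log_p - log_p.max(axis=1, keepdims=True))
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and per-dimension variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        second_moment = (resp.T @ (X ** 2)) / nk[:, None]
        variances = np.maximum(second_moment - means ** 2, 1e-6)
    return weights, means, variances

def fit_cluster_maps(F, P, labels, k):
    """Fit one feature-to-pose map per cluster (linear stand-in for a neural network)."""
    Fb = np.hstack([F, np.ones((F.shape[0], 1))])  # append a bias column
    global_map = np.linalg.lstsq(Fb, P, rcond=None)[0]
    maps = []
    for c in range(k):
        idx = labels == c
        if idx.sum() < Fb.shape[1]:
            maps.append(global_map)  # too few samples: fall back to the global fit
        else:
            maps.append(np.linalg.lstsq(Fb[idx], P[idx], rcond=None)[0])
    return maps

def estimate_pose(f, maps, weights, means, variances):
    """Apply every cluster's map, then pick one candidate (assumed selection rule)."""
    fb = np.append(f, 1.0)
    candidates = np.stack([fb @ W for W in maps])
    # Assumption: score candidate c by its log-density under cluster c.
    scores = np.array([
        log_component_density(candidates[c:c + 1], weights, means, variances)[0, c]
        for c in range(len(maps))
    ])
    best = int(np.argmax(scores))
    return candidates[best], candidates, best

# Synthetic training data: two groups of 4-D "poses" (e.g. two joints, x and y each).
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0, 0.0, 0.0], [3.0, 3.0, 3.0, 3.0]])
P_train = np.vstack([c + 0.3 * rng.normal(size=(100, 4)) for c in centers])
A = np.eye(4) + 0.1 * rng.normal(size=(4, 4))  # hypothetical pose-to-feature relation
F_train = P_train @ A + 0.01 * rng.normal(size=P_train.shape)

weights, means, variances = fit_gmm_em(P_train, k=2)
labels = np.argmax(log_component_density(P_train, weights, means, variances), axis=1)
maps = fit_cluster_maps(F_train, P_train, labels, k=2)

# New visual features: one candidate pose per cluster, then a single selected pose.
p_true = centers[1] + 0.1
pose_hat, candidates, best = estimate_pose(p_true @ A, maps, weights, means, variances)
```

Here `candidates` holds one pose hypothesis per cluster and `pose_hat` is the one selected by the assumed scoring rule. In a real system the feature vectors would come from the segmented silhouettes rather than from a synthetic linear relation.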