We present an algorithm for automatic inference of human upper body motion. We propose a graph model in which motion inference is posed as a mapping problem between state nodes in the graph and features in image patches, and we use belief propagation for Bayesian inference in this graph. A multiple-frame inference model and algorithm combine the structural and temporal constraints on human motion. We also present a method for capturing the constraints on human body configuration under different view angles. The algorithm is applied in a prototype system that automatically labels upper body motion in videos, without manual initialization of body parts.
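
To illustrate the belief propagation step, the following is a minimal sketch of sum-product belief propagation on a small tree-structured graph. It is not the implementation used in the prototype system. The node names (`torso`, `upper_arm`, `forearm`), the number of candidate states per node, and all potential values are hypothetical, chosen only to show how local image evidence and pairwise structural compatibility might combine into a belief for each body part.

```python
def normalize(values):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(values)
    return [v / total for v in values]


def belief_propagation(unary, edges, pairwise):
    """Sum-product belief propagation on a tree-structured graph.

    unary:    dict node -> list of local evidence, one value per candidate state
    edges:    list of undirected tree edges (i, j)
    pairwise: dict (i, j) -> matrix psi[a][b], the compatibility of
              state a at node i with state b at node j
    Returns a dict node -> normalized belief over that node's candidate states.
    """
    neighbors = {n: [] for n in unary}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    def psi(i, j, a, b):
        # Look up the compatibility regardless of edge direction.
        if (i, j) in pairwise:
            return pairwise[(i, j)][a][b]
        return pairwise[(j, i)][b][a]

    # msgs[(i, j)] is the message from i to j, indexed by the states of j.
    msgs = {}
    for i, j in edges:
        msgs[(i, j)] = [1.0] * len(unary[j])
        msgs[(j, i)] = [1.0] * len(unary[i])

    # On a tree, synchronous updates converge within "diameter" sweeps;
    # the number of nodes is a safe upper bound on the diameter.
    for _ in range(len(unary)):
        updated = {}
        for (i, j) in msgs:
            out = []
            for b in range(len(unary[j])):
                total = 0.0
                for a in range(len(unary[i])):
                    incoming = 1.0
                    for k in neighbors[i]:
                        if k != j:
                            incoming *= msgs[(k, i)][a]
                    total += unary[i][a] * psi(i, j, a, b) * incoming
                out.append(total)
            updated[(i, j)] = normalize(out)
        msgs = updated

    # Belief = local evidence times all incoming messages, normalized.
    beliefs = {}
    for n in unary:
        belief = list(unary[n])
        for k in neighbors[n]:
            belief = [x * m for x, m in zip(belief, msgs[(k, n)])]
        beliefs[n] = normalize(belief)
    return beliefs


# Hypothetical example: three body parts, each with a few candidate image patches.
unary = {
    "torso": [0.7, 0.3],
    "upper_arm": [0.2, 0.5, 0.3],
    "forearm": [0.6, 0.4],
}
edges = [("torso", "upper_arm"), ("upper_arm", "forearm")]
pairwise = {
    ("torso", "upper_arm"): [[0.9, 0.4, 0.1], [0.2, 0.5, 0.8]],
    ("upper_arm", "forearm"): [[0.8, 0.2], [0.5, 0.5], [0.1, 0.9]],
}

beliefs = belief_propagation(unary, edges, pairwise)
for part, belief in beliefs.items():
    print(part, [round(p, 3) for p in belief])
```

Because this example graph is a tree, the resulting beliefs are the exact marginals of the joint distribution defined by the potentials. On a graph with loops, such as one that adds temporal links between frames, the same message updates would in general give only approximate beliefs.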