We propose a method for estimating the pose of a human body using its approximate 3D volume (visual hull) obtained in real time from synchronized videos. Our method can cope with loose-fitting clothing, which hides the human body and produces non-rigid motions and critical reconstruction errors, as well as tight-fitting clothing. To follow the shape variations robustly against erratic motions and the ambiguity between a reconstructed body shape and its pose, the probabilistic dynamical model of human volumes is learned from training temporal volumes refined by error correction. The dynamical model of a body pose (joint angles) is also learned with its corresponding volume. By comparing the volume model with an input visual hull and regressing its pose from the pose model, pose estimation can be realized. In our method, this is improved by double volume comparison: 1) comparison in a low-dimensional latent space with probabilistic volume models and 2) comparison in an observation volum...