We present a multi-camera system for audio-visual analysis of dance figures. Multi-view video of a dancer is acquired with 8 synchronized cameras. Motion capture in the proposed system is based on 3D tracking of markers attached to the dancer's body. The resulting set of 3D points is used to extract body motion features as 3D displacement vectors, whereas Mel-frequency cepstral coefficients (MFCCs) serve as the audio features. In the first stage of the multi-modal analysis phase, we perform Hidden Markov Model (HMM) based unsupervised temporal segmentation of the audio features and of the body motion features (for example, those of the legs and arms) separately, to determine the recurrent elementary audio and body motion patterns. In the second stage, we investigate the correlation of body motion patterns with audio patterns, which can be used for the estimation and synthesis of realistic audio-driven body animation.
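As an illustration of the body motion features, the following minimal sketch computes frame-to-frame 3D displacement vectors from tracked marker positions. It assumes each frame holds one (x, y, z) position per marker in a fixed marker order. The function name `displacement_vectors` and the data layout are hypothetical and chosen only for this example; the system's actual feature extraction may differ, for instance in normalization or in how markers are grouped by body part.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def displacement_vectors(frames: List[List[Point3D]]) -> List[List[Point3D]]:
    """Return the per-marker 3D displacement between consecutive frames.

    frames[t][m] is the (x, y, z) position of marker m at frame t.
    The result has len(frames) - 1 entries; entry t holds, for each marker,
    the vector from its position at frame t to its position at frame t + 1.
    (Illustrative layout; not taken from the system described above.)
    """
    # Every frame must contain the same markers in the same order.
    if frames and any(len(f) != len(frames[0]) for f in frames):
        raise ValueError("all frames must contain the same number of markers")

    result = []
    for prev_frame, curr_frame in zip(frames, frames[1:]):
        frame_disp = []
        for prev_pos, curr_pos in zip(prev_frame, curr_frame):
            # Component-wise difference: current position minus previous one.
            frame_disp.append(tuple(c - p for p, c in zip(prev_pos, curr_pos)))
        result.append(frame_disp)
    return result


# Two markers tracked over three frames (made-up coordinates).
example_frames = [
    [(0, 0, 0), (1, 1, 1)],
    [(1, 0, 0), (1, 2, 1)],
    [(1, 0, 2), (0, 2, 1)],
]
example_disp = displacement_vectors(example_frames)
```

Sequences of such vectors, taken per body part, are the kind of observation an HMM-based temporal segmentation could then operate on.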