This paper presents a new approach to activity and dominance modeling in meetings. For this purpose, low-level acoustic and visual features are extracted from audio and video capture devices. Hidden Markov Models (HMMs) are used to segment and classify the activity level of each participant. Additionally, more semantic features are applied in a two-layer HMM approach. The experiments show that the acoustic feature is the most important one. The early fusion of acoustic and global-motion features achieves results nearly as good as those of the acoustic feature alone, whereas all other early fusion approaches are outperformed by the acoustic feature. Moreover, the two-layer model could not match the results obtained with the acoustic feature alone.
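
As an illustration of the two ideas named above, the following minimal sketch shows early fusion as per-frame concatenation of two feature streams, followed by Viterbi decoding of activity levels with a single-layer HMM. It is not the authors' implementation: the function names, the diagonal-Gaussian emission model, the two-state topology, and all numeric values are assumptions chosen only to make the example self-contained.

```python
import numpy as np

def early_fusion(acoustic, visual):
    """Concatenate per-frame feature vectors (frames x dims) along the feature axis."""
    if acoustic.shape[0] != visual.shape[0]:
        raise ValueError("both streams must have the same number of frames")
    return np.hstack([acoustic, visual])

def gaussian_log_emission(features, means, variances):
    """Diagonal-Gaussian log-likelihood of each frame under each state.

    features: (frames, dims); means, variances: (states, dims).
    Returns a (frames, states) matrix.
    """
    diff = features[:, None, :] - means[None, :, :]
    return -0.5 * np.sum(diff ** 2 / variances + np.log(2 * np.pi * variances), axis=2)

def viterbi(log_emission, log_trans, log_start):
    """Most likely state sequence for a (frames, states) log-emission matrix."""
    n_frames, n_states = log_emission.shape
    score = log_start + log_emission[0]
    back = np.zeros((n_frames, n_states), dtype=int)
    for t in range(1, n_frames):
        cand = score[:, None] + log_trans          # cand[i, j]: move from state i to j
        back[t] = np.argmax(cand, axis=0)          # best predecessor for each state j
        score = cand[back[t], np.arange(n_states)] + log_emission[t]
    path = [int(np.argmax(score))]
    for t in range(n_frames - 1, 0, -1):           # follow back-pointers to frame 0
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy data (hypothetical): one acoustic and one global-motion value per frame.
acoustic = np.array([[0.0], [0.1], [1.0], [0.9]])
motion = np.array([[0.0], [0.0], [1.0], [1.0]])
fused = early_fusion(acoustic, motion)             # shape (4, 2)

# Assumed two activity levels: 0 = inactive, 1 = active.
means = np.array([[0.0, 0.0], [1.0, 1.0]])
variances = np.full((2, 2), 0.1)
log_trans = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))
log_start = np.log(np.array([0.5, 0.5]))

levels = viterbi(gaussian_log_emission(fused, means, variances), log_trans, log_start)
print(levels)
```

In this sketch the fusion happens before any modeling, so the HMM sees one joint observation vector per frame; a late-fusion or two-layer design would instead combine the outputs of separately trained models.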