For the Multimodal Meeting Manager Project (M4), we propose an approach based on face and hand tracking. The technique consists of skin color detection, segmentation, feature extraction, and tracking of the detected objects. Our aim is to extract information from participants' hand and face movements that is suitable for intelligent video editing and as additional information for speech recognition. The activity of meeting participants is evaluated for this purpose.
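
The four stages named above (skin color detection, segmentation, feature extraction, tracking) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the M4 system: the fixed RGB threshold rule in `is_skin`, the 4-connected region grouping, the choice of area, centroid, and bounding box as features, the greedy nearest-centroid matching, and the `max_dist` parameter. Frames are represented as plain nested lists of `(r, g, b)` tuples so that the example needs only the standard library.

```python
import math
from collections import deque

def is_skin(r, g, b):
    """Illustrative fixed-threshold RGB skin rule (an assumption, not the paper's classifier)."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15 and r > g and r > b)

def skin_mask(frame):
    """Stage 1: classify every pixel. frame is a list of rows of (r, g, b) tuples."""
    return [[is_skin(*px) for px in row] for row in frame]

def segment(mask):
    """Stage 2: group skin pixels into 4-connected regions (lists of (y, x))."""
    h = len(mask)
    w = len(mask[0]) if mask else 0
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:  # breadth-first flood fill from the seed pixel
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(pixels)
    return regions

def features(pixels):
    """Stage 3: describe one region by area, centroid, and bounding box."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    return {
        "area": len(pixels),
        "centroid": (sum(ys) / len(ys), sum(xs) / len(xs)),
        "bbox": (min(ys), min(xs), max(ys), max(xs)),
    }

def track(prev, curr, max_dist=20.0):
    """Stage 4: greedily match each current region to the nearest unused previous one.

    Returns {index in curr: index in prev, or None if nothing lies within max_dist}.
    """
    used, matches = set(), {}
    for i, c in enumerate(curr):
        best, best_d = None, max_dist
        for j, p in enumerate(prev):
            if j in used:
                continue
            d = math.hypot(c["centroid"][0] - p["centroid"][0],
                           c["centroid"][1] - p["centroid"][1])
            if d <= best_d:
                best, best_d = j, d
        matches[i] = best
        if best is not None:
            used.add(best)
    return matches

def detect(frame):
    """Run stages 1 to 3 on a single frame and return one feature dict per region."""
    return [features(region) for region in segment(skin_mask(frame))]
```

Calling `detect` on two consecutive frames and passing the results to `track` yields a correspondence between regions. Under the same assumptions, the centroid displacement of matched regions could serve as a crude per-participant activity measure; a practical system would likely need a more robust color model and a tracker that handles occlusion between hands and face.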