In this proposal, a novel system for detecting and recognizing person actions in image sequences of meeting scenarios is introduced. The extracted information can serve as the basis for content-based browsing and for the automated analysis of meetings. The presented system consists of four major functional blocks: the detection of a person, the extraction of features that describe actions, a sophisticated segmentation approach that finds action boundaries, and a statistical classifier. Besides the functionality of these blocks, the image material used for training and testing is briefly introduced.
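The four-block pipeline can be sketched as follows. This is a minimal, hypothetical illustration only: every function and class name, the boundary-detection rule, and the two-label classifier are stand-ins invented for this sketch, not the paper's actual detectors, features, segmentation method, or statistical classifier.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Action:
    label: str
    start: int  # first frame index of the action segment
    end: int    # last frame index of the action segment

def detect_person(frame: List[float]) -> List[float]:
    # Block 1 (stub): locate the person; here the whole frame is returned.
    return frame

def extract_features(region: List[float]) -> float:
    # Block 2 (stub): describe the region by a single feature (its mean).
    return sum(region) / len(region)

def segment(feats: List[float], thresh: float = 0.5) -> List[Tuple[int, int]]:
    # Block 3 (stub): place an action boundary wherever the feature
    # changes by more than a threshold between consecutive frames.
    cuts = [0]
    for i in range(1, len(feats)):
        if abs(feats[i] - feats[i - 1]) > thresh:
            cuts.append(i)
    cuts.append(len(feats))
    return [(cuts[i], cuts[i + 1] - 1) for i in range(len(cuts) - 1)]

def classify(feats: List[float], start: int, end: int) -> str:
    # Block 4 (stub): a trivial stand-in for the statistical classifier.
    mean = sum(feats[start:end + 1]) / (end - start + 1)
    return "active" if mean > 0.5 else "idle"

def recognize(frames: List[List[float]]) -> List[Action]:
    # Chain the four blocks: detect -> features -> segment -> classify.
    feats = [extract_features(detect_person(f)) for f in frames]
    return [Action(classify(feats, s, e), s, e) for s, e in segment(feats)]

# Example: six frames of pixel values, low then high activity.
frames = [[0.1, 0.1]] * 3 + [[0.9, 0.9]] * 3
print([(a.label, a.start, a.end) for a in recognize(frames)])
# → [('idle', 0, 2), ('active', 3, 5)]
```

The point of the sketch is the data flow between the blocks; each stub would be replaced by the paper's actual detection, feature, segmentation, and classification methods.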