This article presents a framework for extracting relevant qualitative chunks from a video sequence. We first describe the notion of qualitative descriptors, which are used to perform the qualitative extraction. A grouping algorithm then operates on these qualitative descriptions to produce a real-time qualitative segmentation of the image flow. Finally, simple pattern-recognition methods extract descriptions of basic actions such as "push", "take", or "pull". The proposed method provides an unsupervised learning technique for generating abstract descriptions of actions from a video sequence.