Abstract. The textual description of video sequences exploits conceptual knowledge about the behavior of depicted agents. An explicit representation of such behavioral knowledge not only facilitates the textual description of video evaluation results, but can also be used for the inverse task of generating synthetic image sequences from textual descriptions of dynamic scenes. Moreover, it is shown here that the behavioral knowledge representation within a cognitive vision system can even be exploited to predict the movements of visible agents, thereby improving the overall performance of the system.