We propose a framework for representing visual knowledge in a robotic agent, with special attention to the understanding of dynamic scenes. In our approach, understanding involves generating a high-level, declarative description of the perceived world. Developing such a description requires both bottom-up, data-driven processes that associate symbolic knowledge representation structures with the data produced by a vision system, and top-down processes in which high-level, symbolic information is in turn employed to drive and further refine the interpretation of a scene. On the one hand, the computer vision community has approached this problem in terms of 2D/3D shape reconstruction and the estimation of motion parameters. On the other, the AI community has developed rich and expressive systems for describing processes, events, actions and, in general, dynamic situations. Nevertheless, these two approaches have evolved separately and concentrated on differe...