A typical approach to video annotation classifies video elements (e.g. events and objects) according to a pre-defined ontology of the video content domain. Ontologies are defined by establishing relationships between linguistic terms that specify concepts at different abstraction levels. However, although linguistic terms are appropriate for distinguishing event and object categories, they are inadequate for describing specific or complex patterns of events or video entities. In these cases, pattern specifications are better expressed through visual prototypes, either images or video clips, that capture the essence of the event or entity. Enriched ontologies that include both visual and linguistic concepts can therefore support video annotation down to the level of detail of pattern specification. This paper presents algorithms and techniques that employ enriched ontologies for video annotation and retrieval, and discusses a solution for...
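As a rough illustration only (not the data model used in this paper), the following Python sketch shows one way an enriched-ontology concept might pair a linguistic term with visual prototypes and use them for annotation; the class names, fields, and the `similarity` callback are hypothetical assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class VisualPrototype:
    """A visual specification of a concept: a key frame or a short clip (assumed fields)."""
    media_path: str                        # path to the prototype image or video clip
    kind: str = "image"                    # "image" or "clip"
    descriptor: Optional[List[float]] = None  # e.g. a feature vector used for matching

@dataclass
class Concept:
    """An ontology node carrying both a linguistic term and visual prototypes."""
    term: str                                   # linguistic label, e.g. "shot on goal"
    parent: Optional["Concept"] = None          # more abstract concept (is-a relation)
    prototypes: List[VisualPrototype] = field(default_factory=list)

    def annotate(self,
                 observation: List[float],
                 similarity: Callable[[List[float], List[float]], float],
                 threshold: float = 0.8) -> Optional[str]:
        """Return this concept's term if the observed descriptor matches any prototype."""
        for proto in self.prototypes:
            if proto.descriptor is not None and \
               similarity(observation, proto.descriptor) >= threshold:
                return self.term
        return None
```

The design choice sketched here, attaching visual prototypes directly to ontology concepts, is what allows annotation at a finer level of detail than linguistic labels alone.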