Enabling intelligent access to multimedia data requires a powerful description language. In this paper, we demonstrate why the MPEG-7 standard fails to fulfill this task. We then introduce our proposal: a description language specific to audio-visual content that is modular and compact, yet designed to be extensible. This language is centered on the notions of descriptor and structure, each with well-defined semantics. A descriptor can be a low-level feature, automatically extracted from the signal, or a higher-level semantic concept used to annotate video documents. Descriptors can be combined into structures according to defined models that provide description patterns.

Categories and Subject Descriptors
H.1 [Models and Principles]: General

General Terms
Design, Documentation, Languages, Standardization

Keywords
Audio-visual description language, descriptor, structure, semantics, MPEG-7, knowledge representation, Semantic Web