Performing simple keyword-based search has long been the only way to access information. But for a truly comprehensive search on multimedia data, this approach is no longer sufficient. Therefore, semantic annotation is a key concern for improving relevance in image retrieval applications. In this paper we propose a system architecture for automatic large-scale medical image understanding which aims at a formal fusion of feature extraction techniques operating on the bit-level representation of images (and time series data) with formal background knowledge represented in ontologies. We put forward a hierarchical framework of ontologies to formulate a precise and at the same time generic representation of the existing high-level knowledge in the medical domain. We present a system architecture which aims at an integration of high- and low-level features on various abstraction levels, allowing cross-modal as well as cross-lingual retrieval through Content Based Image Retrieval (CBIR),...