We describe a multimedia, multilingual and multimodal research system (CIMWOS) supporting content-based indexing, archiving, retrieval and on-demand delivery of audiovisual content. CIMWOS (Combined IMage and WOrd Spotting) incorporates an extensive set of multimedia technologies through the seamless integration of three major components – speech, text and image processing – producing a rich collection of XML metadata annotations that follow the MPEG-7 standard. These XML annotations are then merged and loaded into the CIMWOS Multimedia Database. Additionally, they can be dynamically transformed into RDF documents via XSL stylesheets for the interchange of semantic information. The CIMWOS Retrieval Engine is based on a weighted Boolean model with intelligent indexing components. A user-friendly web-based interface allows users to efficiently retrieve video segments by a combination of media description, content metadata and natural language text. The database includes sports, broadcast new...