This paper describes CIMWOS (Combined IMage and WOrd Spotting), a multimedia, multilingual and multimodal research system. CIMWOS incorporates an extensive set of multimedia technologies, integrating three major subsystems: text, speech, and image processing. It produces a rich collection of XML metadata annotations following the MPEG-7 standard. These annotations are then merged and loaded into the CIMWOS Multimedia Database. A user-friendly Web-based interface lets the user retrieve video segments efficiently through a combination of media description, content metadata and natural language text. The system is currently under evaluation and contains broadcast news and documentaries in three languages (English, Greek, and French); its open architecture allows more languages to be incorporated in the future.

Key Words: Multimedia Databases, Electronic Libraries, User Interfaces, Applications