This paper reports our multimedia information retrieval experiments carried out for the ImageCLEF track (ImageCLEFwiki). The task is to answer user information needs, i.e., queries that may combine several modalities (text, image, concept), with ranked lists of relevant documents. The purpose of our experiments is twofold: first, our overall aim is to develop a multimedia document model combining the text and/or image modalities; second, we aim to compare the results of this model on multimedia queries with those of a text-only model. Our multimedia document model is based on a vector of textual and visual terms. The textual terms correspond to words. The visual ones result from local colour descriptors that are automatically extracted and quantized with k-means, yielding an image vocabulary; each visual term represents the colour property of an image region. To perform a query, we compute a similarity score between each document vector (textual + visual terms) and the query using the Okapi method ...
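As a rough sketch of this pipeline (not the authors' implementation), the following Python code illustrates the three steps described above: quantizing local colour descriptors into a visual vocabulary with k-means, concatenating visual terms with textual terms into a single document vector, and ranking documents against a multimodal query with Okapi BM25. The descriptor extraction itself is stubbed out with random data; the vocabulary size K, the BM25 constants k1 and b, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter

import numpy as np
from sklearn.cluster import KMeans

K = 64               # assumed size of the visual vocabulary
K1, B = 1.2, 0.75    # standard Okapi BM25 constants (assumed values)


def build_visual_vocabulary(descriptors: np.ndarray) -> KMeans:
    """Quantize local colour descriptors (one row per image region)
    into K cluster centres, i.e. the image vocabulary."""
    return KMeans(n_clusters=K, n_init=10, random_state=0).fit(descriptors)


def visual_terms(vocab: KMeans, descriptors: np.ndarray) -> list:
    """Map each region descriptor to its nearest cluster centre,
    yielding symbolic visual terms such as 'v17'."""
    return [f"v{c}" for c in vocab.predict(descriptors)]


def document_vector(words: list, vterms: list) -> Counter:
    """A document is a single vector of textual + visual term frequencies."""
    return Counter(words) + Counter(vterms)


def bm25_scores(query: Counter, docs: list) -> list:
    """Score every document against the (multimodal) query with Okapi BM25."""
    n = len(docs)
    avgdl = sum(sum(d.values()) for d in docs) / n
    # document frequency of each query term
    df = {t: sum(1 for d in docs if t in d) for t in query}
    scores = []
    for d in docs:
        dl = sum(d.values())
        s = 0.0
        for t in query:
            if t not in d:
                continue
            idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            tf = d[t]
            s += idf * tf * (K1 + 1) / (tf + K1 * (1 - B + B * dl / avgdl))
        scores.append(s)
    return scores


if __name__ == "__main__":
    # Toy usage: fake 3-D colour descriptors stand in for real region features.
    rng = np.random.default_rng(0)
    regions = rng.random((200, 3))
    vocab = build_visual_vocabulary(regions)
    d1 = document_vector(["red", "car"], visual_terms(vocab, regions[:5]))
    d2 = document_vector(["blue", "sky"], visual_terms(vocab, regions[5:10]))
    q = document_vector(["car"], visual_terms(vocab, regions[:2]))
    print(bm25_scores(q, [d1, d2]))  # higher score = more relevant
```

Because textual and visual terms live in one shared vocabulary, the same BM25 scoring handles text-only, image-only, and mixed queries without any changes, which is precisely what makes the comparison between the multimedia model and the text-only baseline straightforward.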