In many vision problems, it is easier to obtain a subset of annotated training data than a fully annotated set, because this is less demanding for the user. For this reason, in this paper we consider in particular the problem of weakly-annotated image retrieval, where only a small subset of the database is annotated with keywords. We present and evaluate a new method that improves the effectiveness of content-based image retrieval by integrating semantic concepts extracted from text. Our model is inspired by probabilistic graphical model theory: we propose a hierarchical mixture model that handles missing annotations and captures the user's preference through a relevance feedback process. Results of visual-textual retrieval combined with relevance feedback, reported on a database of images collected from the Web and partially, manually annotated, show an improvement of about 44.5% in recognition rate against co...
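To make the central idea concrete, the following is a minimal sketch, not the authors' implementation, of a mixture model fit by EM on weakly-annotated data: items with a keyword are hard-assigned to the corresponding component, while unannotated items are treated as missing values and soft-assigned. The function name, the choice of spherical Gaussian components, and the toy data are all illustrative assumptions; the paper's hierarchical model and relevance feedback loop are not reproduced here.

```python
# Illustrative sketch: semi-supervised EM for a Gaussian mixture where only
# some items carry a keyword label (y[i] >= 0) and the rest are missing (-1).
import numpy as np

def fit_semisupervised_mixture(X, y, n_components, n_iter=50, seed=0):
    """EM for a spherical Gaussian mixture; y gives a component index for
    annotated items and -1 where the annotation is missing."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = X[rng.choice(n, n_components, replace=False)].copy()
    variances = np.full(n_components, X.var() + 1e-6)
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibilities from spherical Gaussian log-densities.
        sq = ((X[:, None, :] - means[None]) ** 2).sum(-1)          # (n, k)
        log_p = -0.5 * (sq / variances + d * np.log(2 * np.pi * variances))
        log_p += np.log(weights)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # Annotated items: the keyword fixes the component (hard assignment).
        labeled = y >= 0
        resp[labeled] = 0.0
        resp[labeled, y[labeled]] = 1.0
        # M-step: re-estimate parameters from all items, labeled or not.
        nk = resp.sum(axis=0) + 1e-10
        means = (resp.T @ X) / nk[:, None]
        sq = ((X[:, None, :] - means[None]) ** 2).sum(-1)
        variances = (resp * sq).sum(axis=0) / (d * nk) + 1e-6
        weights = nk / n
    return means, variances, weights

# Toy usage: 200 two-dimensional points, 10% annotated with one of 2 keywords.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(4, 1, (100, 2))])
y = np.full(200, -1)
idx = rng.choice(200, 20, replace=False)
y[idx] = (idx >= 100).astype(int)
means, variances, weights = fit_semisupervised_mixture(X, y, n_components=2)
print(means)
```

The design point this sketch illustrates is that the annotated subset anchors the components to keyword semantics, while the unannotated majority still contributes to the parameter estimates through its soft assignments, which is how a mixture model can exploit weak annotation rather than requiring a fully labeled database.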