The bag-of-visual-words model is a popular image representation that has proven effective for automatic annotation. In this paper, we extend this representation to include weak geometric information by using pairs of visual words. We show on a standard benchmark dataset that the new image representation significantly improves the performance of an automatic annotation system.