The Bag-of-Visual-Words (BoW) image representation is increasingly popular in the computer vision and multimedia communities. However, experiments show that the traditional BoW representation is not as effective as desired. One of the main reasons is that the traditional BoW representation discards the spatial information in images. To overcome this problem, we propose the pairwise visual word tree, in which each visual word encodes both the appearance and the spatial information of a pair of interest points in an image. The resulting BoW representation therefore preserves the spatial structure of the image. Based on the pairwise visual word tree, we propose an efficient topic word selection algorithm, which uses Latent Semantic Analysis (LSA) to discover the most expressive visual words for different image categories. An efficient strategy then combines the selected topic words for image re-ranking. Extensive experiments show that the novel BoW repr...