We study in this paper the problem of bridging the semantic gap between low-level image features and high-level semantic concepts, which is the key hindrance in content-based image retrieval. Piloted by the rich textual information of Web images, the proposed framework tries to learn a new distance measure in the visual space, which can be used to retrieve more semantically relevant images for any unseen query image. The framework differentiates with traditional distance metric learning methods in the following ways. 1) A ranking-based distance metric learning method is proposed for image retrieval problem, by optimizing the leave-one-out retrieval performance on the training data. 2) To be scalable, millions of images together with rich textual information have been crawled from the Web to learn the similarity measure, and the learning framework particularly considers the indexing problem to ensure the retrieval efficiency. 3) To alleviate the noises in the unbalanced labels of image...