Content-based image retrieval (CBIR) has been applied to a variety of medical applications, e.g., pathology research and clinical decision support, and the bag-of-features (BOF) model is one of the most widely used techniques. In this study, we address the problem of vocabulary pruning to reduce the influence of redundant and noisy visual words. The conditional probability of each word given the hidden topics, which are extracted using probabilistic Latent Semantic Analysis (pLSA), is first calculated. A ranking method is then proposed to compute the significance of each word based on the relationship between the words and the topics. Experiments on the publicly available Early Lung Cancer Action Program (ELCAP) database show that the method reduces the number of words required while improving retrieval performance. Because the proposed method is independent of the problem domain, it is also applicable to general image retrieval.
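The two steps described above (estimating the word-topic probabilities with pLSA, then ranking and pruning the words) can be illustrated with a minimal sketch. The abstract does not state the ranking formula, so the score used here, the largest P(w | z) over all topics, is only a stand-in assumption and not the proposed method. The function names `plsa`, `word_significance`, and `prune_vocabulary`, the toy count matrix, and all parameter values are likewise hypothetical.

```python
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM on an image-by-word count matrix (list of lists).

    Returns p_w_z[z][w] = P(w | z) and p_z_d[d][z] = P(z | d).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalized(row):
        total = sum(row)
        return [v / total for v in row]

    # Random positive initialization, each row normalized to sum to 1.
    p_w_z = [normalized([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]
    p_z_d = [normalized([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]

    for _ in range(n_iter):
        # Tiny constants keep every probability strictly positive.
        new_w_z = [[1e-12] * n_words for _ in range(n_topics)]
        new_z_d = [[1e-12] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(w | z) * P(z | d).
                post = [p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)]
                total = sum(post)
                # M-step accumulation, weighted by the observed count.
                for z in range(n_topics):
                    resp = n * post[z] / total
                    new_w_z[z][w] += resp
                    new_z_d[d][z] += resp
        p_w_z = [normalized(row) for row in new_w_z]
        p_z_d = [normalized(row) for row in new_z_d]
    return p_w_z, p_z_d


def word_significance(p_w_z):
    """ASSUMED score (not from the paper): max over topics of P(w | z)."""
    n_words = len(p_w_z[0])
    return [max(topic[w] for topic in p_w_z) for w in range(n_words)]


def prune_vocabulary(counts, keep, n_topics=2, n_iter=50, seed=0):
    """Keep the `keep` highest-scoring words and drop the other columns."""
    p_w_z, _ = plsa(counts, n_topics, n_iter=n_iter, seed=seed)
    scores = word_significance(p_w_z)
    ranked = sorted(range(len(scores)), key=lambda w: scores[w], reverse=True)
    kept = sorted(ranked[:keep])
    pruned = [[row[w] for w in kept] for row in counts]
    return kept, pruned


# Toy data: 4 images, 5 visual words; word 4 never occurs in any image.
toy_counts = [
    [5, 4, 0, 0, 0],
    [4, 5, 0, 0, 0],
    [0, 0, 6, 3, 0],
    [0, 0, 5, 4, 0],
]
kept_words, pruned_counts = prune_vocabulary(toy_counts, keep=4)
# Word 4 has a near-zero probability under every topic, so it is dropped.
```

In a real BOF pipeline the rows would be visual-word histograms of the database images, and retrieval would then compare the pruned histograms. The number of topics and the number of words to keep would need to be tuned on the data.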