In this paper, we propose a novel query expansion approach for improving transfer-based automatic image captioning. The core idea of our method is to translate the given visual query into a distributional-semantics-based form, computed as the average of the sentence vectors extracted from the captions of images that are visually similar to the input image. Using three image captioning benchmark datasets, we show that our approach provides more accurate results than state-of-the-art data-driven methods in terms of both automatic metrics and subjective evaluation.
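
The core idea can be illustrated with a minimal sketch. It assumes that the sentence vectors of the candidate captions have already been computed by some sentence embedding model and that the visually similar images have already been retrieved; neither step is shown. The function names `expand_query` and `transfer_caption` are hypothetical, and the final selection by cosine similarity is an assumption made for illustration, not necessarily the ranking used in our method.

```python
import numpy as np


def expand_query(caption_vectors):
    """Return the expanded query: the mean of the caption sentence vectors.

    caption_vectors: array-like of shape (n_captions, dim), assumed to hold
    precomputed sentence vectors of captions of visually similar images.
    """
    vectors = np.asarray(caption_vectors, dtype=float)
    return vectors.mean(axis=0)


def transfer_caption(captions, caption_vectors, query_vector):
    """Pick the candidate caption closest to the expanded query.

    Assumption: closeness is measured by cosine similarity.
    """
    vectors = np.asarray(caption_vectors, dtype=float)
    query = np.asarray(query_vector, dtype=float)
    # Cosine similarity between each caption vector and the query vector.
    norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(query)
    similarities = (vectors @ query) / np.maximum(norms, 1e-12)
    return captions[int(np.argmax(similarities))]


# Toy data (hypothetical): four candidate captions with 3-d sentence vectors.
captions = [
    "a dog runs on grass",
    "a dog plays in a park",
    "a puppy sits in a garden",
    "a red car on a street",
]
caption_vectors = [
    [1.0, 0.0, 0.0],
    [0.8, 0.6, 0.0],
    [0.6, 0.8, 0.0],
    [0.0, 0.0, 1.0],
]

query_vector = expand_query(caption_vectors)
best_caption = transfer_caption(captions, caption_vectors, query_vector)
```

In this toy setting the expanded query lies nearest to the captions that agree with the majority of the retrieved neighbors, so an outlier caption (here, the one about a car) is unlikely to be transferred.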