Existing Web image search engines index images by textual descriptions such as the filename, image caption, and surrounding text. However, the textual descriptions available on the Web can be ambiguous or inaccurate in describing the actual image content, so text-based search engines also return some images that are irrelevant to the user's query. In this paper, we propose to integrate visual features into an existing text-based image search engine in order to improve on the performance of pure text-based Web image search. The proposed algorithm is named SIEVE. We also propose practical fusion methods for integrating SIEVE with contemporary text-based search engines. In our approach, text-based image search results for a given query are obtained first; SIEVE is then used to filter out the images that are semantically irrelevant to the query. Experimental results show that image retrieval performance using SIEVE improves significantly over Google image search.
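
The two-stage flow described above (retrieve by text first, then filter by visual content) can be illustrated with a minimal sketch. It is not the SIEVE algorithm itself: the abstract does not specify how visual relevance is computed or how the fusion is performed, so the function name `filter_results`, the `relevance` callback, the `threshold` parameter, and the toy scores are all hypothetical placeholders.

```python
from typing import Callable, List, Sequence


def filter_results(
    query: str,
    ranked_images: Sequence[str],
    relevance: Callable[[str, str], float],
    threshold: float = 0.5,
) -> List[str]:
    """Drop text-search results whose visual relevance falls below a threshold.

    `relevance(query, image_id)` stands in for a visual-feature-based score;
    how such a score is actually computed is an assumption left open here.
    The original text-based ranking order is preserved for the kept images.
    """
    return [img for img in ranked_images if relevance(query, img) >= threshold]


# Toy stand-in for a visual relevance model (made-up scores, illustration only).
toy_scores = {"tiger_1.jpg": 0.9, "tiger_woods.jpg": 0.2, "tiger_2.jpg": 0.7}


def toy_relevance(query: str, image_id: str) -> float:
    # Unknown images are treated as irrelevant in this toy example.
    return toy_scores.get(image_id, 0.0)


# Stage 1 (assumed already done): results returned by a text-based engine.
text_results = ["tiger_1.jpg", "tiger_woods.jpg", "tiger_2.jpg"]

# Stage 2: filter out results judged visually irrelevant to the query.
kept = filter_results("tiger", text_results, toy_relevance)
print(kept)
```

Passing the scorer in as a callback keeps the sketch independent of any particular visual feature or fusion method, which is the part the abstract leaves unspecified.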