In recent years, the research community has developed numerous visual Web search interfaces. However, user evaluations of these interfaces have used a wide range of methods, making it difficult to compare and verify the relative value of the proposed advancements. In this paper, we survey these evaluation methods and propose a stepped evaluation and refinement model for the systematic study and enhancement of visual Web search interfaces. We suggest that this stepped model can be generalized to support the evaluation of other information visualization systems that target exploratory or knowledge-centric domains.