Test collections are the primary drivers of progress in information retrieval. They provide a yardstick for assessing the effectiveness of ranking functions in an automatic, rapid, and repeatable fashion, and they serve as training data for learning-to-rank approaches. However, manual construction of test collections tends to be slow, labor-intensive, and expensive. This paper examines the feasibility of constructing Web search test collections in a completely unsupervised manner, given only a large Web corpus as input. Within the proposed framework, anchor text extracted from the Web graph is treated as a pseudo-query log from which pseudo queries are sampled. For each pseudo query, a set of relevant and non-relevant documents is selected using a variety of Web-specific features, including spam scores and aggregated anchor text weights. The automatically mined queries and judgments form a pseudo-test collection that can be used for evaluation or for training learning-to-rank models. Experiments carr...
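The pipeline summarized above (anchor text as a pseudo-query log, sampling pseudo queries, then picking relevant and non-relevant documents with spam and aggregated anchor weight features) can be illustrated with a minimal sketch. This is not the paper's method: the input format of `(anchor_text, target_doc, weight)` triples, the `spam_score` dictionary, the threshold, and the function name `build_pseudo_collection` are all hypothetical. The sketch also makes two labeled assumptions. Queries are drawn in proportion to anchor frequency using Efraimidis–Spirakis weighted sampling without replacement. The non-spam document with the highest aggregated anchor weight counts as relevant, while random non-spam documents not linked by that anchor text count as non-relevant.

```python
import random
from collections import Counter, defaultdict


def build_pseudo_collection(anchors, spam_score, corpus, num_queries,
                            num_rel=1, num_nonrel=2, spam_threshold=0.5, seed=0):
    """Hypothetical sketch: mine pseudo queries and judgments from anchor text.

    anchors:    iterable of (anchor_text, target_doc, weight) triples (assumed format)
    spam_score: dict doc -> spam probability; missing docs are treated as non-spam
    corpus:     iterable of all document ids
    Returns {pseudo_query: {doc: 1 (relevant) or 0 (non-relevant)}}.
    """
    rng = random.Random(seed)

    # Treat anchor text as a pseudo-query log: aggregate weights per (query, doc)
    # and count how often each anchor text occurs.
    agg = defaultdict(Counter)
    freq = Counter()
    for text, doc, weight in anchors:
        query = text.strip().lower()
        agg[query][doc] += weight
        freq[query] += 1

    # Assumption: sample queries proportional to frequency without replacement
    # (Efraimidis-Spirakis keys u ** (1 / w); the largest keys win).
    keys = {q: rng.random() ** (1.0 / freq[q]) for q in sorted(freq)}
    queries = sorted(keys, key=keys.get, reverse=True)[:num_queries]

    def is_spam(doc):
        return spam_score.get(doc, 0.0) >= spam_threshold

    collection = {}
    for q in queries:
        # Assumption: relevant = non-spam targets with the highest aggregated weight.
        ranked = [d for d, _ in agg[q].most_common() if not is_spam(d)]
        relevant = ranked[:num_rel]
        if not relevant:
            continue  # every target of this anchor text looked like spam
        # Assumption: non-relevant = random non-spam docs this anchor never links to.
        linked = set(agg[q])
        pool = sorted(d for d in corpus if d not in linked and not is_spam(d))
        nonrel = rng.sample(pool, min(num_nonrel, len(pool)))
        judgments = {d: 1 for d in relevant}
        judgments.update({d: 0 for d in nonrel})
        collection[q] = judgments
    return collection
```

For example, with anchors `("jaguar cars", "d1", 3.0)`, `("jaguar cars", "d1", 2.0)`, and `("jaguar cars", "d2", 1.0)`, document `d1` gets aggregated weight 5.0 and becomes the relevant document for the pseudo query `"jaguar cars"`. Neither `d1` nor `d2` can be drawn as a non-relevant document for that query. A real system would likely use richer features and smarter negative sampling. The random negatives here are only the simplest stand-in.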