In document retrieval there are currently many algorithms, each with different strengths and weaknesses. There is some difficulty, however, in evaluating the impact of the test query set on retrieval results. The traditional evaluation process, the Cranfield evaluation paradigm, which uses a corpus and a set of user queries, focuses on making the queries as realistic as possible. Unfortunately, such query sets lack the fine-grained control necessary to test algorithm properties. We present an approach called Controlled Query Generation (CQG) that creates query sets from documents in the corpus in a way that regulates the information-theoretic quality of each query. This allows us to generate reproducible and well-defined sets of queries of varying length and term specificity. Imposing this level of control over the query sets used for testing retrieval algorithms enables the rigorous simulation of different query environments to identify specific algorithm properties before introducing ...
Chris Jordan, Carolyn R. Watters, Qigang Gao
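To make the idea concrete, the following is a minimal sketch of the kind of controlled generation the abstract describes, not the authors' actual CQG procedure: it draws query terms from a source document while constraining query length and term specificity. IDF is used here only as a stand-in for whatever information-theoretic measure CQG actually regulates, and all names (idf_table, generate_query, the toy corpus, the idf_band parameter) are hypothetical.

```python
import math
import random
from collections import Counter


def idf_table(corpus):
    """Inverse document frequency for every term in a tokenized corpus.

    Used here as a rough proxy for term specificity / information content.
    """
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(doc))
    return {term: math.log(n_docs / df[term]) for term in df}


def generate_query(doc, idf, length, idf_band, seed=0):
    """Sample `length` distinct terms from `doc` whose IDF lies in `idf_band`.

    A fixed seed keeps the generated query set reproducible.
    Returns None if the document cannot supply enough terms at this specificity.
    """
    low, high = idf_band
    candidates = sorted({t for t in doc if low <= idf.get(t, 0.0) <= high})
    if len(candidates) < length:
        return None
    return random.Random(seed).sample(candidates, length)


# Hypothetical example: a short, more specific query vs. a longer, less constrained one.
corpus = [
    "retrieval evaluation uses a corpus and queries".split(),
    "controlled query generation regulates term specificity".split(),
    "algorithms differ in strengths and weaknesses".split(),
]
idf = idf_table(corpus)
print(generate_query(corpus[1], idf, length=2, idf_band=(0.5, 2.0)))
print(generate_query(corpus[1], idf, length=4, idf_band=(0.0, 2.0)))
```

Because the source document, the query length, the specificity band, and the random seed are all fixed inputs, regenerating the query set yields identical queries, which is the kind of reproducibility and fine-grained control the abstract argues Cranfield-style query sets lack.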