For the legal track we used the Wumpus search engine and investigated several methods that have proven successful in other domains, including cover density ranking and Okapi BM25 ranking. In addition to the traditional bag-of-words model, we used Boolean terms and character 4-grams. Pseudo-relevance feedback was effected using logistic regression on character 4-grams. Some runs specifically excluded documents returned by the Boolean query so as to increase the number of such documents in the pool. While our runs were all marked as manual, this was only because the process was not fully automated and several tuning parameters were set after we viewed the data; no data-specific tuning was performed in configuring the system for our runs. Our best performing runs used a combination of all of the above-mentioned techniques.
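As a rough illustration of two of the ingredients named above, the following minimal sketch extracts character 4-grams and scores documents with an Okapi BM25-style formula. It is not the Wumpus implementation: the function names `char_ngrams` and `bm25_scores`, the parameter values `k1=1.2` and `b=0.75`, the lowercasing and whitespace normalization, and the non-negative IDF variant are all assumptions made for this example.

```python
import math
from collections import Counter


def char_ngrams(text, n=4):
    """Return overlapping character n-grams of lowercased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return [normalized[i:i + n] for i in range(len(normalized) - n + 1)]


def bm25_scores(query_terms, docs_terms, k1=1.2, b=0.75):
    """Score each document (a list of terms) against the query terms.

    k1 and b are commonly used defaults, assumed here for illustration.
    """
    if not docs_terms:
        return []
    n_docs = len(docs_terms)
    avg_len = sum(len(d) for d in docs_terms) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs_terms:
        df.update(set(d))

    scores = []
    for d in docs_terms:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            # Non-negative IDF variant (an assumption; other variants exist).
            idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[t] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy collection, not data from the track.
docs = [
    "tobacco advertising memo",
    "quarterly sales report",
    "memo on advertising to minors",
]
doc_grams = [char_ngrams(d) for d in docs]
scores = bm25_scores(char_ngrams("advertising memo"), doc_grams)
```

Treating 4-grams as terms lets the same scoring function serve both the bag-of-words and the character n-gram representations; only the tokenizer changes.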
Stefan Büttcher, Charles L. A. Clarke, Gordon