Assessors frequently disagree on the topical relevance of documents. How much of this disagreement is due to ambiguity in the assessment instructions? We have two assessors assess TREC Legal Track documents for relevance, some assessed against a general topic description, others against detailed assessment guidelines. We find that detailed guidelines lead to no significant increase in agreement, either amongst assessors or between assessors and the official qrels.

Categories and Subject Descriptors
H.3.4 [Information Storage and Retrieval]: Systems and Software—performance evaluation

Keywords
Retrieval experiment, evaluation, e-discovery

General Terms
Measurement, Performance, Experimentation
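
Assessor agreement of the kind reported above is commonly quantified with a chance-corrected statistic such as Cohen's kappa. The abstract does not name the measure used, so the following is only an illustrative sketch, assuming binary relevance labels; the assessor data shown is hypothetical.

    from collections import Counter

    def cohens_kappa(labels_a, labels_b):
        """Chance-corrected agreement between two assessors' label sequences."""
        assert len(labels_a) == len(labels_b) and labels_a
        n = len(labels_a)
        # Observed agreement: fraction of documents labelled identically.
        p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
        # Expected agreement if the two assessors labelled independently,
        # each according to their own marginal label frequencies.
        freq_a = Counter(labels_a)
        freq_b = Counter(labels_b)
        p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
        return (p_o - p_e) / (1 - p_e)

    # Hypothetical binary relevance judgments (1 = relevant) from two assessors.
    assessor_1 = [1, 0, 1, 1, 0, 0, 1, 0]
    assessor_2 = [1, 0, 0, 1, 0, 1, 1, 0]
    print(f"kappa = {cohens_kappa(assessor_1, assessor_2):.3f}")  # kappa = 0.500

Chance correction matters in this setting because relevant documents are typically rare: two assessors who label almost everything non-relevant show high raw agreement even when they agree on none of the relevant documents.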