Some measures, such as average precision over all relevant documents and recall level precision, are considered good system-oriented measures because they account for both precision and recall, two important aspects of effectiveness evaluation for information retrieval systems. However, such system-oriented measures suffer from some shortcomings when partial relevance judgment is used. In this paper, we discuss how to rank retrieval systems based on partial relevance judgment, which is common in major retrieval evaluation events such as TREC conferences and NTCIR workshops. Four system-oriented measures are discussed: average precision over all relevant documents, recall level precision, normalized discounted cumulative gain, and normalized average precision over all documents. Our investigation shows that with partial relevance judgment, the evaluated results can be far from accurate and not comparable across queries. In such a situation, averaging values over a se...
Shengli Wu, Sally I. McClean