Presenting the performance of a Content-Based Image Retrieval (CBIR) system as Precision-Recall or Precision-Scope graphs offers an incomplete overview of the system under study: the influence of the irrelevant items is obscured. In this paper, we propose a comprehensive and well-normalized description of ranking performance, compared to the performance of an Ideal Retrieval System defined by ground truth for a large number of predefined queries. We advocate normalization with respect to relevant class size and restriction to specific normalized scope values. We also propose new performance graphs for total recall studies in a range of embeddings.
Nicu Sebe, Dionysius P. Huijsmans, Qi Tian, Theo G