Steven Bogaerts, David B. Leake

Abstract. Evaluation criteria for conversational CBR (CCBR) systems are important both to guide the development and tuning of new methods and to enable practitioners to make informed decisions about which methods to use. The traditional criteria of precision and efficiency provide useful information about CCBR performance, but they are limited by their focus on the single point at which a case is selected at the end of the system dialogue, and by their dependence on a model of the user's case selection criteria. This paper begins by revisiting issues in the evaluation of CCBR systems, arguing for the value of assessing the quality of the intermediate dialogue before case selection. It then proposes an evaluation approach based on rank quality to provide a fuller picture of system performance, and presents an empirical study illustrating how rank quality can illuminate the characteristics of similarity assessment strategies for partially-specified cases.
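To make the rank-quality idea concrete, the following is a minimal Python sketch, not the paper's implementation: the attribute-value case representation, the matching-based similarity function, and the pessimistic tie handling are all illustrative assumptions. At each dialogue step, the target case is ranked against the case base by similarity to the partially-specified query, and its normalized rank is reported, so that dialogue quality can be assessed before final case selection rather than only at the end.

```python
# Hypothetical rank-quality sketch for a CCBR dialogue; names such as
# similarity() and rank_quality() are illustrative, not from the paper.

def similarity(query, case):
    """Fraction of the query's specified attributes matched by the case."""
    if not query:
        return 0.0
    matches = sum(1 for attr, value in query.items() if case.get(attr) == value)
    return matches / len(query)

def rank_quality(query, case_base, target):
    """Normalized rank of the target case: 1.0 if uniquely first, 0.0 if last.

    Ties are handled pessimistically: the target is placed after all
    cases with an equal similarity score.
    """
    t_score = similarity(query, target)
    greater = sum(1 for c in case_base if similarity(query, c) > t_score)
    tied = sum(1 for c in case_base if similarity(query, c) == t_score)
    rank = greater + tied  # worst rank among ties (tied count includes target)
    n = len(case_base)
    return (n - rank) / (n - 1) if n > 1 else 1.0

# Track rank quality across dialogue steps as the query is incrementally
# specified by the user's answers (toy troubleshooting case base).
case_base = [
    {"os": "linux", "symptom": "no boot", "hw": "disk"},
    {"os": "windows", "symptom": "no boot", "hw": "ram"},
    {"os": "linux", "symptom": "crash", "hw": "disk"},
]
target = case_base[0]
dialogue_steps = [
    {"symptom": "no boot"},
    {"symptom": "no boot", "os": "linux"},
]
for step, query in enumerate(dialogue_steps, 1):
    print(f"step {step}: rank quality = {rank_quality(query, case_base, target):.2f}")
```

Run as a script, this prints a rank quality of 0.50 after the first answer (the target is tied with another case) rising to 1.00 after the second, illustrating how an intermediate dialogue can be evaluated at every step instead of only at the single point of case selection.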