In this paper we propose a paradigm called the Interactive Visual Dialog (IVD) to help a system recognize objects presented to it by a human. The presentation centers on a supermarket checkout scenario in which an operator presents an item to be tallied to a stationary television camera. An active vision approach provides feedback to the operator in the form of an image (or images) depicting what the system thinks the operator is most likely holding, shown from a viewpoint that suggests how the object should next be presented to improve the certainty of interpretation. Interaction proceeds iteratively until the system converges on the correct interpretation. We show how the IVD can be implemented using an entropy-based gaze planning strategy and a sequential Bayes recognition system that takes optical flow as input. Experimental results show that the system does, in practice, improve recognition accuracy, leading to convergence to a correct ...
Tal Arbel, Frank P. Ferrie
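
The two components named in the abstract, a sequential Bayes recognizer and an entropy-based choice of the next viewpoint, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that observations have already been reduced to a small set of discrete symbols (optical flow processing is not modeled), and that a table of P(observation | object) is available for each candidate view. The names `bayes_update`, `expected_entropy`, `next_view`, and the `MODELS` table are hypothetical and serve only as illustration.

```python
import math


def bayes_update(prior, likelihood):
    """One sequential Bayes step: posterior is proportional to likelihood * prior."""
    unnorm = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnorm)
    if total == 0.0:
        # The observation is impossible under every hypothesis; keep the prior.
        return list(prior)
    return [u / total for u in unnorm]


def entropy(dist):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in dist if p > 0.0)


def expected_entropy(belief, view_model):
    """Expected posterior entropy after observing from one view.

    view_model[o][k] is the assumed P(observation k | object o) at that view.
    """
    n_obs = len(view_model[0])
    exp_h = 0.0
    for k in range(n_obs):
        lik = [row[k] for row in view_model]
        p_obs = sum(b * l for b, l in zip(belief, lik))
        if p_obs > 0.0:
            exp_h += p_obs * entropy(bayes_update(belief, lik))
    return exp_h


def next_view(belief, models):
    """Pick the candidate view with the lowest expected posterior entropy."""
    return min(models, key=lambda v: expected_entropy(belief, models[v]))


# Hypothetical two-object, two-view example (values made up for illustration).
# From the "front" the objects look identical; from the "side" they differ.
MODELS = {
    "front": [[0.5, 0.5], [0.5, 0.5]],
    "side": [[0.9, 0.1], [0.1, 0.9]],
}

belief = [0.5, 0.5]                      # uniform prior over the two objects
view = next_view(belief, MODELS)         # view to suggest to the operator
belief = bayes_update(belief, [row[0] for row in MODELS[view]])  # saw symbol 0
```

In this toy setting the planner suggests the side view, because an observation from the front cannot change the belief. In an interactive loop of the kind the abstract describes, the suggested view would be displayed to the operator, and the update would be repeated with each new observation until the belief concentrates on one object.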