This paper describes an interactive vision system for a robot that finds an object specified by a user and brings it to the user. The system first registers object models automatically. When the user specifies an object, the system attempts to recognize it automatically and shows the recognition result to the user. The user may then provide additional information via speech, such as pointing out mistakes, choosing the correct object from multiple candidates, or giving the relative position of the object. Based on this advice, the system tries again to recognize the object. Experiments on real-world refrigerator scenes are described.