— We describe a method for an autonomous robot to efficiently locate one or more distinct objects in a realistic environment using monocular vision. We demonstrate how to efficiently subdivide acquired images into interest regions for the robot to zoom in on, using receptive field cooccurrence histograms. Objects are recognized through SIFT feature matching and the positions of the objects are estimated. Assuming a 2D map of the robot’s surroundings and a set of navigation nodes between which it is free to move, we show how to compute an efficient sensing plan that allows the robot’s camera to cover the environment, while obeying restrictions on the different objects’ maximum and minimum viewing distances. The approach has been implemented on a real robotic system and results are presented showing its practicability and the quality of the position estimates obtained.