Given a collection of images of a static scene taken by many different people, we identify and segment interesting objects. To solve this problem, we use the distribution of images in the collection along with a new field-of-view cue, which leverages the observation that people tend to take photos that frame an object of interest within the field of view. Hence, image features that appear together in many images are likely to be part of the same object. We evaluate the effectiveness of this cue by comparing the segmentations computed by our method against hand-labeled ones for several different models. We also show how the results of our segmentations can be used to highlight important objects in the scene and label them using noisy user-specified textual tag data. These methods are demonstrated on photos of several popular tourist sites downloaded from the Internet.
Ian Simon, Steven M. Seitz