Given a collection of images of offices, what would we say we see in them? The objects of interest are likely to be monitors, keyboards, phones, etc. Identifying the foreground of a scene in this way is important: it avoids distractions caused by background clutter and facilitates better understanding of the scene. It is crucial for such identification to be unsupervised, both to avoid extensive human labeling and to avoid biases induced by human intervention. Most interesting scenes contain multiple objects of interest, so it is useful to separate the foreground into the individual objects it contains. We propose dISCOVER, an unsupervised approach to identifying the multiple objects of interest in a scene from a collection of images. To achieve this, it exploits the consistency of foreground objects, in terms of both occurrence and geometry, across the multiple images of the scene.