In this paper, we consider the problem of recovering the
spatial layout of indoor scenes from monocular images. The
presence of clutter is a major problem for existing single-view
3D reconstruction algorithms, most of which rely on
finding the ground-wall boundary. In most rooms, this
boundary is partially or entirely occluded. We gain robustness
to clutter by modeling the global room space with a
parametric 3D “box” and by iteratively localizing clutter
and refitting the box. To fit the box, we introduce a structured
learning algorithm that chooses the set of parameters
to minimize error, based on global perspective cues. On
a dataset of 308 images, we demonstrate the ability of our
algorithm to recover spatial layout in cluttered rooms and
show several examples of estimated free space.