We present a human-centric paradigm for scene understanding. Our approach goes beyond estimating 3D scene geometry and predicts the “workspace” of a human, which is represented by a data-driven vocabulary of human interactions. Our method builds upon recent work in indoor scene understanding and the availability of motion capture data to create a joint space of human poses and scene geometry by modeling the physical interactions between the two. This joint space can then be used to predict potential human poses and joint locations from a single image. In a way, this work revisits the principle of Gibsonian affordances, reinterpreting it for the modern, data-driven era.