In this paper, we present a novel method, to our knowledge the first, that directly and automatically produces a concise, idealized 3D representation from unstructured point data of complex, cluttered real-world scenes with a high level of noise and a significant proportion of outliers, such as those obtained from passive stereo. Our algorithm digests millions of input points into an optimized, lightweight, watertight polygonal mesh that is free of self-intersections, preserves the structural components of the scene at a user-defined scale, and completes missing scene parts in a plausible manner. To achieve this, our algorithm incorporates priors on urban and architectural scenes, notably the prevalence of vertical structures and orthogonal intersections. A major contribution of our work is an adaptive decomposition of 3D space induced by planar primitives, namely a polyhedral cell complex. We experimentally validate our approach on several challenging noisy ...