We address image parsing in the setting of architectural scenes. Our goal is to parse an image into regions of various types such as sky, foliage, buildings, and street. Furthermore, we parse the building regions at a finer level of detail, identifying the positions of windows, doors, and rooflines, the colors of walls, and the spatial extent of particular buildings. Recognizing these individual elements is often impossible without the context provided by the initial parsing of the image; for instance, a roofline is only defined in relation to the building below and the sky above. Our approach is driven by recognition of generic classes of visual appearance, e.g., for foliage. The generic recognition results bootstrap an image-specific model that provides refined estimates to use for matting, segmentation, and more detailed parsing. We demonstrate results on a wide variety of images.
Alexander C. Berg, Floraine Grabler, Jitendra Malik
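
The bootstrapping idea in the abstract, where generic recognition results seed an image-specific model, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a hypothetical generic classifier has already produced per-pixel class probabilities (`generic_prob`), and it swaps in a simple diagonal-Gaussian color model per class as the image-specific model. The function name `bootstrap_color_model` and the `confident` threshold are illustrative assumptions.

```python
import numpy as np

def bootstrap_color_model(image, generic_prob, confident=0.8, eps=1e-6):
    """Refine generic per-pixel class scores with an image-specific color model.

    image:        (H, W, 3) float array of pixel colors.
    generic_prob: (H, W, K) per-class probabilities from a generic
                  classifier (hypothetical input, assumed given).
    Returns an (H, W) integer label map.
    """
    h, w, k = generic_prob.shape
    pixels = image.reshape(-1, 3)
    prob = generic_prob.reshape(-1, k)

    # Classes with too few confident pixels keep a score of -inf,
    # i.e. they are dropped from the image-specific model (an assumption).
    log_post = np.full((pixels.shape[0], k), -np.inf)

    for c in range(k):
        # Pixels the generic classifier is confident about act as seeds.
        seeds = pixels[prob[:, c] >= confident]
        if len(seeds) < 2:
            continue
        # Fit a diagonal Gaussian to the seed colors of this class.
        mean = seeds.mean(axis=0)
        var = seeds.var(axis=0) + eps
        # Log-likelihood of every pixel under this class's color model.
        log_lik = -0.5 * (((pixels - mean) ** 2) / var
                          + np.log(2 * np.pi * var)).sum(axis=1)
        # Combine with the generic score, used here as a prior.
        log_post[:, c] = log_lik + np.log(prob[:, c] + eps)

    return log_post.argmax(axis=1).reshape(h, w)
```

In this sketch, pixels where the generic classifier is ambiguous get relabeled according to the colors of confidently labeled pixels in the same image, which is one simple way to obtain refined estimates for segmentation.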