We describe an interactive system for modeling regions of an urban environment, such as a group of tall buildings. Traditional image-based modeling methods often cannot handle such large areas because of error accumulation and the limited camera field of view. Our approach widens the field of view by constructing a 360-degree panorama from ground-level images and uses a high-resolution orthorectified aerial image to provide the building footprints. Users draw the building outlines in the aerial image and select a point as the approximate ground camera location. The method then automatically extracts roof corners in the ground images and registers the panorama to the aerial image according to geometric constraints. The height of each building is calculated from an estimated camera pose. The resulting textured model is composed of planar surfaces.
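As an illustration of the height computation, the following minimal sketch shows one way a building height could be recovered once the camera pose is known. It is not the system's actual implementation. It assumes an equirectangular panorama whose horizon lies on the middle row, footprint and camera positions expressed in metres in the aerial image's ground plane, and a known camera height above the ground. The names `elevation_from_row` and `building_height` are hypothetical.

```python
import math


def elevation_from_row(row, pano_height_px):
    """Elevation angle (radians) of a pixel row in an equirectangular panorama.

    Assumption: row 0 is straight up (+pi/2) and the middle row is the horizon.
    """
    return (0.5 - row / pano_height_px) * math.pi


def building_height(camera_xy, camera_height_m, corner_xy, elevation_rad):
    """Height of a roof corner above the ground, in metres.

    camera_xy, corner_xy: ground-plane positions taken from the aerial image.
    elevation_rad: elevation angle of the roof corner as seen in the panorama.
    """
    # Horizontal distance from the camera to the footprint corner.
    distance = math.hypot(corner_xy[0] - camera_xy[0],
                          corner_xy[1] - camera_xy[1])
    # Rise above the camera's optical centre, plus the camera's own height.
    return camera_height_m + distance * math.tan(elevation_rad)


if __name__ == "__main__":
    # Hypothetical values: corner 50 m away, seen 40 degrees above the horizon.
    angle = math.radians(40.0)
    print(building_height((0.0, 0.0), 1.6, (30.0, 40.0), angle))
```

The footprint supplies the horizontal distance and the panorama supplies only an angle, so under these assumptions a single registered roof corner is enough to fix the height of the planar wall it belongs to.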