Recent approaches to reconstructing city-sized areas from large image collections usually process all images at once and produce only disconnected descriptions of image subsets, which typically correspond to major landmarks. In contrast, we propose a framework that exploits the available metadata to merge these potentially disconnected descriptions into a single, consistent one. Furthermore, this description can be incrementally updated and enriched as new images become available. We demonstrate our approach by building large-scale reconstructions from images of Lausanne and Prague.