This work reports on the advances and current status of a terrestrial city modeling approach that uses images contributed by end-users as input. The Wiki principle, well known from textual knowledge databases, is thereby transferred to the goal of incrementally building a virtual representation of the occupied habitat. Achieving this objective requires many state-of-the-art computer vision methods to be applied and adapted to the task. We describe the 3D vision methods used and show initial results obtained from the current image database, which was acquired by in-house participants.