Mobile vision services have recently been proposed to support urban nomadic users. A major issue for the performance of such a service, which involves indexing into a huge number of reference images, is ambiguity in the visual information. We propose to exploit geo-information in association with visual features to restrict the search to a local context. In a mobile image retrieval task of urban object recognition, we determine object hypotheses from (i) mobile-image-based appearance and (ii) GPS-based positioning, and we investigate the performance of Bayesian information fusion with respect to a geo-referenced image database (TSG-20). The results from geo-referenced image capture in an urban scenario show a significant increase in recognition accuracy (> 10%) when geo-contextual information is used, compared with when it is omitted.
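As a minimal sketch of the fusion step, assume that appearance and GPS position are conditionally independent given the object. The appearance-based posterior P(o | image) can then be reweighted by a position likelihood P(gps | o) and renormalized, since P(o | image, gps) is proportional to P(o | image) · P(gps | o). The isotropic Gaussian distance model, the `sigma_m` error scale, and the helper names `gps_likelihood` and `fuse` are hypothetical illustrations, not the exact model evaluated on TSG-20.

```python
import math


def gps_likelihood(user_xy, object_xy, sigma_m=30.0):
    # Hypothetical position model: isotropic Gaussian on the planar distance
    # (metres) between the GPS fix and the object's geo-reference.
    # sigma_m is an assumed GPS error scale, not a value from the paper.
    dx = user_xy[0] - object_xy[0]
    dy = user_xy[1] - object_xy[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma_m ** 2))


def fuse(appearance_posterior, object_positions, user_xy, sigma_m=30.0):
    """Combine P(o | image) with P(gps | o) under conditional independence.

    P(o | image, gps) is proportional to P(o | image) * P(gps | o),
    so the product is computed per object and then renormalized.
    """
    scores = {
        obj: p * gps_likelihood(user_xy, object_positions[obj], sigma_m)
        for obj, p in appearance_posterior.items()
    }
    total = sum(scores.values())
    if total == 0.0:
        # Every candidate is implausibly far away (likelihoods underflowed):
        # fall back to the appearance-only posterior.
        return dict(appearance_posterior)
    return {obj: s / total for obj, s in scores.items()}


if __name__ == "__main__":
    # Two visually ambiguous facades, 200 m apart; the user stands near "A".
    posterior = fuse(
        {"A": 0.5, "B": 0.5},
        {"A": (0.0, 0.0), "B": (200.0, 0.0)},
        user_xy=(10.0, 0.0),
    )
    print(posterior)
```

In the usage example, appearance alone cannot separate the two facades, while the geo-context shifts almost all probability mass to the nearby candidate, which mirrors the disambiguation effect the abstract describes.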