We describe an approach to multi-modal characterization of social media that combines text features (e.g., tags, a prominent example of short, unstructured text labels) with spatial knowledge (e.g., geotags and coordinates of images and videos). Our model-based framework, GeoFolk, combines these two aspects to construct better algorithms for content management, retrieval, and sharing. The approach is based on multi-modal Bayesian models, which allow us to integrate the spatial semantics of social media in a well-formed, probabilistic manner. We systematically evaluate the solution on a subset of Flickr data in the characteristic scenarios of tag recommendation, content classification, and clustering. Experimental results show that our method outperforms baseline techniques that rely on only one of the two aspects. The approach described in this contribution can also be used in other domains, such as Geoweb retrieval.

Categories and Subject Descriptors
H.3.3 [Information Search and Ret...