We propose a data-driven geolocation method for microblog text. The key idea underlying our approach is sparse coding, an unsupervised learning algorithm. Unlike conventional positioning algorithms, our method geolocates a user by identifying features extracted from her social media text. We also present an enhancement that is robust to the erasure of words in the text, and we report experimental results on uniformly or randomly subsampled microblog text. Our solution features a novel two-step procedure consisting of upconversion and iterative refinement by joint sparse coding. As a result, we can reduce the amount of input data required for geolocation while preserving good prediction accuracy. In light of information preservation and privacy, we remark on potential applications of these results.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval—Clustering; I.2.6 [Artificial Intelligence]: Learning—Unsupervised feature learning

Keywords
Geol...
Miriam Cha, Youngjune L. Gwon, H. T. Kung