In this paper we test the hypothesis: "Given a piece of text describing an object or concept, our combined disambiguation method can determine whether it refers to a place and ground it to a Getty Thesaurus of Geographic Names unique identifier with significantly higher accuracy than naïve methods." We present a carefully engineered rule-based place name disambiguation system and use Wikipedia as a worked example, with hand-generated ground truth and benchmark tests. This paper also outlines our plans to apply the co-occurrence models generated from Wikipedia to the problem of disambiguating place names in text using supervised learning techniques.

Categories and Subject Descriptors
H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing

Keywords
Geographic Information Retrieval, Disambiguation, Wikipedia
Simon E. Overell, Stefan M. Rüger