Most geographic information retrieval systems depend on the detection and disambiguation of place names in documents, on the assumption that documents with a specific geographic scope contain explicit place names that are strongly related to that scope. However, some non-geographic names, such as companies, monuments, or sports events, may also provide indirect but relevant evidence that can contribute significantly to the assignment of geographic scopes to documents. In this paper, we analyze the amount of implicit and explicit geographic evidence in newspaper documents, and measure its impact on geographic information retrieval by evaluating the performance of a retrieval system on the GeoCLEF evaluation data.

Categories and Subject Descriptors
H.3 [Information Storage and Retrieval]: H.3.3 Information Search and Retrieval; H.3.4 Systems and Software

General Terms
Algorithms, Design

Keywords
Geographic Information Retrieval, Named Entity Recognition, Wikipedia Mining
Nuno Cardoso, Mário J. Silva, Diana Santos