The content of the World Wide Web is pervaded by information of a geographical or spatial nature, particularly location information such as addresses, postal codes, and telephone numbers. We present a system for extracting spatial knowledge from collections of web pages gathered by web-crawling programs. For each page determined to contain location information, we apply geocoding techniques to compute geographic coordinates, such as latitude-longitude pairs. Next, we augment the location information with keyword descriptors extracted from the web page contents. We then apply spatial data mining techniques to the augmented location information to derive spatial knowledge. These techniques use so-called shared neighbor information to produce clusters of web pages organized around a common set of concepts.

KEYWORDS
Web Data Mining, Geographic Information System (GIS), Information Extraction, Geocoding, Geoparsing, Crawl, Clustering, Labeling, Keyword Extraction, Dimension Reduction...
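To make the notion of shared neighbor information concrete, the following is a minimal sketch of one simple way it could be used, not the system described here. It assumes each page has already been geocoded to a latitude-longitude pair and augmented with a keyword set. Two pages are linked when each is among the other's k nearest neighbors and they share at least `min_shared` neighbors (a simplified rule in the style of Jarvis-Patrick clustering), and each cluster is then labeled by the keywords common to all of its pages. The sample pages, the function names, and the parameters `k` and `min_shared` are illustrative assumptions.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def knn_sets(coords, k):
    """For each point, the set of indices of its k nearest other points."""
    result = []
    for i, p in enumerate(coords):
        others = sorted((j for j in range(len(coords)) if j != i),
                        key=lambda j: haversine_km(p, coords[j]))
        result.append(set(others[:k]))
    return result

def shared_neighbor_clusters(coords, k=2, min_shared=1):
    """Link mutual k-nearest neighbors that share >= min_shared neighbors,
    then return the connected components as sorted lists of indices."""
    nn = knn_sets(coords, k)
    parent = list(range(len(coords)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(len(coords)):
        for j in nn[i]:
            if i in nn[j] and len(nn[i] & nn[j]) >= min_shared:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(coords)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

def common_keywords(cluster, keywords):
    """Keywords that appear on every page of the cluster, in sorted order."""
    return sorted(set.intersection(*(keywords[i] for i in cluster)))

# Hypothetical geocoded pages: ((lat, lon), keyword descriptors).
PAGES = [
    ((35.681, 139.767), {"ramen", "station"}),
    ((35.690, 139.700), {"ramen", "shopping"}),
    ((35.660, 139.730), {"ramen", "museum"}),
    ((34.694, 135.502), {"takoyaki", "castle"}),
    ((34.687, 135.526), {"takoyaki", "river"}),
    ((34.665, 135.500), {"takoyaki", "aquarium"}),
]
coords = [c for c, _ in PAGES]
keywords = [kw for _, kw in PAGES]

clusters = shared_neighbor_clusters(coords, k=2, min_shared=1)
labels = [common_keywords(c, keywords) for c in clusters]
```

The neighbor search here is a brute-force comparison of all pairs, which is adequate only for small collections; a crawl-scale collection would need a spatial index or an approximate neighbor search.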
Yasuhiko Morimoto, Masaki Aono, Michael E. Houle,