This paper proposes a method of crawling Web servers connected to the Internet without imposing a high processing load on them. We use the crawler for a field survey of the digital divide, including the ability to connect to the network. Rather than employing a normal Web "page" crawling algorithm, which usually collects all pages found on a target server, we have developed a "server" crawling algorithm, which collects only a minimal number of pages from each server, achieving low-load, high-speed crawling of servers.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Retrieval models, Search process; K.4.1 [Public Policy Issues]: Transborder data flow

General Terms
Design, Experimentation

Keywords
Global Digital Divide, Server crawler
Katsuko T. Nakahira, Tetsuya Hoshino, Yoshiki Mikami
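The following is a minimal sketch, not the authors' implementation, of the server-oriented crawling idea summarized in the abstract: rather than collecting every page on a target server, the crawler requests only a single page per host to confirm that a Web server is reachable, keeping the load on each server low. The host list, timeout, and user-agent string are illustrative assumptions.

import urllib.request

def probe_server(host: str, timeout: float = 10.0) -> bool:
    """Fetch only the root page of a host; return True if the server responds."""
    request = urllib.request.Request(
        f"http://{host}/",
        # Hypothetical user-agent; a real survey crawler would identify itself.
        headers={"User-Agent": "server-survey-crawler (illustrative)"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # One request per server is enough to record that it is online.
            return 200 <= response.status < 400
    except OSError:
        # Covers connection failures, timeouts, and HTTP errors.
        return False

if __name__ == "__main__":
    # Hypothetical host list; a field survey would enumerate hosts per country or TLD.
    for host in ["example.org", "example.net"]:
        print(host, "up" if probe_server(host) else "down")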