As part of the Language Observatory Project [4], we have been crawling all the web space since 2004. We have collected terabytes of data mostly from Asian and African ccTLDs. In t...
Rizza Camus Caminero, Pavol Zavarsky, Yoshiki Mika...
Automatic classification of web pages is an effective way to deal with the difficulty of retrieving information from the Internet. Although there are many automatic classification...
This paper presents a method for automatically converting raw GPS traces from everyday vehicles into a routable road network. The method begins by smoothing raw GPS traces using a...
The longstanding problem of automatic table interpretation still illudes us. Its solution would not only be an aid to table processing applications such as large volume table conve...
We illustrate that Web searches can often be utilized to generate background text for use with text classification. This is the case because there are frequently many pages on the...