As part of the Language Observatory Project [4], we have been crawling all the web space since 2004. We have collected terabytes of data mostly from Asian and African ccTLDs. In t...
Rizza Camus Caminero, Pavol Zavarsky, Yoshiki Mika...
Watson is a gateway to the Semantic Web: it collects, analyzes and gives access to ontologies and semantic data available online. Its objective is to support the development of ne...
We describe a joint probabilistic model for modeling the contents and inter-connectivity of document collections such as sets of web pages or research paper archives. The model is...
We present in this paper a method to discover the set of webpages contained in a logical website, based on the link structure of the Web graph. Such a method is useful in the conte...
: Dynamic Web data sources – sometimes known collectively as the Deep Web – increase the utility of the Web by providing intuitive access to data repositories anywhere that Web...
Daniel Rocco, James Caverlee, Ling Liu, Terence Cr...