This paper shares our experience in designing a web crawler that can download billions of pages using a single-server implementation and models its performance. We show that with ...
In this paper, we cast the image-ranking problem into the task of identifying "authority" nodes on an inferred visual similarity graph and propose an algorithm to analyz...
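The snippet above is cut off before it describes the proposed algorithm. As a hedged illustration of the general idea of finding "authority" nodes on a similarity graph, the sketch below runs a damped, PageRank-style power iteration over a symmetric similarity matrix. This is an assumption chosen for illustration and not the paper's confirmed method. The function name `authority_scores`, the `damping` value, and the toy matrix are all hypothetical.

```python
def authority_scores(sim, damping=0.85, iters=100, tol=1e-9):
    """Hypothetical sketch: damped power iteration over a visual similarity graph.

    sim[i][j] is an assumed non-negative similarity between images i and j.
    Images that many other images resemble end up with higher scores.
    """
    n = len(sim)
    # Column sums let each image spread its current score over its neighbours.
    col_sums = [sum(sim[i][j] for i in range(n)) for j in range(n)]
    scores = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            # Score flowing into image i from every similar image j.
            flow = sum(
                sim[i][j] / col_sums[j] * scores[j]
                for j in range(n)
                if col_sums[j] > 0
            )
            # Damping mixes in a uniform "random jump" term.
            new.append(damping * flow + (1 - damping) / n)
        converged = sum(abs(a - b) for a, b in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores


# Toy graph (hypothetical): image 0 resembles images 1 and 2,
# which do not resemble each other, so image 0 acts as the "authority".
sim = [
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
]
scores = authority_scores(sim)
print([round(s, 4) for s in scores])
```

The damping term makes the iteration converge even on graphs like this one. Without damping, the power iteration on a bipartite graph like this toy example would oscillate instead of settling.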
We present the results of a community detection analysis of the Wikipedia graph. Distinct communities in Wikipedia contain semantically closely related articles. The central topic...
Abstract. The requirements for effective search and management of the WWW are stronger than ever. Currently, Web documents are classified based on their content, not taking into acco...
Maria Halkidi, Benjamin Nguyen, Iraklis Varlamis, ...
One of the main limitations when accessing the web is the lack of explicit structure, whose presence may help in understanding data semantics. Schema for web data can be constructe...