This paper expands on a 1997 study of the amount and distribution of near-duplicate pages on the World Wide Web. We downloaded a set of 150 million web pages on a weekly basis ove...
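The abstract does not say how near-duplicates were identified, but a small illustration may help fix the idea: in a common shingling formulation, two pages count as near-duplicates when the Jaccard resemblance of their word-shingle sets is high. The sketch below is a toy under assumptions, with the shingle size k and the informal notion of "high" chosen arbitrarily rather than taken from the study.

```python
# Toy illustration only: word-shingle resemblance between two documents.
# The shingle size k and what counts as "near-duplicate" are assumptions,
# not parameters of the study described in the abstract above.

def shingles(text: str, k: int = 4) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles of a document."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def resemblance(a: str, b: str, k: int = 4) -> float:
    """Jaccard similarity of the two documents' shingle sets."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

if __name__ == "__main__":
    d1 = "the quick brown fox jumps over the lazy dog near the river bank"
    d2 = "the quick brown fox jumps over the lazy dog near the river"
    print(f"resemblance = {resemblance(d1, d2):.2f}")  # high value -> near-duplicates
```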
We present an algorithm called the Best Trail Algorithm, which helps solve the hypertext navigation problem by automating the construction of memex-like trails through the corpus....
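The construction step can be pictured as a best-first search over acyclic paths in the link graph, expanding the most promising trail first. The sketch below is only a generic illustration under an assumed scoring function (summed node relevance with a small length penalty); it is not the authors' Best Trail Algorithm, whose scoring and expansion rules the truncated abstract does not give.

```python
# Generic best-first trail expansion; the scoring function is an assumption.
import heapq

def best_trail(graph, relevance, start, max_len=4):
    """Return the highest-scoring simple trail of at most max_len nodes from start."""
    def score(trail):
        return sum(relevance.get(n, 0.0) for n in trail) - 0.1 * len(trail)

    heap = [(-score((start,)), (start,))]   # max-heap via negated scores
    best = (start,)
    while heap:
        _, trail = heapq.heappop(heap)
        if score(trail) > score(best):
            best = trail
        if len(trail) >= max_len:
            continue
        for nxt in graph.get(trail[-1], []):
            if nxt not in trail:            # keep trails acyclic
                extended = trail + (nxt,)
                heapq.heappush(heap, (-score(extended), extended))
    return best

graph = {"home": ["docs", "blog"], "docs": ["api", "faq"], "blog": ["api"]}
relevance = {"home": 0.2, "docs": 0.9, "api": 0.8, "faq": 0.3, "blog": 0.1}
print(best_trail(graph, relevance, "home"))  # -> ('home', 'docs', 'api')
```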
Streaming media applications have become increasingly popular in recent years, for example live news transmitted over the web, music, shows, and films. Traditional client/server
Marisa A. Vasconcelos, Leonardo C. da Rocha, Julia...
Search engines provide search results based on a large repository of pages downloaded by a web crawler from several servers. To provide the best results, this repository must be kept ...
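Assuming the truncated sentence continues along the lines of keeping the repository up to date, the choice of which pages to refresh can be illustrated with a toy staleness score. Both the score (estimated change rate multiplied by time since the last crawl) and the example change-rate figures below are assumptions made purely for illustration.

```python
# Toy recrawl ordering, not the paper's method: rank pages by a staleness
# score = estimated daily change rate x days since the last crawl.
from dataclasses import dataclass

@dataclass
class PageRecord:
    url: str
    change_rate: float       # estimated changes per day (assumed known)
    days_since_crawl: float

def recrawl_order(pages: list[PageRecord]) -> list[str]:
    """Return URLs sorted so the most likely-stale pages are refreshed first."""
    return [p.url for p in sorted(
        pages, key=lambda p: p.change_rate * p.days_since_crawl, reverse=True)]

pages = [
    PageRecord("http://news.example/front", change_rate=5.0, days_since_crawl=1.0),
    PageRecord("http://static.example/about", change_rate=0.01, days_since_crawl=30.0),
]
print(recrawl_order(pages))  # the frequently changing front page comes first
```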
We perform a clustering of the Chilean Web Graph using a local fitness measure, optimized by simulated annealing, and compare the obtained cluster distribution to that of two mod...
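As a rough sketch of what optimizing a local fitness measure by simulated annealing can look like, the code below grows a cluster around a seed node under one common fitness form, f(C) = k_in / (k_in + k_out)^alpha, where k_in is the number of edges inside the cluster and k_out the number leaving it. The fitness form, alpha, and the cooling schedule are assumptions, not necessarily those used on the Chilean Web Graph.

```python
# Illustrative simulated annealing of a local cluster fitness; all parameters
# (fitness form, alpha, temperature schedule) are assumptions.
import math
import random

def fitness(graph, cluster, alpha=1.0):
    """f(C) = k_in / (k_in + k_out)^alpha for an undirected adjacency dict."""
    twice_k_in = k_out = 0
    for node in cluster:
        for nb in graph[node]:
            if nb in cluster:
                twice_k_in += 1    # each internal edge is seen from both endpoints
            else:
                k_out += 1
    k_in = twice_k_in / 2
    total = k_in + k_out
    return k_in / (total ** alpha) if total else 0.0

def anneal_cluster(graph, seed, steps=2000, t0=1.0, cooling=0.995):
    """Grow a cluster around `seed` by annealing single-node membership flips."""
    cluster, temp = {seed}, t0
    for _ in range(steps):
        node = random.choice(list(graph))
        if node == seed:
            continue
        proposal = set(cluster)
        proposal.symmetric_difference_update({node})     # toggle membership
        delta = fitness(graph, proposal) - fitness(graph, cluster)
        if delta > 0 or random.random() < math.exp(delta / temp):
            cluster = proposal
        temp *= cooling
    return cluster

graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
random.seed(0)
print(anneal_cluster(graph, seed=0))  # tends to settle on the dense triangle {0, 1, 2}
```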
Web crawler design presents many different challenges: architecture, strategies, performance and more. One of the most important research topics concerns improving the selection o...
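One common way to frame page selection is a scored crawl frontier: a priority queue that always hands the crawler the highest-scoring unvisited URL next, whatever the scoring policy (in-link counts, PageRank estimates, and so on). The sketch below uses placeholder scores and is not the strategy proposed in the paper.

```python
# Minimal scored crawl frontier; the scores passed to add() are placeholders.
import heapq

class Frontier:
    def __init__(self):
        self._heap = []            # entries are (-score, url) for max-heap behaviour
        self._seen = set()

    def add(self, url: str, score: float) -> None:
        if url not in self._seen:
            self._seen.add(url)
            heapq.heappush(self._heap, (-score, url))

    def next_url(self) -> str | None:
        return heapq.heappop(self._heap)[1] if self._heap else None

frontier = Frontier()
frontier.add("http://example.org/hub", score=0.9)    # e.g. many in-links
frontier.add("http://example.org/leaf", score=0.1)
print(frontier.next_url())   # the high-scoring hub page is crawled first
```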
RDF is the first W3C standard for enriching information resources on the Web with detailed metadata. The semantics of RDF data is defined using an RDF schema. The most expressiv...
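To make the role of an RDF schema more concrete, the toy sketch below applies two RDFS entailment rules, the transitivity of rdfs:subClassOf and the inheritance of rdf:type along it, to a handful of triples. The vocabulary (ex:Car, ex:Vehicle, ex:myCar) is invented purely for illustration.

```python
# Toy RDFS inference: subclass transitivity and type inheritance over
# triples represented as plain Python tuples.
SUBCLASS, TYPE = "rdfs:subClassOf", "rdf:type"

triples = {
    ("ex:Car", SUBCLASS, "ex:Vehicle"),
    ("ex:Vehicle", SUBCLASS, "ex:Artifact"),
    ("ex:myCar", TYPE, "ex:Car"),
}

def rdfs_closure(triples):
    """Extend the triple set with subclass transitivity and inherited types."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        inferred = set()
        for (a, p1, b) in closed:
            for (c, p2, d) in closed:
                if p2 != SUBCLASS or b != c:
                    continue
                if p1 == SUBCLASS:
                    inferred.add((a, SUBCLASS, d))   # A sub B, B sub D => A sub D
                elif p1 == TYPE:
                    inferred.add((a, TYPE, d))       # x type B, B sub D => x type D
        if not inferred <= closed:
            closed |= inferred
            changed = True
    return closed

for triple in sorted(rdfs_closure(triples)):
    print(triple)
```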
CSCL software tools must provide support for group work and should be based on a collaborative learning technique. The PBL-based CCCuento tool is introduced here. It is intended t...