Sciweavers

IADIS
2003
13 years 11 months ago
SPLAT: A System for Self-Plagiarism Detection
This paper presents a system for self-plagiarism detection, SPLAT. The system uses a WebL web spider that crawls through the web sites of the top fifty Computer Science department...
Christian S. Collberg, Stephen G. Kobourov, Joshua...
DEXAW
2010
IEEE
13 years 11 months ago
Towards a Search System for the Web Exploiting Spatial Data of a Web Document
In this paper, we describe our work in progress in the scope of information retrieval exploiting the spatial data extracted from web documents. We discuss problems of a search for ...
Stefan Dlugolinsky, Michal Laclavik, Ladislav Hluc...
WSDM
2009
ACM
14 years 4 months ago
The web changes everything: understanding the dynamics of web content
The Web is a dynamic, ever-changing collection of information. This paper explores changes in Web content by analyzing a crawl of 55,000 Web pages, selected to represent different...
Eytan Adar, Jaime Teevan, Susan T. Dumais, Jonatha...
WWW
2005
ACM
14 years 10 months ago
Exploiting the deep web with DynaBot: matching, probing, and ranking
We present the design of DynaBot, a guided Deep Web discovery system. DynaBot's modular architecture supports focused crawling of the Deep Web with an emphasis on matching, p...
Daniel Rocco, James Caverlee, Ling Liu, Terence Cr...
CLEF
2005
Springer
14 years 3 months ago
EuroGOV: Engineering a Multilingual Web Corpus
EuroGOV is a multilingual web corpus that was created to serve as the document collection for WebCLEF, the CLEF 2005 web retrieval task. EuroGOV is a collection of web pages crawl...
Börkur Sigurbjörnsson, Jaap Kamps, Maart...