Search Sciweavers | Sciweavers

IADIS
2003

91 views, Internet Technology

SPLAT: A System for Self-Plagiarism Detection

15 years 8 months ago

This paper presents a system for self-plagiarism detection, SPLAT. The system uses a WebL web spider that crawls through the web sites of the top fifty Computer Science department...

Christian S. Collberg, Stephen G. Kobourov, Joshua...

DEXAW
2010
IEEE

181 views, Database

Towards a Search System for the Web Exploiting Spatial Data of a Web Document

15 years 8 months ago

Download laclavik.net

In this paper, we describe our work in progress in the scope of information retrieval exploiting the spatial data extracted from web documents. We discuss problems of a search for ...

Stefan Dlugolinsky, Michal Laclavik, Ladislav Hluc...

WSDM
2009
ACM

176 views, Data Mining

The web changes everything: understanding the dynamics of web content

16 years 1 months ago

Download turing.cs.washington.edu

The Web is a dynamic, ever changing collection of information. This paper explores changes in Web content by analyzing a crawl of 55,000 Web pages, selected to represent different...

Eytan Adar, Jaime Teevan, Susan T. Dumais, Jonatha...

WWW
2005
ACM

122 views, Internet Technology

Exploiting the deep web with DynaBot: matching, probing, and ranking

16 years 7 months ago

Download www.westga.edu

We present the design of Dynabot, a guided Deep Web discovery system. Dynabot's modular architecture supports focused crawling of the Deep Web with an emphasis on matching, p...

Daniel Rocco, James Caverlee, Ling Liu, Terence Cr...

CLEF
2005
Springer

115 views, Information Technology

EuroGOV: Engineering a Multilingual Web Corpus

16 years 12 days ago

Download www.clef-campaign.org

EuroGOV is a multilingual web corpus that was created to serve as the document collection for WebCLEF, the CLEF 2005 web retrieval task. EuroGOV is a collection of web pages crawl...

Börkur Sigurbjörnsson, Jaap Kamps, Maart...
