Search Sciweavers | Sciweavers

ACSW
2004

192 views | Security Privacy

Discovering Parallel Text from the World Wide Web

15 years 8 months ago

Parallel corpus is a rich linguistic resource for various multilingual text management tasks, including crosslingual text retrieval, multilingual computational linguistics and mul...

Jisong Chen, Rowena Chau, Chung-Hsing Yeh

IADIS
2003

91 views | Internet Technology

SPLAT: A System for Self-Plagiarism Detection

15 years 8 months ago

Download splat.cs.arizona.edu

This paper presents a system for self-plagiarism detection, SPLAT. The system uses a WebL web spider that crawls through the web sites of the top fifty Computer Science department...

Christian S. Collberg, Stephen G. Kobourov, Joshua...

CLEF
2005
Springer

115 views | Information Technology

EuroGOV: Engineering a Multilingual Web Corpus

16 years 10 days ago

Download www.clef-campaign.org

EuroGOV is a multilingual web corpus that was created to serve as the document collection for WebCLEF, the CLEF 2005 web retrieval task. EuroGOV is a collection of web pages crawl...

Börkur Sigurbjörnsson, Jaap Kamps, Maart...

WWW
2010
ACM

220 views | Internet Technology

Not so creepy crawler: easy crawler generation with standard XML queries

16 years 1 month ago

Download www2.pms.ifi.lmu.de

Web crawlers are increasingly used for focused tasks such as the extraction of data from Wikipedia or the analysis of social networks like last.fm. In these cases, pages are far m...

Franziska von dem Bussche, Klara A. Weiand, Benedi...

WWW
2007
ACM

285 views | Internet Technology

GigaHash: scalable minimal perfect hashing for billions of URLs

16 years 7 months ago

Download www2007.org

A minimal perfect function maps a static set of n keys on to the range of integers {0, 1, 2, ..., n - 1}. We present a scalable high performance algorithm based on random graphs for ...

Kumar Chellapilla, Anton Mityagin, Denis Xavier Ch...
