Search Sciweavers | Sciweavers

WWW 2006, ACM

Do not crawl in the DUST: different URLs with similar text

Download www2007.org

We consider the problem of DUST: Different URLs with Similar Text. Such duplicate URLs are prevalent in web sites, as web server software often uses aliases and redirections, and...

Uri Schonfeld, Ziv Bar-Yossef, Idit Keidar

WWW 2001, ACM

Crawling the Hidden Web

Download www.dia.uniroma3.it

Current-day crawlers retrieve content only from the publicly indexable Web, i.e., the set of Web pages reachable purely by following hypertext links, ignoring search forms and pag...

Sriram Raghavan, Hector Garcia-Molina

SIGIR 2008, ACM

Compressed collections for simulated crawling

Download www.sigir.org

Collections are a fundamental tool for reproducible evaluation of information retrieval techniques. We describe a new method for distributing the document lengths and term counts ...

Alessio Orlandi, Sebastiano Vigna

WWW 2007, ACM

Crawling multiple UDDI business registries

Download www2007.org

As Web services proliferate, the size and magnitude of UDDI Business Registries (UBRs) are likely to increase. The ability to discover Web services of interest across multiple UB...

Eyhab Al-Masri, Qusay H. Mahmoud

WWW 2005, ACM

Crawling a country: better strategies than breadth-first for web page ordering

Download www.tejedoresdelweb.com

Ricardo A. Baeza-Yates, Carlos Castillo, Mauricio ...
