Sciweavers

48 search results - page 5 / 10
» Language Based Crawling: Crawling the Arabic Content of the ...
WWW
2008
ACM
14 years 7 months ago
Recrawl scheduling based on information longevity
It is crucial for a web crawler to distinguish between ephemeral and persistent content. Ephemeral content (e.g., quote of the day) is usually not worth crawling, because by the t...
Christopher Olston, Sandeep Pandey
WWW
2009
ACM
14 years 7 months ago
Detecting soft errors by redirection classification
A soft error redirection is a URL redirection to a page that returns the HTTP status code 200 (OK) but has actually no relevant content to the client request. Since such redirecti...
Taehyung Lee, Jinil Kim, Jin Wook Kim, Sung-Ryul K...
WWW
2011
ACM
13 years 1 month ago
we.b: the web of short urls
Short URLs have become ubiquitous. Especially popular within social networking services, short URLs have seen a significant increase in their usage over the past years, mostly du...
Demetres Antoniades, Iasonas Polakis, Georgios Kon...
WWW
2004
ACM
14 years 7 months ago
Combining link and content analysis to estimate semantic similarity
Search engines use content and link information to crawl, index, retrieve, and rank Web pages. The correlations between similarity measures based on these cues and on semantic ass...
Filippo Menczer
SAC
2005
ACM
14 years 15 days ago
A distributed content-based search engine based on mobile code
Current search engines crawl the Web, download content, and digest this content locally. For multimedia content, this involves considerable volumes of data. Furthermore, this proc...
Volker Roth, Ulrich Pinsdorf, Jan Peters