Search Sciweavers | Sciweavers

48 search results - page 6 / 10

» Language Based Crawling: Crawling the Arabic Content of the ...

CLEF
2005
Springer

EuroGOV: Engineering a Multilingual Web Corpus

14 years 15 days ago

Download www.clef-campaign.org

EuroGOV is a multilingual web corpus that was created to serve as the document collection for WebCLEF, the CLEF 2005 web retrieval task. EuroGOV is a collection of web pages crawl...

Börkur Sigurbjörnsson, Jaap Kamps, Maart...

WWW
2009
ACM

Triplify: light-weight linked data publication from relational databases

14 years 7 months ago

Download www.informatik.uni-leipzig.de

In this paper we present Triplify, a simplistic but effective approach to publish Linked Data from relational databases. Triplify is based on mapping HTTP-URI requests onto relat...

Sören Auer, Sebastian Dietzold, Jens Lehmann,...

WISE
2005
Springer

Temporal Ranking of Search Engine Results

14 years 16 days ago

Download www.dl.kuis.kyoto-u.ac.jp

Existing search engines contain the picture of the Web from the past and their ranking algorithms are based on data crawled some time ago. However, a user requires not only relevan...

Adam Jatowt, Yukiko Kawai, Katsumi Tanaka

WIDM
2003
ACM

Datarover: a taxonomy based crawler for automated data extraction from data-intensive websites

14 years 6 days ago

Download www.public.asu.edu

The advent of e-commerce has created a trend that brought thousands of catalogs online. Most of these websites are “taxonomy-directed”. A Web site is said to be “taxonomydire...

Hasan Davulcu, S. Koduri, Saravanakumar Nagarajan

KDD
2008
ACM

De-duping URLs via rewrite rules

14 years 7 months ago

Download research.yahoo.com

A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...

Anirban Dasgupta, Ravi Kumar, Amit Sasturkar
