Sciweavers

38 search results - page 6 / 8
» The indexable web is more than 11.5 billion pages
CN
1999
13 years 10 months ago
Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery
The rapid growth of the World-Wide Web poses unprecedented scaling challenges for general-purpose crawlers and search engines. In this paper we describe a new hypertext resource d...
Soumen Chakrabarti, Martin van den Berg, Byron Dom
SIGIR
2008
ACM
13 years 10 months ago
SpotSigs: robust and efficient near duplicate detection in large web collections
Motivated by our work with political scientists who need to manually analyze large Web archives of news sites, we present SpotSigs, a new algorithm for extracting and matching sig...
Martin Theobald, Jonathan Siddharth, Andreas Paepc...
CLEF
2005
Springer
14 years 3 months ago
EuroGOV: Engineering a Multilingual Web Corpus
EuroGOV is a multilingual web corpus that was created to serve as the document collection for WebCLEF, the CLEF 2005 web retrieval task. EuroGOV is a collection of web pages crawl...
Börkur Sigurbjörnsson, Jaap Kamps, Maart...
TREC
2004
13 years 11 months ago
Experiments with Web QA System and TREC 2004 Questions
We describe our first participation in TREC. We only competed in the Question Answering (QA) category and limited our runs to factoids. Our approach was to use our open domain QA ...
Dmitri Roussinov, Yin Ding, Jose Antonio Robles-Fl...
WWW
2005
ACM
14 years 11 months ago
The semantic webscape: a view of the semantic web
It has been a few years since the semantic Web was initiated by W3C, but its status has not been quantitatively measured. It is crucial to understand the status at this early stag...
Juhnyoung Lee, Richard Goodwin