Sciweavers

72 search results - page 8 / 15
» Ontology-Focused Crawling of Web Documents
Sort
View
WWW
2009
ACM
14 years 8 months ago
User-centric content freshness metrics for search engines
In order to return relevant search results, a search engine must keep its local repository synchronized to the Web, but it is usually impossible to attain perfect freshness. Hence...
Ali Dasdan, Xinh Huynh
ADAPTIVE
2007
Springer
14 years 2 months ago
Web Document Modeling
A very common issue of adaptive Web-Based systems is the modeling of documents. Such documents represent domain-specific information for a number of purposes. Application areas su...
Alessandro Micarelli, Filippo Sciarrone, Mauro Mar...
PVLDB
2008
141views more  PVLDB 2008»
13 years 7 months ago
WebTables: exploring the power of tables on the web
The World-Wide Web consists of a huge number of unstructured documents, but it also contains structured data in the form of HTML tables. We extracted 14.1 billion HTML tables from...
Michael J. Cafarella, Alon Y. Halevy, Daisy Zhe Wa...
WWW
2003
ACM
14 years 8 months ago
Dynamic maintenance of web indexes using landmarks
Recent work on incremental crawling has enabled the indexed document collection of a search engine to be more synchronized with the changing World Wide Web. However, this synchron...
Lipyeow Lim, Min Wang, Sriram Padmanabhan, Jeffrey...
CLEF
2005
Springer
14 years 1 months ago
EuroGOV: Engineering a Multilingual Web Corpus
EuroGOV is a multilingual web corpus that was created to serve as the document collection for WebCLEF, the CLEF 2005 web retrieval task. EuroGOV is a collection of web pages crawl...
Börkur Sigurbjörnsson, Jaap Kamps, Maart...