Sciweavers

72 search results - page 14 / 15
» Ontology-Focused Crawling of Web Documents
JCDL
2004
ACM
14 years 1 month ago
Panorama: extending digital libraries with topical crawlers
A large number of research, technical, and professional documents are available today in digital formats. Digital libraries are created to facilitate search and retrieval of inform...
Gautam Pant, Kostas Tsioutsiouliklis, Judy Johnson...
KDD
2008
ACM
14 years 8 months ago
De-duping URLs via rewrite rules
A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...
Anirban Dasgupta, Ravi Kumar, Amit Sasturkar
WIKIS
2006
ACM
14 years 1 month ago
SweetWiki: semantic web enabled technologies in Wiki
Wikis are social web sites enabling a potentially large number of participants to modify any page or create a new page using their web browser. As they grow, wikis may suffer from...
Michel Buffa, Fabien Gandon
SIGIR
2008
ACM
13 years 7 months ago
SpotSigs: robust and efficient near duplicate detection in large web collections
Motivated by our work with political scientists who need to manually analyze large Web archives of news sites, we present SpotSigs, a new algorithm for extracting and matching sig...
Martin Theobald, Jonathan Siddharth, Andreas Paepc...
APWEB
2009
Springer
13 years 12 months ago
Ontology Evaluation through Text Classification
We present a new method to evaluate a search ontology, which relies on mapping ontology instances to textual documents. On the basis of this mapping, we evaluate the adequacy of on...
Yael Dahan Netzer, David Gabay, Meni Adler, Yoav G...