Sciweavers

SAC
2006
ACM
14 years 4 months ago
Managing duplicates in a web archive
Daniel Gomes, André L. Santos, Mário...
SIGIR
2008
ACM
13 years 10 months ago
SpotSigs: robust and efficient near duplicate detection in large web collections
Motivated by our work with political scientists who need to manually analyze large Web archives of news sites, we present SpotSigs, a new algorithm for extracting and matching sig...
Martin Theobald, Jonathan Siddharth, Andreas Paepc...
WWW
2006
ACM
14 years 11 months ago
Archiving web site resources: a records management view
In this paper, we propose the use of records management principles to identify and manage Web site resources with enduring value as records. Current Web archiving activities, coll...
Maureen Pennock, Brian Kelly
ELPUB
2007
ACM
14 years 2 months ago
Digitisation and Access to Archival Collections: A Case Study of the Sofia Municipal Government (1878-1879)
The paper presents in brief a project aimed at the development of a methodology and corresponding software tools intended for building of proper environments giving up means for s...
Maria Nisheva-Pavlova, Pavel Pavlov, Nikolay Marko...