Sciweavers

Syntactic Similarity of Web Documents
DL
2000
Springer
14 years 9 days ago
Re-engineering structures from Web documents
To realise a wide range of applications (including digital libraries) on the Web, a more structured way of accessing the Web is required and such requirement can be facilitated by...
Chuang-Hue Moh, Ee-Peng Lim, Wee Keong Ng
ERCIMDL
2006
Springer
13 years 11 months ago
Design and Selection Criteria for a National Web Archive
Web archives and Digital Libraries are conceptually similar, as they both store and provide access to digital contents. The process of loading documents into a Digital Library usua...
Daniel Gomes, Sérgio Freitas, Mário ...
WWW
2007
ACM
14 years 8 months ago
Detecting near-duplicates for web crawling
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...
Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma
VLDB
2003
ACM
14 years 8 months ago
THESUS: Organizing Web document collections based on link semantics
The requirements for effective search and management of the WWW are stronger than ever. Currently Web documents are classified based on their content not taking into acco...
Maria Halkidi, Benjamin Nguyen, Iraklis Varlamis, ...
WEBNET
2001
13 years 9 months ago
Deriving Context Specific Information on the Web
The Web is huge, unstructured and diverse in quality, which makes searching for information difficult. In practice, few of the documents returned by a search engine are valuable ...
Christo Dichev, Darina Dicheva