Sciweavers

» Identifying Content Blocks from Web Documents
ADC
2009
Springer
Ranking-Constrained Keyword Sequence Extraction from Web Documents
Given a large volume of Web documents, we consider the problem of finding the shortest keyword sequences for each of the documents such that a keyword sequence can be rendered to a g...
Ding-Yi Chen, Xue Li, Jing Liu, Xia Chen
KDD
2006
ACM
Understanding Content Reuse on the Web: Static and Dynamic Analyses
Abstract. In this paper we present static and dynamic studies of duplicate and near-duplicate documents in the Web. The static and dynamic studies involve the analysis of similar c...
Ricardo A. Baeza-Yates, Álvaro R. Pereira J...
WIDM
2004
ACM
Stylistic and lexical co-training for web block classification
Many applications which use web data extract information from a limited number of regions on a web page. As such, web page division into blocks and the subsequent block classifica...
Chee How Lee, Min-Yen Kan, Sandra Lai
DOCENG
2008
ACM
Interactive office documents: a new face for web 2.0 applications
As the world wide web transforms from a vehicle of information dissemination and e-commerce transactions into a writable nexus of human collaboration, the Web 2.0 technologies at ...
John M. Boyer
ERCIMDL
2006
Springer
Design and Selection Criteria for a National Web Archive
Web archives and Digital Libraries are conceptually similar, as they both store and provide access to digital contents. The process of loading documents into a Digital Library usua...
Daniel Gomes, Sérgio Freitas, Mário ...