Sciweavers

Models and Algorithms for Duplicate Document Detection
WWW
2004
ACM
14 years 8 months ago
Automatic detection of fragments in dynamically generated web pages
Dividing web pages into fragments has been shown to provide significant benefits for both content generation and caching. In order for a web site to use fragment-based content gen...
Lakshmish Ramaswamy, Arun Iyengar, Ling Liu, Fred ...
WSDM
2010
ACM
14 years 4 months ago
Boilerplate Detection using Shallow Text Features
In addition to the actual content, Web pages consist of navigational elements, templates, and advertisements. This boilerplate text typically is not related to the main content, ma...
Christian Kohlschütter, Peter Fankhauser, Wol...
CIKM
2006
Springer
13 years 11 months ago
Improving novelty detection for general topics using sentence level information patterns
The detection of new information in a document stream is an important component of many potential applications. In this work, a new novelty detection approach based on the identif...
Xiaoyan Li, W. Bruce Croft
CIKM
2009
Springer
14 years 1 month ago
Cross-language linking of news stories on the web using interlingual topic modelling
We have studied the problem of linking event information across different languages without the use of translation systems or dictionaries. The linking is based on interlingua in...
Wim De Smet, Marie-Francine Moens
ERCIMDL
2000
Springer
13 years 11 months ago
Modeling Archival Repositories for Digital Libraries
This paper studies the archival problem: how a digital library can preserve electronic documents over long periods of time. We analyze how an archival repository can fail and we p...
Arturo Crespo, Hector Garcia-Molina