Sciweavers

708 search results
» Identifying Content Blocks from Web Documents
WWW
2005
ACM
14 years 8 months ago
Thresher: automating the unwrapping of semantic content from the World Wide Web
We describe Thresher, a system that lets non-technical users teach their browsers how to extract semantic web content from HTML documents on the World Wide Web. Users specify exam...
Andrew Hogue, David R. Karger
DOCENG
2010
ACM
13 years 6 months ago
From templates to schemas: bridging the gap between free editing and safe data processing
In this paper we present tools that provide an easy way to edit XML content directly on the web, with the usual benefit of valid XML content. These tools make it possible to crea...
Vincent Quint, Cécile Roisin, Stépha...
SIGIR
2008
ACM
13 years 7 months ago
Comments-oriented document summarization: understanding documents with readers' feedback
Comments left by readers on Web documents contain valuable information that can be utilized in different information retrieval tasks including document search, visualization, and ...
Meishan Hu, Aixin Sun, Ee-Peng Lim
WWW
2007
ACM
14 years 8 months ago
Homepage live: automatic block tracing for web personalization
The emergence of personalized homepage services, e.g. personalized Google Homepage and Microsoft Windows Live, has enabled Web users to select Web contents of interest and to aggr...
Jie Han, Dingyi Han, Chenxi Lin, Hua-Jun Zeng, Zhe...
WWW
2008
ACM
14 years 8 months ago
Genealogical trees on the web: a search engine user perspective
This paper presents an extensive study about the evolution of textual content on the Web, which shows how some new pages are created from scratch while others are created using al...
Ricardo A. Baeza-Yates, Álvaro R. Pereira J...