We describe Thresher, a system that lets non-technical users teach their browsers how to extract semantic web content from HTML documents on the World Wide Web. Users specify exam...
In this paper we present tools that provide an easy way to edit XML content directly on the web, with the usual benefit of valid XML content. These tools make it possible to crea...
Comments left by readers on Web documents contain valuable information that can be utilized in different information retrieval tasks including document search, visualization, and ...
The emergence of personalized homepage services, such as personalized Google Homepage and Microsoft Windows Live, has enabled Web users to select Web content of interest and to aggr...
Jie Han, Dingyi Han, Chenxi Lin, Hua-Jun Zeng, Zhe...
This paper presents an extensive study of the evolution of textual content on the Web, showing how some new pages are created from scratch while others are created using al...