We present a browser-extending Semantic Web extraction system that maps HTML documents to tables and, where possible, to rules. First, the basic data extractor ViPER distills and ...
This paper describes a new research proposal of multi-document summarization of dynamic content in web pages. Much information is lost in the Web due to the temporal character of w...
Existing approaches on privacy-preserving data publishing rely on the assumption that data can be divided into quasi-identifier attributes (QI) and sensitive attribute (SA). This ...
Ada Wai-Chee Fu, Ke Wang, Raymond Chi-Wing Wong, Y...
In this paper, we present a novel method for the classification of Web sites. This method exploits both structure and content of Web sites in order to discern their functionality....
Web pages contain a combination of unique content and template material, which is present across multiple pages and used primarily for formatting, navigation, and branding. We stu...