On script-generated web sites, many documents share common HTML tree structure, allowing wrappers to effectively extract information of interest. Of course, the scripts and thus ...
Abstract. A new web content structure based on visual representation is proposed in this paper. Many web applications such as information retrieval, information extraction and auto...
The structure of the web is increasingly being used to improve organization, search, and analysis of information on the web. For example, Google uses the text in citing documents ...
Eric J. Glover, Kostas Tsioutsiouliklis, Steve Law...
SGML standardized in ISO 8879 [International Organization for Standardization (1986)] has been proliferated because it can provide various styles and transform documents on dieren...
Abstract. To fight the problem of information overload in huge information sources like large document repositories, e. g. citeseer, or internet websites you need a selection crit...