Sciweavers

» Identifying Content Blocks from Web Documents
IJCAI
2003
13 years 10 months ago
Predicting Web Information Content
In this paper, we propose a novel method to infer the web user’s Information Content (IC), which is the information that the user must examine to complete her task. In particula...
Tingshao Zhu, Russell Greiner, Gerald Häubl, ...
WWW
2007
ACM
14 years 9 months ago
Detecting near-duplicates for web crawling
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...
Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma
WWW
2008
ACM
14 years 9 months ago
As we may perceive: finding the boundaries of compound documents on the web
This paper considers the problem of identifying on the Web compound documents (cDocs): groups of web pages that in aggregate constitute semantically coherent information entities...
Pavel Dmitriev
WWW
2001
ACM
14 years 9 months ago
Towards second and third generation web-based multimedia
First generation Web content encodes information in handwritten (HTML) Web pages. Second generation Web content generates HTML pages on demand, e.g. by filling in templates with c...
Jacco van Ossenbruggen, Joost Geurts, Frank Cornel...
DOCENG
2007
ACM
14 years 18 days ago
Extracting reusable document components for variable data printing
Variable Data Printing (VDP) has brought new flexibility and dynamism to the printed page. Each printed instance of a specific class of document can now have different degrees of ...
Steven R. Bagley, David F. Brailsford, James A. Ol...