The volume and evolution of web page templates


Source: research.yahoo.com

Web pages contain a combination of unique content and template material, which is present across multiple pages and used primarily for formatting, navigation, and branding. We study the nature, evolution, and prevalence of these templates on the web. As part of this work, we develop new randomized algorithms for template extraction that run approximately twenty times faster than existing approaches while achieving similar quality. Our results show that 40 to 50% of the content on the web is template content. Over the last eight years, the fraction of template content has doubled, and the growth shows no sign of abating. Text, links, and total HTML bytes within templates are all growing as a fraction of total content at between 6% and 8% per year. We discuss the deleterious implications of this growth for information retrieval and ranking, classification, and link analysis.

Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval; F....
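The abstract does not spell out the extraction algorithm. As a rough, hypothetical illustration of what "template content" means operationally, the sketch below treats a text chunk as template if its normalized fingerprint recurs on at least a given share of the pages examined from one site (and on at least two pages), then reports the template share of text bytes. The pre-split chunks, the SHA-1 fingerprinting, the `min_share` threshold, and the optional random page sampling are all assumptions made for illustration; this is not the authors' randomized algorithm.

```python
import hashlib
import random
from collections import Counter
from typing import Dict, List, Optional, Set


def fingerprint(chunk: str) -> str:
    """Hash a chunk after collapsing whitespace and lowercasing (an assumed normalization)."""
    normalized = " ".join(chunk.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def template_fingerprints(
    pages: Dict[str, List[str]],
    min_share: float = 0.5,
    sample_size: Optional[int] = None,
    seed: int = 0,
) -> Set[str]:
    """Fingerprints of chunks that recur on at least `min_share` of the examined pages.

    Hypothetical parameters: if `sample_size` is smaller than the number of pages,
    only a seeded random sample of pages is examined, which trades accuracy for speed.
    """
    page_ids = sorted(pages)
    if sample_size is not None and sample_size < len(page_ids):
        page_ids = random.Random(seed).sample(page_ids, sample_size)
    counts: Counter = Counter()
    for pid in page_ids:
        # Count each distinct chunk once per page, not once per occurrence.
        counts.update({fingerprint(c) for c in pages[pid]})
    # Require at least two pages so content unique to one page is never "template".
    threshold = max(2.0, min_share * len(page_ids))
    return {fp for fp, n in counts.items() if n >= threshold}


def template_fraction(pages: Dict[str, List[str]], **kwargs) -> float:
    """Fraction of all chunk bytes (UTF-8) that fall in template chunks."""
    templates = template_fingerprints(pages, **kwargs)
    total = in_template = 0
    for chunks in pages.values():
        for c in chunks:
            size = len(c.encode("utf-8"))
            total += size
            if fingerprint(c) in templates:
                in_template += size
    return in_template / total if total else 0.0


if __name__ == "__main__":
    site = {
        "a.html": ["Home | About | Contact", "Copyright 2005 Example Inc.", "Story about templates"],
        "b.html": ["Home | About | Contact", "Copyright 2005 Example Inc.", "A different article"],
        "c.html": ["Home | About | Contact", "Unique page c"],
    }
    print(f"{template_fraction(site):.3f}")  # navigation bar and footer count as template
```

Counting a chunk once per page, rather than once per occurrence, keeps a block repeated many times on a single page from being mistaken for cross-page template material.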

David Gibson, Kunal Punera, Andrew Tomkins


General Terms: Algorithms | Internet Technology | Nonnumerical Algorithms | Template Content | WWW 2005


» Archiving the web using page changes patterns a case study

» Employing Inductive Databases in Concrete Applications

» Evolving the Semantic Web with Mangrove

Post Info

Added	22 Nov 2009
Updated	22 Nov 2009
Type	Conference
Year	2005
Where	WWW
Authors	David Gibson, Kunal Punera, Andrew Tomkins


Sciweavers
