Sciweavers



WWW
2009
ACM


A densitometric analysis of web template content


Download www2009.eprints.org

What makes template content in the Web so special that we need to remove it? In this paper I present a large-scale aggregate analysis of textual Web content, corroborating statistical laws from the field of Quantitative Linguistics. I analyze the idiosyncrasy of template content compared to regular "full text" content and derive a simple yet suitable quantitative model.

Categories and Subject Descriptors: H.3.3 [Information Systems]: Information Search and Retrieval; G.3 [Probability and Statistics]: Distribution Functions

General Terms: Theory, Experimentation, Measurement

Keywords: Content Analysis, Template Detection, Template Removal, Web Page Segmentation, Noise Removal

Christian Kohlschütter




Related Content

» From templates to schemas: bridging the gap between free editing and safe data processing

» The volume and evolution of web page templates

» Template detection for large scale search engines

» Web document text and images extraction using DOM analysis and natural language processing

» PrintMonkey: giving users a grip on printing the web

» Archiving the web using page changes patterns: a case study

» An exploratory mapping strategy for web-driven magazines

» An information extraction engine for web discussion forums

» Web article extraction for web printing: a DOM+visual based approach

Post Info

Added	21 Nov 2009
Updated	21 Nov 2009
Type	Conference
Year	2009
Where	WWW
Authors	Christian Kohlschütter
