Sciweavers

WWW
2007
ACM

Page-level template detection via isotonic smoothing

We develop a novel framework for the page-level template detection problem. Our framework is built on two main ideas. The first is the automatic generation of training data for a classifier that, given a page, assigns a templateness score to every DOM node of the page. The second is the global smoothing of these per-node classifier scores by solving a regularized isotonic regression problem; the latter follows from a simple yet powerful abstraction of templateness on a page. Our extensive experiments on human-labeled test data show that our approach detects templates effectively.
Categories and Subject Descriptors: H.4.m [Information Systems]: Miscellaneous
General Terms: Algorithms, Experimentation, Measurements
Keywords: Webpage sectioning, webpage segmentation, template detection, isotonic regression
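To give a rough feel for the smoothing idea, the sketch below runs plain least-squares isotonic regression (the pool-adjacent-violators algorithm) on the scores along a single root-to-leaf path of a DOM tree. This is an illustration only, not the regularized tree formulation the abstract refers to. It assumes, as a hypothetical ordering constraint, that templateness scores should not decrease from the root toward the leaf; the function name and the sample scores are made up for the example.

```python
def isotonic_non_decreasing(scores):
    """Least-squares fit of a non-decreasing sequence (pool-adjacent-violators).

    Assumption for illustration: `scores` are per-node classifier outputs
    listed from the root down to a leaf along one path of the DOM tree.
    """
    blocks = []  # each block holds [sum of scores, number of scores]
    for s in scores:
        blocks.append([float(s), 1])
        # Merge the last two blocks while their means violate the ordering.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    # Expand every block back to one value (the block mean) per node.
    smoothed = []
    for total, count in blocks:
        smoothed.extend([total / count] * count)
    return smoothed


# Hypothetical raw scores; the second and third violate the ordering.
raw = [0.2, 0.6, 0.4, 0.9]
smoothed = isotonic_non_decreasing(raw)
print(smoothed)
```

The violating pair is replaced by its mean, so the smoothed sequence stays as close as possible to the raw scores in the least-squares sense while respecting the assumed ordering.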
Deepayan Chakrabarti, Ravi Kumar, Kunal Punera
Added 22 Nov 2009
Updated 22 Nov 2009
Type Conference
Year 2007
Where WWW