Sciweavers

WWW
2008
ACM

A graph-theoretic approach to webpage segmentation

We consider the problem of segmenting a webpage into visually and semantically cohesive pieces. Our approach is based on formulating an appropriate optimization problem on weighted graphs, where the weights capture whether two nodes in the DOM tree should be placed together or apart in the segmentation. We present a framework for learning these weights from manually labeled data in a principled manner. Our work is a significant departure from previous heuristic and rule-based solutions to the segmentation problem. The results of our empirical analysis bring out interesting aspects of our framework, including variants of the optimization problem and the role of learning.
Categories and Subject Descriptors: H.3.m [Information Systems]: Information Storage and Retrieval
General Terms: Algorithms, Experimentation
Keywords: Webpage sectioning, webpage segmentation, energy minimization, graph cuts, correlation clustering
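The idea of weights that say whether two nodes belong together or apart can be illustrated with a toy correlation-clustering objective. The sketch below is not the authors' formulation or algorithm: the node names, the weight values, the sign convention (positive means "together", negative means "apart"), and the helper names `disagreement_cost` and `best_segmentation` are all assumptions made for illustration. It also ignores the DOM tree structure and uses brute force, which is only feasible for a handful of nodes.

```python
from itertools import product


def disagreement_cost(labels, weights):
    """Sum the weight of every violated pairwise preference.

    Assumed convention (hypothetical, not taken from the paper):
    weights[(u, v)] > 0 means u and v prefer the same segment,
    weights[(u, v)] < 0 means they prefer different segments.
    """
    cost = 0.0
    for (u, v), w in weights.items():
        same_segment = labels[u] == labels[v]
        if w > 0 and not same_segment:
            cost += w       # pair wanted to be together but was split
        elif w < 0 and same_segment:
            cost += -w      # pair wanted to be apart but was merged
    return cost


def best_segmentation(nodes, weights):
    """Try every label assignment and keep the cheapest one.

    Exponential in the number of nodes; a toy stand-in for a real solver.
    """
    best_labels, best_cost = None, float("inf")
    for assignment in product(range(len(nodes)), repeat=len(nodes)):
        labels = dict(zip(nodes, assignment))
        cost = disagreement_cost(labels, weights)
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels, best_cost


# Made-up page regions and made-up weights, purely for illustration.
nodes = ["nav", "headline", "body", "footer"]
weights = {
    ("headline", "body"): 2.0,   # should share a segment
    ("nav", "headline"): -1.5,   # should be separated
    ("nav", "body"): -1.0,
    ("body", "footer"): -1.0,
    ("nav", "footer"): -0.5,
}

labels, cost = best_segmentation(nodes, weights)
```

In this toy setting the cheapest segmentation groups `headline` with `body` and keeps `nav` and `footer` in segments of their own. In the paper the weights are learned from manually labeled data rather than set by hand.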
Deepayan Chakrabarti, Ravi Kumar, Kunal Punera
Added: 21 Nov 2009
Updated: 21 Nov 2009
Type: Conference
Year: 2008
Where: WWW
Authors: Deepayan Chakrabarti, Ravi Kumar, Kunal Punera