Sciweavers

WWW
2004
ACM

Learning block importance models for web pages

Previous work shows that a web page can be partitioned into multiple segments, or blocks, and that the blocks in a page are usually not equally important. It has also been shown that separating noisy or unimportant blocks from the rest of a page can facilitate web mining, search, and accessibility. However, none of this work presents a uniform approach or model for measuring the importance of different portions of a web page. Through a user study, we found that people do have a consistent view of the importance of blocks in web pages. In this paper, we investigate how to find a model that automatically assigns importance values to the blocks in a web page. We define block importance estimation as a learning problem. First, we use the VIPS (VIsion-based Page Segmentation) algorithm to partition a web page into semantic blocks with a hierarchical structure. Then spatial features (such as position and size) and content features (such as the number of images and links) are extracted to construct a featur...
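The feature extraction step mentioned in the abstract can be pictured with a minimal sketch. It is not the authors' implementation: the `Block` class, the `block_features` function, the exact feature set, and the normalization by page size are all assumptions made for illustration, and the block is built by hand rather than produced by VIPS.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """Hypothetical stand-in for one segmented block (not the VIPS output format)."""
    x: float          # left edge, in pixels
    y: float          # top edge, in pixels
    width: float
    height: float
    num_images: int
    num_links: int


def block_features(block: Block, page_width: float, page_height: float) -> List[float]:
    """Return spatial features followed by content counts.

    Dividing by the page size is an assumption of this sketch, made so that
    blocks from pages of different sizes are comparable.
    """
    if page_width <= 0 or page_height <= 0:
        raise ValueError("page dimensions must be positive")
    return [
        block.x / page_width,        # horizontal position
        block.y / page_height,       # vertical position
        block.width / page_width,    # relative width
        block.height / page_height,  # relative height
        float(block.num_images),     # content feature: image count
        float(block.num_links),      # content feature: link count
    ]


# A made-up navigation bar on the left side of a 1000x800 page.
nav = Block(x=0, y=0, width=200, height=800, num_images=1, num_links=25)
features = block_features(nav, page_width=1000, page_height=800)
```

A list like `features` is the kind of input a learner could pair with user-assigned importance labels, which matches the abstract's framing of block importance estimation as a learning problem.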
Ruihua Song, Haifeng Liu, Ji-Rong Wen, Wei-Ying Ma
Added 22 Nov 2009
Updated 22 Nov 2009
Type Conference
Year 2004
Where WWW
Authors Ruihua Song, Haifeng Liu, Ji-Rong Wen, Wei-Ying Ma