Sciweavers

WWW
2003
ACM

Improving pseudo-relevance feedback in web information retrieval using web page segmentation

In contrast to traditional document retrieval, a web page as a whole is not a good information unit to search, because it often contains multiple topics and a lot of irrelevant information from the navigation, decoration, and interaction parts of the page. In this paper, we propose a VIsion-based Page Segmentation (VIPS) algorithm to detect the semantic content structure of a web page. Compared with simple DOM-based segmentation methods, our page segmentation scheme uses visual cues to obtain a better partition of a page at the semantic level. By using our VIPS algorithm to assist the selection of query expansion terms in pseudo-relevance feedback in web information retrieval, we achieve a 27% performance improvement on the Web Track dataset.
Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - Relevance feedback; H.5.4 [Information Interfaces and Presentation]: Hypertext/Hypermedia
General Terms: Algorithms, Performance, Human ...
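The idea of choosing query expansion terms from page segments rather than whole pages can be illustrated with a minimal Python sketch. It is not the authors' implementation: it does not perform VIPS segmentation (each page is assumed to arrive already split into text segments), and the function names, the term-count scoring, and the parameters `top_segments` and `num_terms` are hypothetical choices made only for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score_segment(query_terms, segment_tokens):
    """Toy relevance score: how often query terms occur in the segment."""
    counts = Counter(segment_tokens)
    return sum(counts[term] for term in query_terms)


def expansion_terms(query, segmented_pages, top_segments=2, num_terms=3,
                    stopwords=frozenset()):
    """Pick expansion terms from the best-scoring segments.

    segmented_pages is a list of pages from an initial retrieval run,
    each page being a list of segment strings (assumed to come from
    some page segmentation step that is not shown here).
    """
    query_terms = set(tokenize(query))
    segments = [tokenize(seg) for page in segmented_pages for seg in page]

    # Rank segments, not whole pages, so navigation bars and footers
    # that never mention the query fall to the bottom.
    ranked = sorted(segments,
                    key=lambda seg: score_segment(query_terms, seg),
                    reverse=True)

    # Count candidate terms only inside the top-ranked segments.
    term_counts = Counter()
    for seg in ranked[:top_segments]:
        for token in seg:
            if token not in query_terms and token not in stopwords:
                term_counts[token] += 1

    # Order by frequency, breaking ties alphabetically for determinism.
    ordered = sorted(term_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ordered[:num_terms]]


if __name__ == "__main__":
    pages = [
        ["Home | Login | Contact us",
         "Jaguar cars: the jaguar sedan engine and sedan pricing"],
        ["Jaguar habitat: the jaguar cat lives in rainforest habitat",
         "Copyright footer links"],
    ]
    print(expansion_terms("jaguar", pages,
                          stopwords=frozenset({"the", "and", "in"})))
```

Because only the top-scoring segments contribute candidate terms, words from navigation or footer blocks are kept out of the expanded query in this toy setup, which mirrors the motivation stated in the abstract.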
Shipeng Yu, Deng Cai, Ji-Rong Wen, Wei-Ying Ma
Added 22 Nov 2009
Updated 22 Nov 2009
Type Conference
Year 2003
Where WWW
Authors Shipeng Yu, Deng Cai, Ji-Rong Wen, Wei-Ying Ma