Sciweavers

SIGIR
2008
ACM

Classifiers without borders: incorporating fielded text from neighboring web pages

13 years 11 months ago
Classifiers without borders: incorporating fielded text from neighboring web pages
Accurate web page classification often depends crucially on information gained from neighboring pages in the local web graph. Prior work has exploited the class labels of nearby pages to improve performance. In contrast, in this work we utilize a weighted combination of the contents of neighbors to generate a better virtual document for classification. In addition, we break pages into fields, finding that a weighted combination of text from the target and fields of neighboring pages is able to reduce classification error by more than a third. We demonstrate performance on a large dataset of pages from the Open Directory Project and validate the approach using pages from a crawl from the Stanford WebBase. Interestingly, we find no value in anchor text and unexpected value in page titles (and especially titles of parent pages) in the virtual document. Categories and Subject Descriptors H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval--Information filtering; I....
Xiaoguang Qi, Brian D. Davison
Added 15 Dec 2010
Updated 15 Dec 2010
Type Journal
Year 2008
Where SIGIR
Authors Xiaoguang Qi, Brian D. Davison
Comments (0)