

A Layout-Independent Web News Article Contents Extraction Method Based on Relevance Analysis

Abstract. Traditional Web news article contents extraction methods are time-consuming and require considerable maintenance because they analyze the layout of news pages to generate wrappers, either manually or automatically. In this paper, we propose a relevance-based analysis method that extracts news article contents from news pages without analyzing page layouts beforehand. The method is applicable to general news pages, and we present implementations of news extraction from different kinds of news sources.
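The abstract does not state how relevance is measured, so the following is only an illustrative sketch of the general idea of layout-independent extraction, not the authors' method. It assumes, hypothetically, that relevance is the fraction of page-title words that also appear in a text block; the names `BlockCollector`, `relevance`, and `extract_article`, and the `threshold` value, are invented for this example.

```python
import re
from html.parser import HTMLParser


class BlockCollector(HTMLParser):
    """Collect the <title> text and the text of each <p> block.

    No wrapper or layout template is used: every paragraph is a candidate.
    """

    def __init__(self):
        super().__init__()
        self.title = ""
        self.blocks = []
        self._tag = None   # tag currently being collected ("title" or "p")
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "p"):
            self._tag = tag
            self._buf = []

    def handle_data(self, data):
        if self._tag:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == self._tag:
            # Normalize whitespace inside the collected text.
            text = " ".join("".join(self._buf).split())
            if tag == "title":
                self.title = text
            elif text:
                self.blocks.append(text)
            self._tag = None


def words(text):
    """Lowercased set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(title, block):
    """Hypothetical measure: share of title words that occur in the block."""
    title_words = words(title)
    if not title_words:
        return 0.0
    return len(title_words & words(block)) / len(title_words)


def extract_article(html, threshold=0.2):
    """Return the paragraphs whose relevance to the title meets the threshold."""
    parser = BlockCollector()
    parser.feed(html)
    return [b for b in parser.blocks if relevance(parser.title, b) >= threshold]
```

Calling `extract_article(page_html)` on a page string returns the paragraphs that share enough vocabulary with the title; navigation text, subscription prompts, and copyright notices tend to score low under this assumed measure, which is why no per-site layout analysis is needed in the sketch.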
Hao Han, Takehiro Tokuda
Added 26 May 2010
Updated 26 May 2010
Type Conference
Year 2009
Where ICWE