Sciweavers


AAAI
2007


Template-Independent News Extraction Based on Visual Consistency


Wrappers are a traditional method for extracting useful information from Web pages. Most previous work relies on the similarity between HTML tag trees to induce template-dependent wrappers. When hundreds of information sources in a specific domain such as news must be extracted, generating and maintaining these wrappers is costly. In this paper, we propose a novel template-independent news extraction approach that identifies news articles based on visual consistency. We first represent a page as a visual block tree. Then, by extracting a series of visual features, we derive a composite visual feature set that is stable in the news domain. Finally, we use a machine learning approach to generate a template-independent wrapper. Experimental results indicate that our approach extracts news effectively across websites, even from previously unseen websites, reaching an F1-value of around 95%.
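To make the pipeline in the abstract more concrete, here is a minimal Python sketch under loudly stated assumptions. `VisualBlock`, `block_features`, `train_perceptron`, `predict` and `f1_score` are hypothetical names, not the authors' code. The five features are illustrative stand-ins, not the paper's composite visual feature set, and the perceptron is a swapped-in learner used only because it fits in a few lines; the abstract does not specify which machine learning method the authors used. Features are normalised by page size so that blocks from sites with different templates can be compared, which is the intuition behind template independence.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Set, Tuple


@dataclass
class VisualBlock:
    """A rendered page region (hypothetical representation of a visual block)."""
    x: float
    y: float
    width: float
    height: float
    font_size: float
    text: str = ""
    children: List["VisualBlock"] = field(default_factory=list)

    def walk(self) -> Iterator["VisualBlock"]:
        # Pre-order traversal of the visual block tree.
        yield self
        for child in self.children:
            yield from child.walk()


def block_features(block: VisualBlock, page: VisualBlock) -> List[float]:
    """Illustrative (assumed) visual features, normalised by page geometry."""
    page_area = page.width * page.height
    return [
        (block.x - page.x) / page.width,  # relative left offset
        (block.y - page.y) / page.height,  # relative top offset
        (block.width * block.height) / page_area if page_area else 0.0,  # relative area
        block.font_size,  # font size in the renderer's units
        float(len(block.text)),  # amount of text in the block
    ]


def predict(w: List[float], b: float, xs: List[float]) -> int:
    # Linear decision rule: 1 = "news content block", 0 = "other".
    return 1 if sum(wi * xi for wi, xi in zip(w, xs)) + b > 0 else 0


def train_perceptron(samples: List[List[float]], labels: List[int],
                     epochs: int = 20, lr: float = 0.1) -> Tuple[List[float], float]:
    """Stand-in learner (not the paper's): classic perceptron updates."""
    w, b = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for xs, y in zip(samples, labels):
            err = y - predict(w, b, xs)
            if err:  # update weights only on a misclassification
                w = [wi + lr * err * xi for wi, xi in zip(w, xs)]
                b += lr * err
    return w, b


def f1_score(predicted: Set[int], gold: Set[int]) -> float:
    """F1 = 2PR / (P + R) over sets of extracted block ids."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```

In practice, one would label blocks from a set of training sites, train on their feature vectors, and report F1 on blocks from held-out sites. The held-out evaluation corresponds to the abstract's claim about previously unseen websites.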

Shuyi Zheng, Ruihua Song, Ji-Rong Wen


AAAI 2007 | Composite Visual Feature | Intelligent Agents | Visual Block Tree | Visual Feature


Related Content

» Automatic textual annotation of video news based on semantic visual object extraction

» An evaluation system for news video streams and blogs

» A Semantic Web-based approach for personalizing news

» Event Recognition from News Webpages through Latent Ingredients Extraction

» Co-Clustering of Time-Evolving News Story with Transcript and Keyframe

» Comparison of Visual Features and Fusion Techniques in Automatic Detection of Concepts fro...

» Topic Tracking Across Broadcast News Videos with Visual Duplicates and Semantic Concepts

» Detection of unique people in news programs using multimodal shot clustering

» Diamonds in the rough: Social media visual analytics for journalistic inquiry

Post Info

Added	02 Oct 2010
Updated	02 Oct 2010
Type	Conference
Year	2007
Where	AAAI
Authors	Shuyi Zheng, Ruihua Song, Ji-Rong Wen
