Sciweavers

IADIS
2009

Trash article detection using categorization techniques

We explore techniques for detecting news articles that contain invalid information, with the help of text categorization technology. The volume of information on the World Wide Web is large enough to distract users who are trying to find useful information. Many text categorization methodologies have been presented to cope with this volume of data. One major problem is that many articles that are fetched by a crawler, stored in a back-end database, and finally given as input to a categorization subsystem may not contain valid information for the user (trashy articles). This may lead users to lose trust in the system. In this paper, we analyze the special properties of the categorization of trashy news articles that allow us to detect them, and we propose a specific methodology for trash detection. Finally, we evaluate the proposed algorithm on a news categorization system and we depict the overall benefit of a trash detectio...
Added 18 Feb 2011
Updated 18 Feb 2011
Type Journal
Year 2009
Where IADIS
Authors Christos Bouras, Vassilis Tsogkas, Vassilis Poulopoulos, George Tsichritzis