Trash article detection using categorization techniques

We explore techniques for detecting news articles that contain invalid information, with the help of text categorization technology. The volume of information on the World Wide Web is large enough to distract users who are trying to find useful information. Many text categorization methodologies have been proposed to cope with this volume of data. One major problem is that many articles that are fetched by a crawler, stored in a back-end database, and finally given as input to a categorization subsystem may not contain valid information for the user (trashy articles). This may lead users to lose trust in the system. In this paper, we analyze the special properties of the categorization of trashy news articles that allow us to detect them, and we propose a specific methodology for trash detection. Finally, we evaluate the proposed algorithm on a news categorization system and we depict the overall benefit of a trash detectio...
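The abstract does not spell out the detection algorithm, so the following is only a minimal sketch of one plausible reading: an article that matches no category well is treated as a trash candidate. The category profiles, the cosine-similarity scoring, the function names, and the 0.2 threshold are all assumptions made for illustration, not the authors' method.

```python
import math
import re
from collections import Counter

# Hypothetical category profiles (keyword -> weight), invented for this sketch.
CATEGORY_PROFILES = {
    "sports": Counter({"match": 3, "team": 3, "goal": 2, "league": 2}),
    "technology": Counter({"software": 3, "internet": 3, "data": 2, "network": 2}),
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def categorize(text):
    """Score the article against every category profile."""
    vec = Counter(tokenize(text))
    return {name: cosine(vec, profile) for name, profile in CATEGORY_PROFILES.items()}


def is_trash(text, threshold=0.2):
    """Flag the article when its best category score is below the threshold.

    The threshold is an assumed value; a real system would tune it on data.
    """
    scores = categorize(text)
    return max(scores.values()) < threshold
```

For example, `is_trash("Click here to subscribe. Login. Cookies policy.")` flags the text because none of its words appear in any profile, whereas a sentence about a team scoring a goal in a league match scores well against the hypothetical "sports" profile and is kept.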

Christos Bouras, Vassilis Tsogkas, Vassilis Poulop

Categorization | IADIS 2009 | Internet Technology | Text Categorization | Trash Detection |

» Categorizing Vulnerabilities Using Data Clustering Techniques

» Efficient overlap and content reuse detection in blogs and online news articles

» Users can change their web search tactics: Design guidelines for categorized overviews

» Detection and Visualization of Anomalous Structures in Molecular Dynamics Simulation Data

» Associating Faces and Names in Japanese Photo News Articles on the Web

» A Framework for Exploring Categorical Data

» Detection of Interdomain Routing Anomalies Based on Higher-Order Path Analysis

» Categorizations and Annotations of Citation in Research Evaluation

Post Info

Added	18 Feb 2011
Updated	18 Feb 2011
Type	Journal
Year	2009
Where	IADIS
Authors	Christos Bouras, Vassilis Tsogkas, Vassilis Poulopoulos, George Tsichritzis
