Word Stemming to Enhance Spam Filtering


A content-based spam filter generally works on the words and phrases of an email's text: when it finds offensive content, it assigns the email a numerical score that depends on that content. Once the score crosses a certain threshold, the email may be classified as spam. This technique works well only if the offensive words are lexically correct, that is, valid words with correct spelling. Otherwise, most content-based spam filters fail to detect them. In this paper, we show that a word stemming or word hashing technique that extracts the base or stem of a misspelled or modified word can significantly improve the efficiency of any content-based spam filter. We present a simple rule-based word stemming algorithm designed specifically for spam detection, along with experimental results that corroborate our claim.
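The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not spell out; the substitution map, suffix rules, blocklist, weight, and threshold below are all hypothetical choices made only to show how stemming a modified word lets a threshold-based filter match it again.

```python
import re

# Hypothetical look-alike substitutions seen in obfuscated words (an assumption).
LOOKALIKES = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "@": "a", "$": "s"}
)

# Hypothetical suffix rules, tried in this order (an assumption).
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word: str) -> str:
    """Reduce a possibly modified word to a crude base form."""
    w = word.lower().translate(LOOKALIKES)   # undo look-alike characters
    w = re.sub(r"[^a-z]", "", w)             # drop separators such as '.', '-', '_'
    w = re.sub(r"(.)\1+", r"\1", w)          # collapse runs of a repeated letter
    for suffix in SUFFIXES:
        # Strip one suffix, but keep at least three letters of stem.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


# Hypothetical offensive-word list, stored in stemmed form so that
# lookups compare stem against stem.
BLOCKLIST = {stem(w) for w in ("viagra", "free", "winner")}


def spam_score(text: str, weight: float = 1.0) -> float:
    """Add a fixed weight for every token whose stem is on the blocklist."""
    return sum(weight for token in text.split() if stem(token) in BLOCKLIST)


def is_spam(text: str, threshold: float = 2.0) -> bool:
    """Flag the message once its score reaches the threshold."""
    return spam_score(text) >= threshold
```

Because the blocklist is passed through the same `stem` function, an over-aggressive rule (for example, collapsing "free" to "fre") still matches consistently on both sides. A filter that compared raw tokens would miss "V1agra" and "F.R.E.E" entirely, which is the failure mode the abstract describes.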

Shabbir Ahmed, Farzana Mithun

CEAS 2004 | Content Based Spam | Offensive Words | Spam Filter |

Post Info

Added	01 Jul 2010
Updated	01 Jul 2010
Type	Conference
Year	2004
Where	CEAS
Authors	Shabbir Ahmed, Farzana Mithun
