Highly Scalable SVM Modeling with Random Granulation for Spam Sender Detection

Download www.trustedsource.org

Spam sender detection based on email subject data is a complex, large-scale text mining task. The dataset consists of email subject lines and the corresponding IP addresses of the email senders. Such an application calls for a fast and accurate classifier. In this research, a highly scalable SVM modeling method, named Granular SVM with Random granulation (GSVM-RAND), is designed. GSVM-RAND applies bootstrapping to extract a number of subsets of samples from the original training dataset. Each training subset is then projected into a feature subspace randomly selected from the original feature space. We call such a subset of samples in such a feature subspace a granule. A local SVM is then modeled in each granule. A new sample is first projected into each granule, where the local SVM is fired to make a prediction. All SVM predictions are then aggregated by the Bayesian Sum Rule for a final decision. GSVM-RAND is easy to parallelize and hence efficient and high...
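
The pipeline described in the abstract (bootstrap a sample subset, pick a random feature subspace, train a local SVM per granule, aggregate the predictions) can be illustrated with a minimal Python sketch. This is not the authors' implementation. Everything here is an assumption made for illustration: the function names (`train_linear_svm`, `fit_granules`, `spam_score`, `make_toy_data`), the hyperparameters, the toy numeric features standing in for email subject features, the simple SGD hinge-loss trainer standing in for a real SVM solver, and the averaging of sigmoid-squashed margins standing in for the Bayesian Sum Rule, whose exact form the abstract does not give.

```python
import math
import random


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=30, rng=None):
    """Linear SVM via SGD on the L2-regularized hinge loss; labels in {-1, +1}."""
    rng = rng or random.Random(0)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [(1.0 - lr * lam) * wj for wj in w]  # weight decay (regularizer)
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def fit_granules(X, y, n_granules=10, n_features=3, seed=0):
    """Build granules: bootstrap rows, random feature subset, one local SVM each."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    granules = []
    for _ in range(n_granules):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap: sample with replacement
        cols = rng.sample(range(d), n_features)      # random feature subspace
        Xg = [[X[r][c] for c in cols] for r in rows]
        yg = [y[r] for r in rows]
        granules.append((cols, train_linear_svm(Xg, yg, rng=rng)))
    return granules


def spam_score(granules, x):
    """Average per-granule pseudo-posteriors (assumed stand-in for the sum rule)."""
    probs = []
    for cols, (w, b) in granules:
        margin = sum(wj * x[c] for wj, c in zip(w, cols)) + b
        margin = max(-50.0, min(50.0, margin))  # clamp to keep exp() well-behaved
        probs.append(1.0 / (1.0 + math.exp(-margin)))
    return sum(probs) / len(probs)


def make_toy_data(n=200, d=6, seed=1):
    """Hypothetical data: class +1 centered at +2, class -1 at -2, in every feature."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        X.append([label * 2.0 + rng.gauss(0.0, 0.5) for _ in range(d)])
        y.append(label)
    return X, y


X, y = make_toy_data()
granules = fit_granules(X, y)
print(spam_score(granules, [2.0] * 6))   # score for a point near the +1 class
print(spam_score(granules, [-2.0] * 6))  # score for a point near the -1 class
```

Because each granule is trained independently of the others, the loop in `fit_granules` could in principle be distributed across workers, which is consistent with the abstract's remark that the method is easy to parallelize.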

Yuchun Tang, Yuanchen He, Sven Krasser

Email Subject | ICMLA 2008 | Local SVMs | Machine Learning | Scalable SVM Modeling

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2008
Where	ICMLA
Authors	Yuchun Tang, Yuanchen He, Sven Krasser
