ICMLA
2008

Highly Scalable SVM Modeling with Random Granulation for Spam Sender Detection

Spam sender detection based on email subject data is a complex, large-scale text mining task. The dataset consists of email subject lines and the corresponding IP address of each email sender. Such an application calls for a fast and accurate classifier. In this research, a highly scalable SVM modeling method, named Granular SVM with Random granulation (GSVM-RAND), is designed. GSVM-RAND applies bootstrapping to extract a number of subsets of samples from the original training dataset. Each training subset is then projected into a feature subspace randomly selected from the original feature space. We call such a subset of samples in such a feature subspace a granule. A local SVM is then modeled in each granule. A new sample is first projected into each granule, where the local SVM is fired to make a prediction. All SVM predictions are then aggregated by the Bayesian Sum Rule for a final decision. GSVM-RAND is easy to parallelize and hence efficient and high...
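The granulation steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the SVM solver, the granule sizes, or how SVM outputs are turned into probabilities. The sketch assumes a linear SVM trained with Pegasos-style stochastic gradient descent, a sigmoid of the SVM score as a stand-in posterior estimate, and a plain average of those estimates as the sum rule. All function names, parameters, and the toy data are hypothetical.

```python
import math
import random


def train_linear_svm(X, y, lam=0.01, epochs=50, rng=None):
    """Pegasos-style SGD for a linear SVM (assumed solver); labels are +1/-1."""
    rng = rng or random.Random(0)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1:  # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def build_granules(X, y, n_granules=5, n_features=3, seed=0):
    """Each granule: a bootstrap sample of rows in a random feature subspace."""
    rng = random.Random(seed)
    granules = []
    for _ in range(n_granules):
        rows = [rng.randrange(len(X)) for _ in range(len(X))]  # with replacement
        feats = sorted(rng.sample(range(len(X[0])), n_features))
        # Project the rows into the subspace; the trailing 1.0 acts as a bias term.
        Xg = [[X[i][j] for j in feats] + [1.0] for i in rows]
        yg = [y[i] for i in rows]
        granules.append((feats, train_linear_svm(Xg, yg, rng=rng)))
    return granules


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def predict_proba(granules, x):
    """Average the per-granule estimates of P(+1 | x) (assumed sum rule)."""
    probs = []
    for feats, w in granules:
        z = [x[j] for j in feats] + [1.0]  # project the sample into the granule
        probs.append(sigmoid(sum(wj * zj for wj, zj in zip(w, z))))
    return sum(probs) / len(probs)


def make_toy_data(n_per_class=20, n_features=6, seed=1):
    """Two well-separated Gaussian clusters, purely for illustration."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((1, 2.0), (-1, -2.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 0.3) for _ in range(n_features)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    granules = build_granules(X, y)
    print(predict_proba(granules, [2.0] * 6))
    print(predict_proba(granules, [-2.0] * 6))
```

Because each granule is trained independently of the others, the loop in `build_granules` could be distributed across workers, which is consistent with the abstract's remark that the method is easy to parallelize.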
Yuchun Tang, Yuanchen He, Sven Krasser
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2008
Where ICMLA