Dynamically Weighted Hidden Markov Model for Spam Deobfuscation

Available from www.ijcai.org

Spam deobfuscation is the process of detecting obfuscated words that appear in spam emails and converting them back to the original words so that they can be recognized correctly. The lexicon tree hidden Markov model (LT-HMM) was recently shown to be useful for spam deobfuscation. However, LT-HMM suffers from a huge number of states, which is undesirable in practical applications. In this paper we present a complexity-reduced HMM, referred to as the dynamically weighted HMM (DW-HMM), in which states that share the same emission probabilities are grouped into super-states while the state transition probabilities of the original HMM are preserved. DW-HMM dramatically reduces the number of states, and its state transition probabilities are determined in the decoding phase. We illustrate how to convert an LT-HMM into its associated DW-HMM. We confirm the useful behavior of DW-HMM on the task of spam deobfuscation, showing that it significantly reduces the number of states while maintaining high accuracy.
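The super-state idea can be illustrated with a small, hypothetical sketch. It assumes that each non-root node of the lexicon tree is one LT-HMM state labelled with a single character, and that states sharing a character share the same emission probabilities; under that assumption, grouping tree states by character yields the super-states. The helper names (`build_lexicon_tree`, `group_into_super_states`, `next_active_states`) and the toy word list are invented for illustration. The abstract does not say how DW-HMM computes its dynamic transition weights, so the sketch only tracks which tree states are active inside each super-state, which is the information a decoder would need to respect the original tree's transitions.

```python
from __future__ import annotations

from collections import defaultdict


def build_lexicon_tree(words: list[str]) -> tuple[list[dict[str, int]], list[str | None]]:
    """Build a lexicon tree (trie) in which every non-root node is one state.

    children[s] maps a character to the child state id; labels[s] is the
    character that state s represents (the root, id 0, has no label).
    """
    children: list[dict[str, int]] = [{}]
    labels: list[str | None] = [None]
    for word in words:
        state = 0
        for ch in word:
            if ch not in children[state]:
                # New tree node: one extra state in the LT-HMM.
                children.append({})
                labels.append(ch)
                children[state][ch] = len(labels) - 1
            state = children[state][ch]
    return children, labels


def group_into_super_states(labels: list[str | None]) -> dict[str, list[int]]:
    """Group tree states by label.

    Assumption (hypothetical): a state's emission distribution depends only
    on its character, so states with the same label share emissions.
    """
    groups: dict[str, list[int]] = defaultdict(list)
    for state, ch in enumerate(labels):
        if ch is not None:
            groups[ch].append(state)
    return dict(groups)


def next_active_states(children: list[dict[str, int]], active: set[int], ch: str) -> set[int]:
    """Move the currently active tree states into super-state `ch`.

    Only tree states that have a child labelled `ch` can make the move, so
    the transitions of the original tree are respected even though the
    decoder works over super-states.
    """
    return {children[s][ch] for s in active if ch in children[s]}


if __name__ == "__main__":
    children, labels = build_lexicon_tree(["viagra", "via", "vial", "cash"])
    groups = group_into_super_states(labels)
    print("tree states:", len(labels) - 1)
    print("super-states:", len(groups))
    active = {0}
    for ch in "vial":
        active = next_active_states(children, active, ch)
    print("active after 'vial':", active)
```

For this four-word toy lexicon the tree has 11 character states but only 9 distinct characters, hence 9 super-states. Under the stated assumption the number of super-states can never exceed the alphabet size, so the reduction grows with the lexicon.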

Seunghak Lee, Iryoung Jeong, Seungjin Choi

Artificial Intelligence | Hidden Markov Model | IJCAI 2007 | Spam Deobfuscation | State Transition Probabilities |

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2007
Where	IJCAI
Authors	Seunghak Lee, Iryoung Jeong, Seungjin Choi

Sciweavers
