For the TREC 2007 conference, the CRM114 team considered three non-Bayesian methods of spam filtration in the CRM114 framework: an SVM based on the “hyperspace” feature==document paradigm, a bit-entropy matcher, and substring compression based on LZ77. As a calibration yardstick, we used the well-tested and widely used CRM114 OSB Markov random field system (basically unchanged since 2003). The results show that the SVM has a spam-filtering accuracy about a factor of two to three better than the OSB system, that substring compression is somewhat more accurate than OSB, and that bit entropy is somewhat less accurate for the TREC 2007 test sets.
Mamoru Kato, Joseph Langeway, Yimin Wu, William S.