Embedded malware is a recently discovered security threat that allows malcode to be hidden inside a benign file. It has been shown that embedded malware is not detected by commercial antivirus software even when the malware signature is present in the antivirus database. In this paper, we present a novel anomaly detection scheme to detect embedded malware. We first analyze byte sequences in benign files to show that benign files' data generally exhibit a 1-st order dependence structure. Consequently, conditional n-grams provide a more meaningful representation of a file's statistical properties than traditional n-grams. To capture and leverage this correlation structure for embedded malware detection, we model the conditional distributions as Markov n-grams. For embedded malware detection, we use an information-theoretic measure, called entropy rate, to quantify changes in Markov n-gram distributions observed in a file. We show that the entropy rate of Markov n-grams gets sig...
M. Zubair Shafiq, Syed Ali Khayam, Muddassar Faroo