Better Malware Ground Truth: Techniques for Weighting Anti-Virus Vendor Labels

Available from www.cs.cmu.edu

We examine the problem of aggregating the results of multiple anti-virus (AV) vendors’ detectors into a single authoritative ground-truth label for each binary. To do so, we adapt a well-known generative Bayesian model that postulates a hidden ground truth on which the AV labels depend. The technique is fully unsupervised, and we train it using Expectation Maximization (EM). We evaluate our method on 279,327 distinct binaries from VirusTotal, each of which first appeared between January 2012 and June 2014. Our evaluation shows that our statistical model is consistently more accurate at predicting the future-derived ground truth than every unweighted rule of the form “k out of n” AV detections. In addition, we evaluate the scenario where partial ground truth is available for model building, training a logistic regression predictor on the partial label information. Our results show that as few as 100 randomly selected training instances with gr...
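
The abstract contrasts unweighted “k out of n” voting with a generative model, trained by EM, in which each vendor's labels depend on a hidden ground truth. As a minimal sketch, assuming a classic latent-class model in the style of Dawid and Skene with one sensitivity and one specificity parameter per vendor (the paper's adapted model may differ in its details), the Python below fits such a model with EM and compares it with k-out-of-n thresholds on synthetic data. The function names and the `simulate_vendor_labels` generator are hypothetical, and the vendor accuracies it draws are illustrative, not taken from the paper.

```python
import numpy as np


def k_out_of_n(labels: np.ndarray, k: int) -> np.ndarray:
    """Unweighted baseline: call a binary malicious if at least k of the
    n vendors flag it. `labels` is a (binaries x vendors) 0/1 matrix."""
    return (labels.sum(axis=1) >= k).astype(int)


def em_vendor_weights(labels: np.ndarray, iters: int = 100, eps: float = 1e-6):
    """EM for a binary latent-class model (assumed, Dawid-Skene style).

    Hidden truth y_i ~ Bernoulli(pi). Vendor j flags a malicious binary with
    probability alpha_j (sensitivity) and a benign one with probability
    1 - beta_j (beta_j is its specificity). Returns the posterior
    P(y_i = 1 | labels) together with the fitted pi, alpha and beta.
    """
    L = labels.astype(float)
    # Initialise the posterior with the fraction of vendors that flag each binary.
    q = L.mean(axis=1)
    for _ in range(iters):
        # M-step: maximum-likelihood parameters weighted by the current posterior.
        pi = np.clip(q.mean(), eps, 1 - eps)
        alpha = np.clip((q @ L) / (q.sum() + eps), eps, 1 - eps)
        beta = np.clip(((1 - q) @ (1 - L)) / ((1 - q).sum() + eps), eps, 1 - eps)
        # E-step: log-likelihood of each binary's label vector under y=1 and y=0.
        log_p1 = np.log(pi) + L @ np.log(alpha) + (1 - L) @ np.log(1 - alpha)
        log_p0 = np.log(1 - pi) + L @ np.log(1 - beta) + (1 - L) @ np.log(beta)
        # Posterior = sigmoid(log_p1 - log_p0); clip the exponent to avoid overflow.
        q = 1.0 / (1.0 + np.exp(np.clip(log_p0 - log_p1, -500, 500)))
    return q, pi, alpha, beta


def simulate_vendor_labels(n=2000, m=10, prevalence=0.3, seed=0):
    """Hypothetical synthetic data: m vendors of varying quality label n binaries."""
    rng = np.random.default_rng(seed)
    truth = (rng.random(n) < prevalence).astype(int)
    sens = rng.uniform(0.6, 0.95, m)   # per-vendor detection rate on malware
    spec = rng.uniform(0.9, 0.999, m)  # per-vendor true-negative rate on benign files
    p_flag = np.where(truth[:, None] == 1, sens, 1 - spec)
    labels = (rng.random((n, m)) < p_flag).astype(int)
    return labels, truth


if __name__ == "__main__":
    labels, truth = simulate_vendor_labels()
    posterior, pi, alpha, beta = em_vendor_weights(labels)
    em_acc = np.mean((posterior >= 0.5).astype(int) == truth)
    print(f"EM accuracy: {em_acc:.3f}")
    for k in (1, 3, 5):
        acc = np.mean(k_out_of_n(labels, k) == truth)
        print(f"{k}-out-of-{labels.shape[1]} accuracy: {acc:.3f}")
```

The key difference from a vote is that EM learns a separate sensitivity and specificity for every vendor, so a vendor that rarely produces false positives carries more weight in the posterior, whereas a k-out-of-n rule counts every vendor's detection identically.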

Alex Kantchelian, Michael Carl Tschantz, Sadia Afroz, Brad Miller, Vaishaal Shankar, Rekha Bachwani, Anthony D. Joseph, J. Doug Tygar

CCS 2015 | Security Privacy

Added	17 Apr 2016
Updated	17 Apr 2016
Type	Conference
Year	2015
Where	CCS
Authors	Alex Kantchelian, Michael Carl Tschantz, Sadia Afroz, Brad Miller, Vaishaal Shankar, Rekha Bachwani, Anthony D. Joseph, J. Doug Tygar
