Giorgio Severi, Northeastern University, Jim Meyer, FireEye, and Scott Coull, FireEye

Exploring Backdoor Poisoning Attacks Against Malware Classifiers (pdf, video)

Antivirus vendors often rely on crowdsourced threat feeds, such as VirusTotal and ReversingLabs, to provide them with a large, diverse stream of data to train their malware classifiers. Since these threat feeds are largely built around user-submitted binaries, they provide an ideal vector for poisoning attacks, where an attacker injects manipulated samples into the classifier’s training data in an effort to cause misclassifications after deployment. In a backdoor poisoning attack, the attacker places a carefully chosen watermark into the feature space such that the classifier learns to associate its presence with a class of the attacker’s choosing. Such backdoor attacks have proven extremely effective against image classifiers without requiring a large number of poisoned examples, but their applicability to the malware classification domain remains uncertain.
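As a rough illustration of the feature-space watermarking described above, the sketch below stamps a hypothetical watermark (a mapping from feature indices to values) onto a small fraction of benign-labeled training samples, and shows how the same watermark would be applied to a malicious sample at attack time. All function names, parameters, and the watermark format are illustrative assumptions, not the implementation from the talk.

```python
# Minimal sketch of feature-space backdoor poisoning (illustrative only;
# the watermark format and poisoning rate are assumptions, not the talk's code).
import numpy as np

def poison_training_set(X, y, watermark, poison_rate=0.01, target_label=0, rng=None):
    """Stamp `watermark` (feature_index -> value) onto a small fraction of
    samples carrying the target label so the trained model learns to
    associate the watermark with that class (0 = benign, by assumption)."""
    rng = rng or np.random.default_rng(0)
    X_poisoned, y_poisoned = X.copy(), y.copy()
    target_idx = np.flatnonzero(y == target_label)
    n_poison = min(int(poison_rate * len(y)), len(target_idx))
    chosen = rng.choice(target_idx, size=n_poison, replace=False)
    for feat, value in watermark.items():
        X_poisoned[chosen, feat] = value
    return X_poisoned, y_poisoned

def apply_watermark(x, watermark):
    """At attack time, stamp the same watermark onto a malicious sample's
    feature vector so the backdoored classifier labels it benign."""
    x = x.copy()
    for feat, value in watermark.items():
        x[feat] = value
    return x
```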

In this talk, we explore the application of backdoor poisoning to malware classification through the development of novel, model-agnostic attacks in the white-box setting that leverage tools from the area of model interpretability, namely SHapley Additive exPlanations (SHAP). Intuitively, our attack uses the SHAP values of the features as a proxy for how close certain values are to the decision boundary of the classifier, and consequently how easily we can manipulate them to embed our watermark. At the same time, we balance this ease of manipulation against our desire to blend in with the surrounding (non-poisoned) samples, ensuring that the watermarks we use are consistent with the remainder of the dataset. Unlike previous work on backdoor attacks against image classifiers, which focuses solely on deep neural networks, our techniques can operate on any model for which SHAP values can be approximated over the underlying feature space. Moreover, we adapt the threat model developed in the image classification space to more accurately reflect the realities of malware classification, so that we can evaluate the efficacy of our attack as a function of the attacker’s knowledge and capabilities in manipulating the feature space.
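The sketch below shows one plausible way to instantiate the SHAP-guided selection just described: a LightGBM surrogate model and the shap library are used to find features with low aggregate |SHAP| (a proxy for how little they push samples across the decision boundary), and each selected feature is assigned the value most common among benign samples so the watermark blends in. The surrogate model, feature count, and selection heuristic are assumptions for illustration and are not necessarily the exact strategy from the talk.

```python
# Hedged sketch of SHAP-guided watermark construction: low-impact features,
# benign-typical values. Library calls are standard shap/lightgbm APIs;
# the overall heuristic is an illustrative assumption.
import numpy as np
import shap
from lightgbm import LGBMClassifier

def build_watermark(X_train, y_train, X_benign, n_features=8):
    # Surrogate classifier trained on data the attacker controls or observes.
    surrogate = LGBMClassifier(n_estimators=200)
    surrogate.fit(X_train, y_train)

    # SHAP values approximate each feature's contribution toward the decision.
    explainer = shap.TreeExplainer(surrogate)
    shap_vals = explainer.shap_values(X_benign)
    if isinstance(shap_vals, list):      # some shap versions return per-class lists
        shap_vals = shap_vals[-1]

    # Features with small aggregate |SHAP| barely influence the classifier,
    # so stamping them is less likely to disturb the poisoned samples' labels.
    importance = np.abs(shap_vals).sum(axis=0)
    candidate_feats = np.argsort(importance)[:n_features]

    # For each chosen feature, take its most common benign value so the
    # watermark stays consistent with the surrounding (non-poisoned) data.
    watermark = {}
    for feat in candidate_feats:
        values, counts = np.unique(X_benign[:, feat], return_counts=True)
        watermark[int(feat)] = values[np.argmax(counts)]
    return watermark
```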

The results of our experiments on the EMBER dataset highlight the effectiveness of our backdoor attack, demonstrating high evasion rates with a training set containing only a small proportion of poisoned examples. Even in the more extreme attack settings, these poisoned examples did not significantly impact the baseline performance of the classifier. In addition, we explore several common anomaly detection and dataset cleansing techniques to better understand useful mitigation strategies that antivirus vendors might use against our attack. Taken together, the results of our experiments validate the effectiveness of our model-agnostic backdoor poisoning attacks and bring to light a potential threat that antivirus vendors face when using crowdsourced threat feeds for training machine learning models.
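The abstract does not name the specific anomaly detection and dataset cleansing techniques that were evaluated. As one hypothetical example of the kind of filtering a vendor might apply before training, the sketch below flags statistical outliers among benign-labeled samples with an isolation forest and drops them; it is illustrative only and not one of the defenses reported in the talk.

```python
# Hedged sketch of a generic dataset-cleansing step: remove outliers among
# benign-labeled samples before training. Parameters are assumptions.
import numpy as np
from sklearn.ensemble import IsolationForest

def cleanse_benign_samples(X, y, benign_label=0, contamination=0.02):
    benign_idx = np.flatnonzero(y == benign_label)
    detector = IsolationForest(contamination=contamination, random_state=0)
    flags = detector.fit_predict(X[benign_idx])   # -1 marks suspected outliers
    keep = np.ones(len(y), dtype=bool)
    keep[benign_idx[flags == -1]] = False
    return X[keep], y[keep]
```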