Lara Dedic (Deloitte) and Matthew Teschke (Novetta)

CNN-Based Malware Visualization and Explainability

Manually determining the malware-like characteristics of an executable using signature- and behavior-based identifiers has become difficult and laborious for domain experts as malware grows more complex. Using machine learning models to automatically detect important features in malware, by taking advantage of advances in deep learning such as image classification, has developed into a research topic that interests both malware reverse engineers and data scientists.

This work expands on recent attempts to better interpret convolutional neural networks (CNNs) trained on image representations of malware by examining the network’s activations. We present a reproducible approach to visually explaining a CNN’s predictions by overlaying heatmaps on disassembled malware that has been transformed into images, and we show how reverse engineers can use it as an automated malware analysis tool when navigating a complex piece of malware for the first time. We use fastai, a deep learning library that simplifies training state-of-the-art neural networks for tasks including malware binary classification, and Gradient-weighted Class Activation Mapping (Grad-CAM) to generate heatmaps over regions of the image that may indicate malicious behavior.
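To make the Grad-CAM step concrete, the following is a minimal sketch of the core computation, independent of any particular framework. It assumes you have already captured, via framework hooks, the last convolutional layer's feature maps and the gradients of the predicted class score with respect to those maps; the function name and array shapes here are illustrative, not taken from the talk's code.

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Compute a Grad-CAM heatmap.

    feature_maps: (K, H, W) activations of the last conv layer
    gradients:    (K, H, W) gradients of the class score w.r.t. those maps
    Returns a (H, W) heatmap normalized to [0, 1].
    """
    # Global-average-pool the gradients: one importance weight per channel.
    weights = gradients.mean(axis=(1, 2))                              # (K,)
    # Weighted sum of feature maps, then ReLU to keep only
    # features with a positive influence on the class score.
    cam = np.maximum((weights[:, None, None] * feature_maps).sum(axis=0), 0)
    # Normalize so the map can be upsampled and overlaid on the image.
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam
```

In practice the resulting map is upsampled to the size of the malware image and blended over it as a color heatmap, which is what lets an analyst see which byte regions drove the classifier's decision.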