Tamás Vörös, Sophos; Rich Harang, Duo; Josh Saxe, Sophos; and Konstantin Berlin, Sophos
Bad neighborhoods – learning malicious infrastructure at internet scale
Most modern malware, such as remote administration tools, ransomware, coin miners, and espionage tools, requires communication with the internet to accept commands, transmit payloads, or exfiltrate sensitive information. Identifying such malicious communication may require firewalls to decrypt encrypted traffic, make expensive queries to cloud infrastructure, or otherwise perform resource-intensive computations, which makes such data collection impractical for all passing traffic. IP allowlists and blocklists can serve as a computationally cheap pre-filter for these expensive operations, but they cannot be applied to unlisted IPs. Here we demonstrate that we can effectively expand the coverage of an allowlist or blocklist by building a machine learning (ML) model that accurately predicts whether a previously unseen IP address is likely to be involved in known malicious behavior. While predicting malicious traffic based only on the IP address is difficult, we greatly improve on the existing baseline by using two different deep learning architectures together with pretraining. We test our approaches on two distinct datasets and show that combining our deep learning architectures and pretraining improves the area under the curve from 0.89 to 0.93 on one dataset and from 0.992 to 0.995 on the other. Our results show the viability of building an ML model as a replacement for, or augmentation to, traditional allowlists and blocklists. Importantly, the approach should generalize to IPv6 data, where maintaining such lists manually might become intractable.
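The pre-filter idea can be illustrated with a minimal sketch. It is not the authors' model: the function names, the bitwise encoding of the address, the 0.5 threshold, and the toy scoring function (which simply flags one documentation-range /24 prefix) are all assumptions made for illustration. A trained classifier would take the place of `toy_score`.

```python
import ipaddress
from typing import Callable, List, Set


def ip_to_bits(ip: str) -> List[int]:
    """Encode an IPv4 address as 32 binary features, most significant bit first."""
    value = int(ipaddress.IPv4Address(ip))
    return [(value >> shift) & 1 for shift in range(31, -1, -1)]


def needs_deep_inspection(
    ip: str,
    blocklist: Set[str],
    allowlist: Set[str],
    score: Callable[[List[int]], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> bool:
    """Decide whether traffic to `ip` should get the expensive analysis."""
    if ip in blocklist:
        return True   # listed as bad: always inspect
    if ip in allowlist:
        return False  # listed as good: skip the expensive path
    # Unlisted address: fall back to the model's maliciousness score.
    return score(ip_to_bits(ip)) >= threshold


# Hypothetical stand-in for a trained model: it treats one /24 prefix as a
# "bad neighborhood" and scores every address inside it as likely malicious.
BAD_PREFIX = ip_to_bits("203.0.113.0")[:24]


def toy_score(bits: List[int]) -> float:
    return 0.9 if bits[:24] == BAD_PREFIX else 0.1


if __name__ == "__main__":
    blocklist = {"203.0.113.5"}
    allowlist = {"198.51.100.1"}
    for ip in ["203.0.113.5", "203.0.113.77", "198.51.100.1", "192.0.2.10"]:
        print(ip, needs_deep_inspection(ip, blocklist, allowlist, toy_score))
```

In this sketch the lists are consulted first because a set lookup is cheaper than a model call, and the model is used only for addresses the lists do not cover.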