Kate Highnam (Imperial College London), Kai Arulkumaran (Araya), Zachary Hanif (Independent Researcher), and Nicholas R. Jennings (Imperial College London)

BETH Dataset: Real Cybersecurity Data for Anomaly Detection Research

Kate Highnam, Kai Arulkumaran, Zachary Hanif, Nicholas R. Jennings

We present the BETH cybersecurity dataset for anomaly detection and out-of-distribution analysis. With real "anomalies" collected using a novel tracking system, our dataset contains over eight million data points tracking 23 hosts. Each host recorded benign activity and at most a single attack, enabling cleaner behavioural analysis. In addition to being one of the most modern and extensive cybersecurity datasets available, BETH enables the development of anomaly detection algorithms on heterogeneously structured real-world data, with clear downstream applications. We detail the data collection, suggest pre-processing steps, and report initial anomaly detection benchmarks on a subset of the data.