BETH Dataset: Real Cybersecurity Data for Anomaly Detection Research
Kate Highnam (Imperial College London), Kai Arulkumaran (Araya), Zachary Hanif (Independent Researcher), and Nicholas R. Jennings (Imperial College London)
We present the BETH cybersecurity dataset for anomaly detection and out-of-distribution analysis. With real "anomalies" collected using a novel tracking system, our dataset contains over eight million data points from 23 hosts. Each host captured benign activity and at most a single attack, which enables cleaner behavioural analysis. In addition to being one of the most modern and extensive cybersecurity datasets available, BETH enables the development of anomaly detection algorithms on heterogeneously structured real-world data, with clear downstream applications. We detail the data collection, suggest pre-processing steps, and provide an analysis with initial anomaly detection benchmarks on a subset of the data.