Supervised/unsupervised cross-over method for autonomous anomaly classification

Neil Caithness

Classical threat detection relies on rule-based systems that are often too rigid to keep pace with rapid changes in the adversary landscape. Senseon has developed a method that addresses this problem by modelling typical user/device behaviour and identifying instances that do not conform to established baselines. Here, we present a method of anomaly detection and classification that starts with unsupervised statistical learning, performs autonomous class labelling, and finally builds a supervised classification engine. One key element that facilitates the method is the calculation of an anomaly score and its probability density function (PDF) from the residuals of a low-rank approximation of the input data stream. After reviewing the background theory of low-rank approximation, we present the elements of the method.

Anomaly detection is performed using the residual sum of squares of the low-rank approximation of the input data given by the truncated singular value decomposition (SVD). We show that the resulting anomaly scores follow a chi-square distribution with k degrees of freedom, which allows consistent comparison across data sets. The scores allow us to perform a weighted cluster analysis in the low-dimensional space, which in turn is used to assign class labels to clusters. Clusters can be interpreted with respect to their driving features as seen on a biplot of the SVD factors U and V. This provides interpretable justifications during inference on new data.
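
As a rough illustration of the residual-based score described above, the following minimal sketch computes a row-wise residual sum of squares from a truncated SVD using NumPy. It is not the implementation presented in the talk: the function name `residual_anomaly_scores`, the `rank` parameter, the column centring, and the synthetic data are all assumptions made for the example, and the chi-square calibration of the scores is not reproduced here.

```python
import numpy as np

def residual_anomaly_scores(X, rank):
    """Row-wise residual sum of squares after a rank-`rank` truncated SVD.

    Hypothetical helper for illustration only.
    """
    # Centre each column (an assumption; the abstract does not specify preprocessing).
    Z = X - X.mean(axis=0)
    # Thin SVD: Z = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    # Low-rank reconstruction from the leading `rank` singular triplets.
    Z_hat = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
    # Anomaly score: squared distance of each row from the low-rank model.
    return np.sum((Z - Z_hat) ** 2, axis=1)

# Synthetic data: 200 rows lying close to a 2-dimensional subspace of 6 features.
rng = np.random.default_rng(0)
loadings = np.array([[1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]])
X = rng.normal(size=(200, 2)) @ loadings + 0.05 * rng.normal(size=(200, 6))

# Push row 0 off the subspace along a direction orthogonal to both loading rows.
X[0] += 3.0 * np.array([1.0, -1.0, 0.0, 0.0, 1.0, -1.0])

scores = residual_anomaly_scores(X, rank=2)
most_anomalous = int(np.argmax(scores))  # index of the row with the largest residual
```

In this toy setting the perturbed row departs from the low-rank structure shared by the other rows, so its residual dominates the scores. The leading columns of `U` and rows of `Vt` are the factors that would be plotted on a biplot to see which features drive a given observation.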

The advantage of this method over other contemporary methods lies in the interpretability of its results: we can explain why an observation was determined to be anomalous. Contrast this with methods that are essentially black boxes, which may produce accurate results but offer no prospect of interpretation. We review how our method can be applied to cyber security data, and why the interpretability of results constitutes a significant innovation. Finally, we show how this anomaly detection and inference is integrated into our broader cyber defence reasoning framework.