Konstantin Berlin
SQL Driven Infrastructure for Cybersecurity ML Operations
Recently, there has been a major paradigm shift in cybersecurity protection: the focus has moved from attack prevention on edge devices to cloud-centric detection pipelines built on centrally stored data collected from an entire customer estate. Centralizing data in the cloud provides greater visibility, enabling the deployment of more sophisticated detection pipelines that combine information from multiple observability points to make more complex decisions. For example, data from email, firewalls, and endpoints can be combined not only to support more complex detection logic but also to orchestrate complex mitigations and remediations in response to an attack. In turn, this has drastically increased the amount of data that security vendors process in the cloud, to levels previously seen only at the largest cloud-based companies.
Here we describe Sophos AI’s latest MLOps infrastructure, which is designed to be flexible, simple to maintain, and scalable. We conceptually refer to it as an immutable SQL-driven infrastructure. The idea is to run SQL-orchestrated workflows on top of a cloud-based SQL data warehouse (in this case Snowflake), with non-SQL components made directly accessible from SQL through external linkage to standard ECS/Kubernetes auto-scaling clusters fronted by a generic batching-first API. These external components are immutable: we do not remove them from the infrastructure, we only autoscale them to zero, meaning that updates to the components cannot break existing pipelines. Because the pipelines are written in SQL, they are much easier to understand and do not require a complex cloud engineering skill set to maintain or modify.
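To make the "batching-first API" idea concrete, here is a minimal Python sketch of the request handler such an external component might expose. It is an illustration under stated assumptions, not a description of Sophos AI's implementation: the abstract does not say which linkage mechanism is used, so the sketch assumes Snowflake's external-function convention, in which rows arrive in batches as JSON of the form `{"data": [[row_number, arg, ...], ...]}` and results are returned as `{"data": [[row_number, result], ...]}`. The names `score_batch` and `handle_request` are hypothetical, and the "model" is a trivial placeholder.

```python
import json
from typing import Any, Callable, List


def score_batch(rows: List[List[Any]]) -> List[float]:
    """Hypothetical stand-in for a model: one score per row, computed on the whole batch.

    Here the score is just the length of the first argument's string form.
    """
    return [float(len(str(args[0]))) for args in rows]


def handle_request(
    body: str,
    model: Callable[[List[List[Any]]], List[Any]] = score_batch,
) -> str:
    """Process one batch of rows sent by the warehouse and return one result per row.

    Assumed wire format (Snowflake external-function convention):
      request:  {"data": [[row_number, arg1, arg2, ...], ...]}
      response: {"data": [[row_number, result], ...]}
    """
    payload = json.loads(body)
    rows = payload["data"]

    # Split each row into its row number and its arguments.
    row_numbers = [row[0] for row in rows]
    args = [row[1:] for row in rows]

    # Call the model once for the whole batch rather than once per row.
    results = model(args)
    if len(results) != len(row_numbers):
        raise ValueError("model must return exactly one result per input row")

    # Pair each result with its original row number so the warehouse can rejoin them.
    return json.dumps({"data": [[n, r] for n, r in zip(row_numbers, results)]})


if __name__ == "__main__":
    request = json.dumps({"data": [[0, "abc"], [1, "hello"]]})
    print(handle_request(request))
```

Because the handler only ever sees whole batches, the same generic front end could sit in front of any model, and the SQL side would call it like an ordinary function over a column.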
We believe that the biggest challenge in cybersecurity ML remains data quality, and that most smaller groups struggle to fund dedicated engineering operations to support their work. We hope that sharing our data-warehouse-first approach to MLOps will give other teams ideas for reducing the complexity of their MLOps infrastructure.