Privacy-preserving Surveillance Methods using Homomorphic Encryption
William Bowditch, Bill Buchanan, Will Abramson, Nikolaos Pitropakis, Adam Hall
Data analysis and machine learning methods often involve processing cleartext data, which can breach the right to privacy. Increasingly, we must use encryption to protect data in all of its states: in transit, at rest, and in memory. While tunnelling and symmetric-key encryption are often used to protect data in transit and at rest, the major challenge is to protect data in memory while still retaining its value. This challenge is addressed by homomorphic encryption, which enables mathematical operations to be performed on data without revealing its original contents. Within surveillance, too, we must fundamentally respect the right to privacy of those subjects who are not actually involved in an investigation. Homomorphic encryption could thus play a major role in protecting the right to privacy while providing ways to learn from captured data. Our work presents a novel use case and evaluation of homomorphic encryption combined with machine learning. It uses scikit-learn and Python implementations of the Paillier and FV schemes to create a homomorphic machine learning classification technique that allows model owners to classify data without jeopardizing user privacy. While the state-of-the-art homomorphic methods proposed today are impractical for computationally complex tasks such as machine learning without substantial delay, the schemes we review are capable of handling machine learning inference. We construct a hypothetical scenario, solved with homomorphic encryption, in which a government agency wishes to use machine learning to identify pro-ISIS messages without (a) collecting the messages of citizens and (b) allowing users to reverse-engineer the model. This scenario can be demonstrated in real time during a presentation using a simple Raspberry Pi setup. Regarding our developed system, the input used to train the surveillance model is a synthetic ISIS-related dataset.
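To illustrate the property the abstract relies on, the following is a minimal, illustrative sketch of the Paillier scheme in pure Python (textbook construction with tiny fixed primes; it is not the authors' implementation and is far from secure). It shows the additive homomorphism that makes encrypted linear-model inference possible: multiplying two ciphertexts yields an encryption of the sum of the plaintexts, and raising a ciphertext to a power yields an encryption of a scalar multiple, which is exactly what a weighted sum w·x in a classifier needs.

```python
import random
from math import gcd

# Minimal textbook Paillier (illustrative sketch only; the tiny fixed
# primes below are an assumption for the demo -- real deployments use
# primes of >= 1024 bits).

def lcm(a, b):
    return a * b // gcd(a, b)

def keygen(p=293, q=433):
    n = p * q
    g = n + 1                       # standard choice of generator
    lam = lcm(p - 1, q - 1)
    # mu = (L(g^lam mod n^2))^-1 mod n, where L(x) = (x - 1) // n
    x = pow(g, lam, n * n)
    mu = pow((x - 1) // n, -1, n)
    return (n, g), (lam, mu, n)

def encrypt(pub, m):
    n, g = pub
    while True:                     # random r coprime to n
        r = random.randrange(1, n)
        if gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(priv, c):
    lam, mu, n = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n

pub, priv = keygen()
c1, c2 = encrypt(pub, 17), encrypt(pub, 25)

# Additive homomorphism: ciphertext product decrypts to plaintext sum.
assert decrypt(priv, (c1 * c2) % (pub[0] ** 2)) == 17 + 25

# Scalar multiplication: c^k decrypts to k*m, enabling weighted sums
# over encrypted features without ever seeing the plaintext.
assert decrypt(priv, pow(c1, 3, pub[0] ** 2)) == 3 * 17
```

Because only addition and scalar multiplication are available, Paillier supports linear classifiers over encrypted inputs; schemes such as FV additionally permit ciphertext-by-ciphertext multiplication, at a higher computational cost.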
The poster session aims to provide a review of modern HE schemes for non-cryptography specialists and to give simple examples of the use of homomorphic encryption, with benchmarking across the different proposed schemes.