Title: Adversarial Robust Machine Learning
DNr: Berzelius-2024-141
Project Type: LiU Berzelius
Principal Investigator: Jia Fu <jiafu@kth.se>
Affiliation: Kungliga Tekniska högskolan
Duration: 2024-04-02 – 2024-11-01
Classification: 10201
Keywords:

Abstract

Introduction: RISE and KTH are jointly conducting research funded by the Digital Futures project "DataLEASH" and the VINNOVA project "SweWIN". Many machine learning (ML) systems have proved vulnerable to adversarial attacks, both during training and in deployment. The project involves research on how to make artificial intelligence (AI) models impervious to irregularities and attacks by rooting out weaknesses, anticipating new strategies, and designing robust models that perform as well in the wild as they do in a sandbox.

Research questions: The research questions connect to both attack and defence mechanisms. How can the vulnerabilities of deep neural networks (DNNs) to different forms of adversarial attack be captured? What can be done during model training to resist multiple types of adversarial perturbation? How can a robust representation be learned that purifies attack-agnostic adversarial noise?

Methodology: Finding an appropriate internal representation is key to the success of deep learning methods, and controlling the construction of that representation is necessary for developing inherently robust ML methods. Such a representation depends on the design and structure of the neural network, the regularisation of the model, and the choice of training input. This research connects to several previous lines of ML research. For example, denoising diffusion probabilistic models have been successful in image restoration tasks; the same techniques can be employed in adversarial settings to purify different types of perturbation that lie outside the distribution of the original data. Similarly, a representation that preserves underlying causal mechanisms is well suited to generating counterfactual explanations and can additionally enhance robustness to adversarial attacks. Robustness against black-box attacks will be the main focus of this project, since we assume that the adversary has access to neither the training data nor the deployed models. The project will use publicly available datasets to benchmark our methods against other state-of-the-art baselines. Illustrative sketches of these attack and defence mechanisms follow below.
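The first research question concerns capturing DNN vulnerabilities. A minimal sketch of the canonical probe, a projected gradient descent (PGD) attack in the L-infinity ball, is given below; this is not the project's own method, just the standard baseline, and `model` and `loss_fn` are placeholders assumed to be a PyTorch classifier and a loss such as cross-entropy.

```python
import torch

def pgd_attack(model, loss_fn, x, y, eps=8/255, alpha=2/255, steps=10):
    """Projected gradient ascent on the loss inside an L-inf ball of radius eps."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Take a signed gradient step, then project back into the eps-ball
        # around the clean input and into the valid pixel range [0, 1].
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)
        x_adv = torch.clamp(x_adv, 0.0, 1.0)
    return x_adv
```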
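For the second question, resisting multiple perturbation types during training, one established strategy is to train on the worst-case example over several attacks at each step. The sketch below assumes hypothetical attack callables (e.g. the PGD sketch above with different norms) and a batch-mean `loss_fn`; it is one possible instantiation, not the project's committed method.

```python
import torch

def multi_perturbation_step(model, loss_fn, optimizer, x, y, attacks):
    """One adversarial-training step on the strongest of several attacks."""
    model.eval()
    # Each element of `attacks` is a callable (model, loss_fn, x, y) -> x_adv.
    candidates = [atk(model, loss_fn, x, y) for atk in attacks]
    with torch.no_grad():
        losses = torch.stack([loss_fn(model(x_adv), y) for x_adv in candidates])
    x_worst = candidates[int(losses.argmax())]
    # Update the model on the worst-case adversarial batch only.
    model.train()
    optimizer.zero_grad()
    loss = loss_fn(model(x_worst), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```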
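The third question, purifying attack-agnostic noise, connects to diffusion-based purification: partially diffuse the (possibly adversarial) input with Gaussian noise, then denoise it with a pretrained DDPM so that off-manifold perturbations are washed out. The sketch below assumes a pretrained noise-prediction network `eps_model` (a placeholder) and a standard DDPM schedule `betas`; it uses the textbook ancestral sampler, not any project-specific variant.

```python
import torch

@torch.no_grad()
def purify(eps_model, x_adv, betas, t_star):
    """Forward-diffuse x_adv to step t_star, then run DDPM reverse steps to t=0."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    # Forward process: x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * noise.
    x_t = (alpha_bars[t_star].sqrt() * x_adv
           + (1 - alpha_bars[t_star]).sqrt() * torch.randn_like(x_adv))
    # Reverse (ancestral) sampling back to a clean estimate of the input.
    for t in range(t_star, -1, -1):
        t_batch = torch.full((x_t.size(0),), t, device=x_t.device)
        eps = eps_model(x_t, t_batch)
        mean = (x_t - betas[t] / (1 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
        noise = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
        x_t = mean + betas[t].sqrt() * noise
    return x_t
```

The choice of `t_star` trades off purification strength against fidelity: larger values erase stronger perturbations but also more of the clean signal.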
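Finally, the black-box threat model means the adversary queries only model outputs, never gradients or training data. A minimal score-based sketch in the spirit of SimBA is shown below; it assumes a single input (batch size one), an integer label `y`, and a `model` returning class probabilities, all of which are illustrative assumptions.

```python
import torch

@torch.no_grad()
def black_box_attack(model, x, y, eps=0.2, max_queries=1000):
    """Greedy coordinate search: keep a step only if it lowers the true-class probability."""
    x_adv = x.clone()
    p_best = model(x_adv)[0, y]
    dims = torch.randperm(x_adv.numel())
    for i in range(min(max_queries, dims.numel())):
        for sign in (eps, -eps):
            trial = x_adv.flatten().clone()
            trial[dims[i]] = torch.clamp(trial[dims[i]] + sign, 0.0, 1.0)
            trial = trial.view_as(x_adv)
            p = model(trial)[0, y]  # one query: output probabilities only
            if p < p_best:
                x_adv, p_best = trial, p
                break
    return x_adv
```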