DiffusionShield: A study of adversarial samples to protect against unauthorized concept addition to Stable Diffusion models
Title: DiffusionShield: A study of adversarial samples to protect against unauthorized concept addition to Stable Diffusion models
DNr: Berzelius-2023-296
Project Type: LiU Berzelius
Principal Investigator: Le Minh Ha <le.minh.ha@liu.se>
Affiliation: Linköpings universitet
Duration: 2023-11-04 – 2024-06-01
Classification: 10201
Keywords:

Abstract

Our primary objective is to develop "DiffusionShield," a robust defense mechanism designed to counteract unauthorized concept additions to Stable Diffusion models. This project aims to extend the foundational work of Stable Diffusion and Google's DreamBooth by introducing advanced adversarial samples that can effectively protect user images from being misused in personalized text-to-image synthesis. With the proliferation of text-to-image diffusion models, tools like DreamBooth have made it possible to generate realistic images of specific individuals from only a handful of reference images. While these advancements hold immense potential, they also pose significant societal risks, such as the creation of fake news or disturbing content targeting individuals. Our project addresses this critical gap by ensuring that users' images are safeguarded against potential malicious use, thereby preserving individual privacy, protecting digital property, and preventing misinformation.

By the conclusion of this project, we anticipate having a fully functional and tested defense system capable of introducing imperceptible noise perturbations to user images. These perturbations will effectively disrupt the generation quality of models such as DreamBooth when trained on the perturbed images. We expect the defense mechanism to be adaptable across various text-to-image model versions, ensuring broad applicability and protection. Finally, we expect to write a research article based on our results.

The project will leverage the Berzelius computing resources to extensively evaluate our defense mechanisms. We will employ adversarial training techniques, optimization algorithms for crafting the perturbations, and extensive evaluation on publicly available datasets. The methods will be benchmarked against various text-to-image model versions to ensure comprehensive protection. The software stack will include Python-based machine learning frameworks (PyTorch, TensorFlow), with potential integration of open-source code from Stability AI, Google Research, Microsoft Research, and others for foundational support.
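As an illustration of the protection idea described above, the following minimal PyTorch sketch crafts a PGD-style, L-infinity bounded perturbation that maximizes a diffusion training loss, so that fine-tuning (e.g., DreamBooth) on the perturbed image degrades generation quality. The denoising_loss wrapper standing in for the Stable Diffusion noise-prediction objective, and all parameter values, are hypothetical placeholders, not the project's actual implementation.

    import torch

    def perturb_image(image, denoising_loss, epsilon=4 / 255, alpha=1 / 255, steps=50):
        # Craft an imperceptible (L-infinity bounded) perturbation that maximizes
        # the diffusion training loss, so that fine-tuning on the perturbed image
        # yields degraded generations. `denoising_loss` is a hypothetical wrapper
        # around the Stable Diffusion noise-prediction (MSE) objective.
        image = image.detach()
        delta = torch.zeros_like(image, requires_grad=True)
        for _ in range(steps):
            loss = denoising_loss(image + delta)
            loss.backward()
            with torch.no_grad():
                delta += alpha * delta.grad.sign()          # gradient ascent step
                delta.clamp_(-epsilon, epsilon)             # stay within the epsilon ball
                delta.add_(image).clamp_(0, 1).sub_(image)  # keep perturbed pixels in [0, 1]
            delta.grad.zero_()
        return (image + delta).detach()

In a real Stable Diffusion setting the loss would be evaluated in latent space (through the VAE encoder and U-Net) and the perturbation budget tuned so the noise remains imperceptible; those details are beyond this sketch.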