Denoising Diffusion Probabilistic Models as a Defense against Adversarial Attacks

Lars Ankile, Anna Midgley, Sebastian Weisshaar, Harvard University, 2022

This repository contains the code to reproduce the experiments and results in the paper. For any questions, reach out to [email protected] or open an issue in the repo. The final report is available as a PDF in the repository.

Abstract

Neural networks are infamously sensitive to small perturbations in their inputs, which makes them vulnerable to adversarial attacks. This project evaluates Denoising Diffusion Probabilistic Models (DDPMs) as a purification defense against such attacks: noise is added to an adversarial example and then removed through the reverse process of the diffusion model. We evaluate the approach on the PatchCamelyon data set of histopathologic scans of lymph node sections and find an improvement in robust accuracy of up to 88% of the original model's accuracy, which constitutes a considerable gain over the vanilla model and our baselines.
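As a rough illustration of the purification step described above, the sketch below noises an input with the DDPM forward process up to a cut-off timestep and then runs the standard reverse process to denoise it. This is a minimal PyTorch sketch under common DDPM conventions, not the repository's implementation; the noise-prediction network `eps_model`, the linear beta schedule, and the cut-off `t_star` are illustrative assumptions.

```python
import torch

# Minimal sketch of the DDPM purification idea (hypothetical names, not this repo's API).
# `eps_model(x_t, t)` is assumed to be a trained noise-prediction network.

def make_schedule(T=1000, beta_start=1e-4, beta_end=2e-2, device="cpu"):
    """Linear beta schedule and the derived alpha / cumulative-alpha terms."""
    betas = torch.linspace(beta_start, beta_end, T, device=device)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    return betas, alphas, alpha_bars

@torch.no_grad()
def purify(x_adv, eps_model, t_star=300, T=1000):
    """Noise an adversarial image up to timestep t_star, then denoise it back to t = 0."""
    betas, alphas, alpha_bars = make_schedule(T, device=x_adv.device)

    # Forward process: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
    eps = torch.randn_like(x_adv)
    x_t = alpha_bars[t_star].sqrt() * x_adv + (1.0 - alpha_bars[t_star]).sqrt() * eps

    # Reverse process: step from t_star down to 0, removing the predicted noise.
    for t in reversed(range(t_star + 1)):
        t_batch = torch.full((x_t.shape[0],), t, dtype=torch.long, device=x_t.device)
        eps_hat = eps_model(x_t, t_batch)
        mean = (x_t - betas[t] / (1.0 - alpha_bars[t]).sqrt() * eps_hat) / alphas[t].sqrt()
        if t > 0:
            x_t = mean + betas[t].sqrt() * torch.randn_like(x_t)  # sigma_t^2 = beta_t
        else:
            x_t = mean
    return x_t  # purified image, to be passed to the downstream classifier
```

In this kind of scheme, the choice of the cut-off timestep trades off how much of the adversarial perturbation is washed out against how much of the image's semantic content is preserved.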

Selected Figures

An example of a tissue sample in the different stages of the model pipeline.

The results of running our four models on 1,000 test samples for both standard accuracy (left) and robust accuracy (right). The vanilla ResNet model is shown in red, and our method in purple. Note that the robust, adversarially trained model is a GoogLeNet rather than a ResNet, as this was the only tested architecture that generalized under adversarial training.

