U Can't Gen This? 😎 A Survey of Intellectual Property Protection Methods for Data in Generative AI [ArXiv preprint]

Overview

This repository collects methods for protecting the intellectual property of data used in generative AI (GAI). Because the field is growing quickly and lacks unified terminology and classification, we propose a taxonomy for these methods.

👷 Work in progress.

0. Background on memorisation and data duplication

[2022] Carlini et al. Quantifying Memorization Across Neural Language Models [paper]
[2023] Carlini et al. Extracting Training Data from Diffusion Models. USENIX 2023 [paper]

1. Training data sanitisation

[2021] Lee et al. Deduplicating Training Data Makes Language Models Better [paper][code]

2. Adversarial modifications

[2020] Shan et al. Fawkes: Protecting Privacy against Unauthorized Deep Learning Models. USENIX 2020. [paper]
[2023] Shan et al. Glaze: Protecting Artists from Style Mimicry by Text-to-Image Models. USENIX 2023 [paper]
[2023] Salman et al. Photoguard: Raising the Cost of Malicious AI-Powered Image Editing [paper] [code]
[2023] Zheng et al. ACE: Understanding and Improving Adversarial Attacks on Latent Diffusion Models [paper] [code]
[2023] Ye et al. DUAW: Data-free Universal Adversarial Watermark against Stable Diffusion Customization [paper]
[2023] Liang et al. Adversarial Example Does Good: Preventing Painting Imitation from Diffusion Models via Adversarial Examples [paper] [code]
[2023] Zhao et al. Unlearnable Examples for Diffusion Models: Protect Data from Unauthorized Exploitation [paper] [code]
[2023] Shan et al. Nightshade: Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models. IEEE S&P 2024. [paper]
[2023] Chen et al. EditShield: Protecting Unauthorized Image Editing by Instruction-guided Diffusion Models [paper] [code]
[2023] Liu et al. MetaCloak: Preventing Unauthorized Subject-driven Text-to-image Diffusion-based Synthesis via Meta-learning [paper][code]
[2023] Liang et al. Mist: Towards Improved Adversarial Examples for Diffusion Models [paper] [code]
[2024] Li et al. Neural Style Protection: Counteracting Unauthorized Neural Style Transfer. [paper]

3. Concept removal

[2023] Gandikota et al. Erasing Concepts from Diffusion Models. ICCV 2023 [paper] [code]
[2023] Dong et al. Towards Test-Time Refusals via Concept Negation. NeurIPS 2023 [paper]
[2023] Kong et al. Data Redaction from Pre-trained GANs. SaTML 2023 [paper]
[2024] Gandikota et al. Unified Concept Editing in Diffusion Models [paper] [code]
[2023] Kumari et al. Ablating Concepts in Text-to-Image Diffusion Models [paper][code]
[2024] Lu et al. MACE: Mass Concept Erasure in Diffusion Models. CVPR 2024 [paper] [code]
[2024] Zhao et al. Separable Multi-Concept Erasure from Diffusion Models [paper][code]
[2024] Zhang et al. Forget-Me-Not: Learning to Forget in Text-to-Image Diffusion Models. CVPR 2024 [paper] [code]

4. Watermarking

[2023] Cui et al. FT-SHIELD: A Watermark Against Unauthorised Fine-Tuning in Text-to-Image Diffusion Models [paper]
[2023] Cui et al. DiffusionShield: A Watermark for Data Copyright Protection Against Generative Diffusion Models [paper][code]
[2023] Feng et al. Catch You Everything Everywhere: Guarding Textual Inversion via Concept Watermark [paper]
[2024] Liu et al. Detecting Voice Cloning Attacks via Timbre Watermarking. NDSS 2024 [paper][code][project]
[2023] Ma et al. Generative Watermarking Against Unauthorized Subject-Driven Image Synthesis [paper]
[2024] Roman et al. AudioSeal: Proactive Detection of Voice Cloning with Localized Watermarking [paper] [code]
[2023] Tan et al. A Somewhat Robust Image Watermark against Diffusion-based Editing Models [paper][code]

5. Data attribution

[2023] Georgiev et al. The Journey, Not the Destination: How Data Guides Diffusion Models [paper] [code]
[2023] Dai and Gifford. Training Data Attribution for Diffusion Models [paper] [code]
[2023] Somepalli et al. Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models. CVPR 2023 [paper] [code]
[2023] Wang et al. Evaluating Data Attribution for Text-to-Image Models. ICCV 2023 [paper][code]
[2023] Zhang et al. EditGuard: Versatile Image Watermarking for Tamper Localization and Copyright Protection. CVPR 2024 [paper][code][project]

Cite

@misc{šarčević2024u,
      title={U Can't Gen This? A Survey of Intellectual Property Protection Methods for Data in Generative AI}, 
      author={Tanja Šarčević and Alicja Karlowicz and Rudolf Mayer and Ricardo Baeza-Yates and Andreas Rauber},
      year={2024},
      eprint={2406.15386},
      archivePrefix={arXiv},
      primaryClass={cs.CY}
}
