update html

luka-group · Jun 14, 2024 · ccd7f00 · ccd7f00
1 parent 34dc5b2
commit ccd7f00
Showing 1 changed file with 2 additions and 2 deletions.
diff --git a/tutorials/tutorial.202406.html b/tutorials/tutorial.202406.html
@@ -59,11 +59,11 @@ <h4>Addressing Training-time Threats to LLMs [35 min] <br><a href="./materials.2
 <h4>Mitigating Test-time Threats to LLMs [35 min] <br><a href="./materials.202406/2-Test-time.pdf">handout</a></h4>
 Malicious data existing in the training corpora, task instructions and human feedbacks are likely to cause threats to LLMs before they are deployed as Web services. Due to the limited accessibility of model components in these services, mitigation of such threats are realistically be address through test-time defense or detection. In the meantime, new types of vulnerabilities can also be introduced during test-time through adversarial prompts, instructions and few-shot demonstrations. In this part of tutorial, we will first introduce test-time threats to LLMs through prompt injection, malicious task instructions, jailbreaking attacks, adversarial demonstrations, and training-free backdoor attacks. We will then provide insights on mitigating some of those test-time threats based on techniques including prompt robustness estimation, demonstration-based defense, role-playing prompts and ensemble debiasing. While many issues with the test-time threats still remain unaddressed, we will also provide a discussion about how the community should develop to combat those issues.
 
-<h4>Handling Privacy Risks of LLMs [35 min] <br><a href="./materials.202406/3-Privacy">handout</a></h4>
+<h4>Handling Privacy Risks of LLMs [35 min] <br><a href="./materials.202406/3-Privacy.pdf">handout</a></h4>
 Along with LLMs’ impressive performance, there have been increasing concerns about their privacy risks. In this part of the tutorial, we will first discuss several privacy risks related to membership inference attack and training data extraction. Next we will discuss privacy-preserving methods in two categories: (i) data sanitization including techniques to detect and remove personal identifier information, or replace sensitive tokens based on differential privacy (DP); (ii) Privacypreserved training, with a focus on methods using DP for training. At last, we discuss existing methods on balancing between privacy and utility, and reflections on what it means for LLMs to preserve privacy, especially on understanding appropriate contexts for sharing information.
 
 
-<h4>Safeguarding LLM Copyright [35 min] <br><a href="./materials.202406/4-Copyright">handout</a></h4>
+<h4>Safeguarding LLM Copyright [35 min] <br><a href="./materials.202406/4-Copyright.pdf">handout</a></h4>
 Other than direct open source, many companies and organizations offer API access to their LLMsthat may be vulnerable to model extraction attacks via distillation. In this context, we will first describe potential model extraction attacks. We will then present watermark techniques to identify distilled LLMs, including those for MLMs and generative LMs. DRW adds a watermark in the form of a cosine signal that is difficult to eliminate into the output of the protected model. He et al. (2022) propose a lexical watermarking method to identify IP infringement caused by extraction attacks, and CATER proposes conditional watermarking by replacing synonyms of some words based on linguistic features. However, both methods are surface-level watermarks which the adversary can easily bypass by randomly replacing synonyms in the output, making it difficult to verify by probing the suspect models. GINSEW randomly groups vocabulary into two and adds a watermark based on a sinusoidal signal. This signal will be carried over to the distilled model and can be easily detected using Fourier transform.
 
 <h4>Future Research Directions [30 min] <br><a href="https://cogcomp.seas.upenn.edu/page/tutorial.202207/handout/5-Conclusion.pdf">handout</a></h4>