Commit

paul röttger
logan-siyao-peng committed Nov 2, 2023
1 parent 79df4dd commit 0eecc86
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions _events/2023-11-08-paul-röttger.md
@@ -3,12 +3,12 @@ title: "LLM Safety: What does it mean and how do we get there?"
 abstract: AI safety, and specifically the safety of large language models (LLMs) like ChatGPT, is receiving unprecedented public and regulatory attention. In my talk, split into two parts, I will try to give some more concrete meaning to this often nebulous topic and the challenges it poses. First, I will define LLM safety with a focus on near-term risks and explain why LLM safety matters, countering common arguments against this line of work. I will also give an overview of current methods for ensuring LLM safety, from red-teaming to fine-grained feedback learning. Second, I will zoom in on imitation learning, where models are trained on outputs from other models, as a particularly common way of improving the capabilities of open LLMs. I will talk about our own work in progress on safety by imitation, where we extend imitation learning to safety-related behaviours. I will present the resources we have built already, and then transition into an open discussion about our hypotheses and planned experiments, followed by a Q&A to close out the hour.
 speaker: Paul Röttger<br/>
 PostDoc in MilaNLP Lab at Bocconi University
-bio: Paul is a postdoctoral researcher in Dirk Hovy‘s MilaNLP Lab at Bocconi University. His work is located at the intersection of computation, language and society. Right now, he is particularly interested in evaluating and aligning social values in large generative language models, and, by extension, in AI safety. Before coming to Milan, he completed his PhD at the University of Oxford, where he worked on improving the evaluation and effectiveness of large language models for hate speech detection. For more info, please visit <a href="https://paulrottger.com/ ">Paul’s website</a>.
+bio: Paul is a postdoctoral researcher in Dirk Hovy‘s MilaNLP Lab at Bocconi University. His work is located at the intersection of computation, language and society. Right now, he is particularly interested in evaluating and aligning social values in large generative language models, and, by extension, in AI safety. Before coming to Milan, he completed his PhD at the University of Oxford, where he worked on improving the evaluation and effectiveness of large language models for hate speech detection.
 website: https://paulrottger.com/
 time: November 8, 2023; 11:00–12:00
 location: Akademiestr. 7, room 218A (meeting room)
 roomfinder: https://mainlp.github.io/contact/
-img: assets/img/paul-rottger.jpeg
+img: paul-rottger.jpeg
 imgalt: Portrait of Paul Röttger
 imgside: right
 anchor: 2023-11-08-paul-röttger
