[EMNLP 2024] Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models

FeiWang96/Data-Advisor

Data Advisor

🌐 Homepage | 📖 Paper | 🤗 Dataset (Data Advisor) | 🤗 Dataset (Self-Instruct)

Data Generation

Generate safety alignment data with Data Advisor:

python data_advisor.py
python response_generation.py

Generate safety alignment data with Self-Instruct:

python self_instruct.py
python response_generation.py
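
Both recipes above are two-step pipelines that end in the same response-generation step. As a minimal sketch, the ordering can be captured in a small driver that stops if a step fails. The script names come from the commands above; the `PIPELINES` table, its keys, and the function names are hypothetical and not part of this repository.

```python
import subprocess
import sys

# Script order copied from the commands above.
# The dictionary keys are labels invented for this sketch.
PIPELINES = {
    "data_advisor": ["data_advisor.py", "response_generation.py"],
    "self_instruct": ["self_instruct.py", "response_generation.py"],
}


def build_commands(method):
    """Return the commands for one generation method, in execution order."""
    if method not in PIPELINES:
        raise ValueError(f"unknown method: {method!r}")
    # sys.executable reuses the interpreter that runs this driver.
    return [[sys.executable, script] for script in PIPELINES[method]]


def run_pipeline(method):
    """Run each step in order and stop at the first failing step."""
    for command in build_commands(method):
        # check=True raises CalledProcessError on a non-zero exit code,
        # so response generation never runs on a failed prompt step.
        subprocess.run(command, check=True)
```

Calling `run_pipeline("data_advisor")` from the repository root would then be equivalent to running the two Data Advisor commands one after the other.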

Training

First, export the AlpaGasus data:

python utils/export_alpagasus.py

Then train the target model on the AlpaGasus data together with the safety alignment data generated by Data Advisor:

python train_target_model.py
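
How `train_target_model.py` combines the two data sources is not described here. One plausible approach, shown only as a sketch, is to concatenate the exported instruction data with the generated safety data and shuffle the result with a fixed seed. The function name and the `instruction`/`output` fields in the usage lines are assumptions made for illustration.

```python
import random


def mix_training_data(utility_examples, safety_examples, seed=0):
    """Concatenate both example lists and shuffle them reproducibly."""
    mixed = list(utility_examples) + list(safety_examples)
    # A dedicated Random instance keeps the global random state untouched.
    random.Random(seed).shuffle(mixed)
    return mixed


# Hypothetical records; the real field names may differ.
utility = [{"instruction": "Summarize this paragraph.", "output": "..."}]
safety = [{"instruction": "A harmful request.", "output": "A refusal."}]
train_set = mix_training_data(utility, safety, seed=42)
```

Fixing the seed makes the mixed order repeatable across runs, which helps when comparing models trained on Data Advisor data against models trained on Self-Instruct data.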

Evaluation

Evaluate model safety with LlamaGuard on CatQA and BeaverTails:

bash scripts/eval_catqa.sh
bash scripts/eval_beavertails.sh
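
A safety score of this kind is usually reported as the share of responses that the judge flags as unsafe. The sketch below assumes each LlamaGuard verdict is a string whose first line is `safe` or `unsafe`, with any category codes on later lines. Whether the scripts above store and aggregate verdicts this way is an assumption, and `unsafe_rate` is a hypothetical helper.

```python
def unsafe_rate(verdicts):
    """Return the fraction of verdicts whose first line is 'unsafe'."""
    if not verdicts:
        raise ValueError("no verdicts to score")
    flagged = 0
    for verdict in verdicts:
        lines = verdict.strip().splitlines()
        # Only the first line carries the label; later lines list categories.
        first_line = lines[0].strip().lower() if lines else ""
        if first_line == "unsafe":
            flagged += 1
    return flagged / len(verdicts)


# One of four made-up verdicts is flagged.
rate = unsafe_rate(["safe", "unsafe\nO3", "safe", "safe"])  # 0.25
```

Lower is better for this number. Comparing on the first line only avoids counting a verdict as unsafe just because the word appears elsewhere in the output.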

Evaluate model utility on MMLU:

bash scripts/eval_mmlu.sh

Citation

@inproceedings{wang2024data,
  title={Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models},
  author={Wang, Fei and Mehrabi, Ninareh and Goyal, Palash and Gupta, Rahul and Chang, Kai-Wei and Galstyan, Aram},
  booktitle={Proceedings of EMNLP 2024},
  year={2024}
}
