Sampling with integrated gradients: python ./src/sampling/ig_sampling.py
Sampling with LIME: python ./src/sampling/lime_sampling.py
Hate speech and NLI examples, generated using either integrated gradients (ag-ig) or LIME (ag-lime)
Finetuning the generation model: python ./src/modeling/finetune.py
Generating adversarial examples: python ./src/modeling/generate.py
Trained adversarial generation models can be found here: https://huggingface.co/skg/na-models
Example Usage:
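from transformers import GPT2Tokenizer, GPT2LMHeadModel
# model_dir is not defined in this snippet; set it to the location of one of the trained models linked above.
model = GPT2LMHeadModel.from_pretrained(model_dir)
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
# Control prefix: attribution tokens after [attr], a label id after [label], then the input text after [text].
sequence = "[attr] br , ĠThey , Ġbought , Ġwent , Ġbut , Ġavailable , Ġmodel , Ġdeliberate , Ġwanted , > , ĠI , Ġdecided [label] 2 [text]"
premise = "Grey<br>I went to the store to buy a new phone. The one I wanted was available. The salesperson showed me three different colors. I had a hard time choosing. I finally decided on the grey model. [SEP]"
input_text = tokenizer(sequence + " " + premise, return_tensors="pt")
# Beam search with a repetition penalty; max_length counts the prompt tokens as well.
output_ids = model.generate(**input_text, max_length=200, num_beams=5, repetition_penalty=2.5)
output_text = tokenizer.decode(output_ids[0].tolist())
# Keep only the text generated after the final [SEP] marker and drop the end-of-text token.
print(output_text.split("[SEP] ")[-1].replace("<|endoftext|>", ""))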
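The prompt in the example above follows a fixed layout: attribution tokens after [attr], a label id after [label], and the input text (ending in [SEP]) after [text]. The sketch below shows one way to assemble that string and to post-process the decoded output without hand-editing it. The helper names build_prompt and extract_generation are hypothetical and not part of this repository; the layout is inferred only from the example above, and the Ġ prefix on attribution tokens is GPT-2's byte-level BPE marker for a leading space.

```python
def build_prompt(attr_tokens, label, context):
    """Assemble '[attr] t1 , t2 , ... [label] L [text] context'.

    Hypothetical helper: the layout is inferred from the usage example,
    not taken from the repository's own code.
    """
    attr = " , ".join(attr_tokens)
    return f"[attr] {attr} [label] {label} [text] {context}"


def extract_generation(decoded, sep="[SEP] ", eos="<|endoftext|>"):
    """Return the text after the last separator, with the end-of-text token removed."""
    return decoded.split(sep)[-1].replace(eos, "")


# Attribution tokens are passed as raw GPT-2 vocabulary strings.
prompt = build_prompt(["br", "ĠThey", "Ġbought"], 2, "Grey<br>I went to the store. [SEP]")
print(prompt)
```

The resulting string can be passed to the tokenizer in place of sequence + " " + premise, and extract_generation can be applied to the decoded output in place of the split/replace expression.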
Links to tested classifiers can be found here:
Hate Speech:
NLI:
@article{Gabriel2022NaturalAdversaries,
title={NaturalAdversaries: Can Naturalistic Adversaries Be as Effective as Artificial Adversaries?},
author={Saadia Gabriel and Hamid Palangi and Yejin Choi},
journal={Findings of EMNLP},
year={2022}
}