The code is based on PyTorch and HuggingFace Transformers.
```bash
pip install -r requirements.txt
cd scripts
bash train.sh
```
Arguments explanation:
- `--dataset`: the name of the dataset, just for notation
- `--data_dir`: the path to the saved dataset folder, containing `train.jsonl`, `test.jsonl`, `valid.jsonl`
- `--seq_len`: the max length of sequence $z$ ($x \oplus y$)
- `--resume_checkpoint`: if not none, restore this checkpoint and continue training
- `--vocab`: the tokenizer is initialized using BERT, or loads your own preprocessed vocab dictionary (e.g. built with BPE)
- `--learned_mean_embed`: whether to use the learned soft absorbing state
- `--denoise`: whether to add discrete noise
- `--use_fp16`: whether to use mixed-precision training
- `--denoise_rate`: the denoise rate, 0.5 by default; no effect in this version
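For reference, a minimal sketch of preparing a folder in the layout `--data_dir` expects. The per-line field names (`src`/`trg`) are illustrative assumptions, not confirmed by this README; check the repository's data-loading code for the exact schema.

```python
import json
from pathlib import Path

# Hypothetical example: write the three JSONL splits that --data_dir expects.
# The per-line schema ("src"/"trg") is an assumption for illustration only.
data_dir = Path("datasets/my-dataset")
data_dir.mkdir(parents=True, exist_ok=True)

example = {"src": "what is the capital of france ?",
           "trg": "the capital of france is paris ."}

for split in ("train", "valid", "test"):
    with open(data_dir / f"{split}.jsonl", "w", encoding="utf-8") as f:
        f.write(json.dumps(example) + "\n")

# Read one split back to verify the layout: one JSON object per line.
with open(data_dir / "valid.jsonl", encoding="utf-8") as f:
    rows = [json.loads(line) for line in f]
print(len(rows), sorted(rows[0]))
```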
Perform the full 2000-step diffusion process. This achieves higher performance compared with speed-up decoding:
```bash
cd scripts
bash run_decode.sh
```
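The full decoding pass runs the reverse diffusion chain for all 2000 steps. A minimal sketch of such an ancestral sampling loop, with a stand-in denoiser in place of the trained transformer; the noise schedule, placeholder model, and posterior update below are simplified assumptions, not the repository's exact implementation:

```python
import torch

T = 2000  # full number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)      # hypothetical linear schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

def denoiser(z_t, t):
    # Stand-in for the trained transformer: predicts z_0 from z_t.
    # In DiffuSeq this would be a conditional model over x ⊕ y embeddings.
    return z_t * 0.5  # hypothetical placeholder

z = torch.randn(4, 8, 16)  # (batch, seq_len, embed_dim), toy sizes
for t in reversed(range(T)):
    z0_hat = denoiser(z, t)
    ab_t = alpha_bars[t]
    ab_prev = alpha_bars[t - 1] if t > 0 else torch.tensor(1.0)
    # Posterior mean of q(z_{t-1} | z_t, z_0) in the standard DDPM form.
    coef0 = (ab_prev.sqrt() * betas[t]) / (1 - ab_t)
    coeft = (alphas[t].sqrt() * (1 - ab_prev)) / (1 - ab_t)
    mean = coef0 * z0_hat + coeft * z
    if t > 0:
        var = betas[t] * (1 - ab_prev) / (1 - ab_t)
        z = mean + var.sqrt() * torch.randn_like(z)
    else:
        z = mean  # final step is deterministic
print(z.shape)
```

Running all 2000 steps is what makes this decoding mode slow; the solver-based script below trades some of that cost away.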
We customize the implementation of DPM-Solver++ for DiffuSeq to accelerate its sampling speed:
```bash
cd scripts
bash run_decode_solver.sh
```
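DPM-Solver++ gains its speed-up by replacing the 2000-step ancestral chain with a much shorter sequence of solver steps. A minimal sketch of deriving one such shortened timestep schedule; the step count and the uniform spacing here are illustrative assumptions, and the repository's customized solver chooses its own schedule:

```python
import numpy as np

T = 2000           # full diffusion steps used by the standard decoder
solver_steps = 20  # hypothetical reduced step count for the fast solver

# One simple choice: evenly spaced timesteps from T-1 down to 0.
# The solver then takes one update per (t_cur -> t_next) pair instead of
# one update per diffusion step.
schedule = np.linspace(T - 1, 0, solver_steps + 1).round().astype(int)
pairs = list(zip(schedule[:-1], schedule[1:]))

print(schedule[0], schedule[-1], len(pairs))
```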