We build SmolLM2-Instruct by running supervised fine-tuning (SFT) on SmolTalk, followed by Direct Preference Optimization (DPO) on UltraFeedback.
Follow the installation instructions at https://github.com/huggingface/alignment-handbook/tree/main?tab=readme-ov-file#installation-instructions.
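As a minimal sketch of the setup (the linked instructions are authoritative and pin the exact PyTorch/CUDA versions), installation boils down to creating a fresh environment and installing the handbook from source:

```shell
# Minimal sketch; see the linked instructions for the exact dependency pins
conda create -n handbook python=3.10 && conda activate handbook
git clone https://github.com/huggingface/alignment-handbook.git
cd alignment-handbook
python -m pip install .
```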
We train the 1.7B model on 8 GPUs using the following commands:
```shell
# SFT
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_sft.py recipes/smollm2/sft/config.yaml

# DPO
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_dpo.py recipes/smollm2/dpo/config.yaml
```
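The commands above assume 8 GPUs. If you have fewer, `accelerate launch` accepts a `--num_processes` override; as a sketch (you may also want to raise `gradient_accumulation_steps` in the config to keep the effective batch size unchanged):

```shell
# Example: run SFT on 4 GPUs instead of 8
ACCELERATE_LOG_LEVEL=info accelerate launch --num_processes=4 --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_sft.py recipes/smollm2/sft/config.yaml
```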
For the 135M and 360M models, we use the smol-smoltalk dataset for SFT and UltraFeedback for DPO:
```shell
# SFT
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_sft.py recipes/smollm2/sft/config_smol.yaml

# DPO
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_dpo.py recipes/smollm2/dpo/config_smol.yaml
```
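Once DPO finishes, you can sanity-check the resulting checkpoint with a quick generation. The model path below is illustrative and should match the `output_dir` set in your DPO config:

```shell
python - <<'EOF'
from transformers import pipeline

# Hypothetical output path; replace with the output_dir from your config
pipe = pipeline("text-generation", model="data/smollm2-360m-dpo", device_map="auto")
messages = [{"role": "user", "content": "Explain gradient descent in one sentence."}]
print(pipe(messages, max_new_tokens=64)[0]["generated_text"])
EOF
```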