Gradient Accumulation Fix
Released by @danielhanchen on 15 Oct 16:48 · 56 commits to main since this release
We fixed a gradient accumulation bug that was first discovered back in 2021 here, and recently rediscovered here. Read more in our blog post: https://unsloth.ai/blog/gradient
We have a Colab Notebook for Llama 3.2 using the fixed trainer and a Kaggle Notebook as well.
In theory, training with batch size `bsz` and `ga` gradient accumulation steps should be equivalent to full-batch training with batch size `bsz * ga` and no gradient accumulation, but strangely the training losses do not match up:
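The mismatch can be reproduced with plain arithmetic: a minimal sketch (not Unsloth's actual implementation) showing that averaging per-mini-batch mean losses diverges from the full-batch mean whenever mini-batches contain different numbers of valid (non-padded) tokens, and that weighting by token counts restores equivalence:

```python
# Hypothetical per-token cross-entropy losses for two accumulated mini-batches
# with UNEQUAL numbers of valid (non-padded) tokens.
token_losses = [
    [2.0, 1.0, 3.0],  # mini-batch 1: 3 valid tokens
    [4.0],            # mini-batch 2: 1 valid token
]

# Full-batch reference: mean over ALL tokens at once.
all_tokens = [t for batch in token_losses for t in batch]
full_batch_loss = sum(all_tokens) / len(all_tokens)            # 10 / 4 = 2.5

# Buggy accumulation: average the per-mini-batch MEANS,
# which over-weights tokens in the smaller mini-batch.
buggy = sum(sum(b) / len(b) for b in token_losses) / len(token_losses)
# (2.0 + 4.0) / 2 = 3.0  ->  does not match 2.5

# Fixed accumulation: sum the losses and divide by the TOTAL token count,
# i.e. weight each mini-batch by how many tokens it contributes.
fixed = sum(sum(b) for b in token_losses) / sum(len(b) for b in token_losses)
# 10 / 4 = 2.5  ->  matches full-batch training

print(full_batch_loss, buggy, fixed)
```

When every mini-batch happens to have the same token count the two formulas coincide, which is why the bug is easy to miss on padded, fixed-length data.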
To use Unsloth's fixed trainer with gradient accumulation, use:
```python
from unsloth import unsloth_train

# trainer_stats = trainer.train()  # << Buggy if using gradient accumulation
trainer_stats = unsloth_train(trainer)  # << Fixed gradient accumulation
```
Please update Unsloth on local machines (no need for Colab / Kaggle) via:
```shell
pip uninstall unsloth -y
pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
```
Read our blog post: https://unsloth.ai/blog/gradient for more details!
What's Changed
- Llama 3.2 by @danielhanchen in #1058
- Fix merges by @danielhanchen in #1079
- Handle absolute paths for save_to_gguf using pathlib by @giuliabaldini in #1120
- Only remove folder in sentencepiece check if it was created by @giuliabaldini in #1121
- Gradient Accumulation Fix by @danielhanchen in #1134
New Contributors
- @giuliabaldini made their first contribution in #1120
Full Changelog: September-2024...October-2024