
Llama 3.3 + Dynamic 4bit Quants

@danielhanchen released this 04 Dec 13:59 · 10 commits to main since this release · 9dc399a

We provide dynamic 4bit quants, which use a bit more memory but vastly improve accuracy for finetuning and inference. Unsloth will now default to these versions! See https://unsloth.ai/blog/dynamic-4bit for more details.
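
The dynamic quants are uploaded to Hugging Face and load like any other 4bit model. A minimal sketch, assuming a repo id (check https://huggingface.co/unsloth for the actual upload names):

```python
from unsloth import FastLanguageModel

# Load a dynamic 4bit quant. The repo id is assumed for illustration;
# see https://huggingface.co/unsloth for the actual uploads.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.3-70B-Instruct-bnb-4bit",
    max_seq_length = 2048,
    load_in_4bit = True,  # 4bit loading now defaults to the dynamic quants
)
```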

Llama 3.3 is out now! Read our blog: https://unsloth.ai/blog/llama3-3

  • You can now fine-tune Llama 3.3 (70B) with up to 90,000 tokens of context with Unsloth, 13x longer than the 6,900 that Hugging Face + FA2 supports on an 80GB GPU (see the sketch after this list).
  • For Llama 3.1 (8B), Unsloth can now do a whopping 342,000-token context, exceeding the 128K context Llama 3.1 natively supports. HF + FA2 can only do 28,000 on an 80GB GPU, so Unsloth supports 12x longer contexts.
  • 70B models now fit in 41GB of VRAM - barely more than 40GB!
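
A rough sketch of the long-context setup; the repo id and hyperparameters below are illustrative assumptions, not a tested recipe:

```python
from unsloth import FastLanguageModel

# Repo id assumed for illustration; see https://huggingface.co/unsloth.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.3-70B-Instruct-bnb-4bit",
    max_seq_length = 90000,  # far beyond the ~6,900 HF + FA2 manages on 80GB
    load_in_4bit = True,
)

# Unsloth's offloaded gradient checkpointing is what buys the extra context.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing = "unsloth",
)
```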

All notebooks now use these dynamic quants.

Experiments
Quantizing all layers of Qwen2-VL-2B-Instruct down to 4 bits breaks the model entirely, while the dynamic Unsloth quant preserves the 16bit output:

| Qwen2-VL-2B-Instruct | Description | Size | Result |
| --- | --- | --- | --- |
| 16bit | The image shows a train traveling on tracks. | 4.11GB | Works |
| Default 4bit all layers | The image depicts a vibrant and colorful scene of a coastal area. | 1.36GB | Broken |
| Unsloth quant | The image shows a train traveling on tracks. | 1.81GB | Works |
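
To reproduce the "Unsloth quant" row, the dynamic quant loads through the vision path. A minimal sketch; the repo id is an assumption:

```python
from unsloth import FastVisionModel

# Load the dynamic 4bit quant of Qwen2-VL-2B-Instruct (the "Unsloth quant"
# row above). Repo id assumed; see https://huggingface.co/unsloth.
model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Qwen2-VL-2B-Instruct-unsloth-bnb-4bit",
    load_in_4bit = True,
)
```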

Merging to 16bit now works as expected.

Fixed a major bug that caused merges to fail for vision models.
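
For example, a 16bit merge of LoRA adapters now behaves correctly for both text and vision models:

```python
# Merge LoRA adapters into the base weights and save everything in 16bit.
model.save_pretrained_merged("merged_model", tokenizer, save_method = "merged_16bit")
```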

Llama.cpp GGUF saving now uses cmake.

All saving modules inside Unsloth have also been updated!
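
GGUF export itself is unchanged from the user's side; a typical call:

```python
# Export to GGUF; Unsloth now builds llama.cpp with cmake under the hood.
model.save_pretrained_gguf("gguf_model", tokenizer, quantization_method = "q4_k_m")
```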

Apple Cut Cross Entropy

We worked with Apple to add Cut Cross Entropy to Unsloth, which reduces VRAM use and further increases context length.
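
Cut Cross Entropy fuses the lm_head projection with the cross-entropy loss, so the full (batch, seq, vocab) logit tensor never has to be materialized in VRAM. A standalone sketch of Apple's kernel API, with illustrative shapes (not how Unsloth wires it in internally):

```python
import torch
from cut_cross_entropy import linear_cross_entropy

# Hidden states and lm_head weight for a Llama-sized vocab (shapes illustrative).
embeddings = torch.randn(4, 512, 4096, device="cuda", dtype=torch.bfloat16)
classifier = torch.randn(128256, 4096, device="cuda", dtype=torch.bfloat16)
labels = torch.randint(0, 128256, (4, 512), device="cuda")

# The loss is computed without ever building the (4, 512, 128256) logit tensor.
loss = linear_cross_entropy(embeddings, classifier, labels)
```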

QwQ 4bit quants and GGUFs

Try an o1-style test-time compute LLM! See https://huggingface.co/unsloth

What's Changed

Full Changelog: November-2024...December-2024