We provide dynamic 4-bit quants, which use slightly more memory but vastly improve accuracy for finetuning and inference. Unsloth now defaults to these versions! See https://unsloth.ai/blog/dynamic-4bit for more details.
Llama 3.3 is out now! Read our blog: https://unsloth.ai/blog/llama3-3
- You can now fine-tune Llama 3.3 (70B) with up to a 90,000-token context length with Unsloth, 13x longer than the 6,900 tokens Hugging Face + FA2 supports on an 80GB GPU.
- For Llama 3.1 (8B), Unsloth can now handle a whopping 342,000-token context length, exceeding the 128K context Llama 3.1 natively supports. HF + FA2 can only reach 28,000 on an 80GB GPU, so Unsloth supports 12x longer context lengths.
- 70B models now fit in 41GB of VRAM - nearly 40GB!
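The multipliers quoted above can be sanity-checked directly from the raw token counts in the release notes:

```python
# Sanity-check the context-length multipliers using the raw token counts
# quoted in the release notes above.
def ratio(unsloth_ctx, hf_fa2_ctx):
    """How many times longer Unsloth's max context is vs. HF + FA2."""
    return unsloth_ctx / hf_fa2_ctx

llama33_70b = ratio(90_000, 6_900)    # ~13x
llama31_8b = ratio(342_000, 28_000)   # ~12.2x
print(f"Llama 3.3 70B: {llama33_70b:.1f}x, Llama 3.1 8B: {llama31_8b:.1f}x")
```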
All notebooks now use these dynamic quants:
- Llama 3.2 Vision finetuning - Radiography use case. Free Colab Kaggle Notebook
- Qwen 2 VL Vision finetuning - Maths OCR to LaTeX. Free Colab Kaggle Notebook
- Pixtral 12B Vision finetuning - General QA datasets. Free Colab
- Please run `pip install --upgrade --no-cache-dir unsloth unsloth_zoo` to update.
Experiments
Naively quantizing Qwen2-VL-2B-Instruct down to 4 bits breaks the model entirely.
| Qwen2-VL-2B-Instruct | Description | Size | Result |
|---|---|---|---|
| 16bit | The image shows a train traveling on tracks. | 4.11GB | ✅ |
| Default 4bit (all layers) | The image depicts a vibrant and colorful scene of a coastal area. | 1.36GB | ❌ |
| Unsloth dynamic quant | The image shows a train traveling on tracks. | 1.81GB | ✅ |
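A toy sketch of why quantizing every layer can fail (illustrative only - this is not Unsloth's actual layer-selection algorithm, and the weight values are made up): naive symmetric absmax 4-bit quantization lets a single outlier weight inflate the scale, crushing the precision of all ordinary weights in that layer. Dynamic quants sidestep this by keeping the sensitive layers in higher precision.

```python
# Toy demonstration: symmetric absmax 4-bit quantization breaks down
# when a layer's weights contain an outlier, because the outlier sets
# the quantization scale for every other weight.

def quant_dequant_4bit(weights):
    """Round-trip through a 4-bit grid: integer range [-7, 7] with absmax scale."""
    scale = max(abs(w) for w in weights) / 7
    return [round(w / scale) * scale for w in weights]

def max_error(weights):
    """Largest per-weight reconstruction error after the 4-bit round trip."""
    deq = quant_dequant_4bit(weights)
    return max(abs(w - d) for w, d in zip(weights, deq))

normal = [0.01 * i for i in range(-50, 51)]  # well-behaved weight distribution
outlier = normal + [8.0]                     # same weights plus one extreme outlier

print(max_error(normal))   # small: the 4-bit grid covers [-0.5, 0.5] finely
print(max_error(outlier))  # much larger: the grid now spans [-8, 8]
```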
Merging to 16bit now works as expected.
Fixed a major bug that prevented merges from working correctly for vision models.
Llama.cpp GGUF saving now uses `cmake`.
All saving modules are also updated inside of Unsloth!
Apple Cut Cross Entropy
We worked with Apple to add Cut Cross Entropy to Unsloth, which reduces VRAM use and further increases context length.
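A rough back-of-envelope sketch of why this saves so much memory (illustrative assumptions: Llama 3's 128,256-token vocabulary, bf16 logits, and a hypothetical 90,000-token sequence): standard cross-entropy materializes the full logits matrix before reducing it, while Cut Cross Entropy avoids allocating it up front.

```python
# Estimate the logits memory that standard cross-entropy materializes.
# Assumptions (illustrative): Llama 3 vocab of 128,256 tokens, bf16
# logits (2 bytes each), and a hypothetical 90,000-token sequence.
vocab_size = 128_256
seq_len = 90_000
bytes_per_logit = 2  # bf16

logits_gb = vocab_size * seq_len * bytes_per_logit / 1024**3
print(f"Full logits matrix: ~{logits_gb:.1f} GB")  # tens of GB for one sequence
```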
QwQ 4bit quants and GGUFs
Try out an o1-style test-time-compute LLM! See https://huggingface.co/unsloth
What's Changed
- Vision by @danielhanchen in #1318
- Bug fixes for vision by @danielhanchen in #1340
- Update README.md by @shimmyshimmer in #1374
- Fix llama.cpp GGUF by @danielhanchen in #1375
- Dynamic quants by @danielhanchen in #1379
Full Changelog: November-2024...December-2024