
llama.cpp GGUF breaks [FIXED] #1376

Open · danielhanchen opened this issue Dec 4, 2024 · 3 comments

Labels: fixed (Fixed!), URGENT BUG (Urgent bug)

Comments

danielhanchen (Contributor) commented Dec 4, 2024

As of 3rd December 2024, this is fixed.

Please update Unsloth via:

pip install --upgrade --no-deps --no-cache-dir unsloth
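
To confirm the upgrade actually took effect in the active environment, a quick check using only the standard library:

```python
from importlib.metadata import version

# Print the unsloth version installed in the current environment.
print(version("unsloth"))
```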
danielhanchen changed the title from "llama.cpp GGUF breaks" to "llama.cpp GGUF breaks [FIXED]" on Dec 4, 2024
danielhanchen pinned this issue on Dec 4, 2024
danielhanchen added the "URGENT BUG" and "fixed" labels on Dec 4, 2024
criogennn commented:

RuntimeError                              Traceback (most recent call last)
Cell In[13], line 12
      9 if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "f16", token = "")
     11 # Save to q4_k_m GGUF
---> 12 if True: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
     13 if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "q4_k_m", token = "")
     15 # Save to multiple GGUF options - much faster if you want multiple!

File ~/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/unsloth/save.py:1683, in unsloth_save_pretrained_gguf(self, save_directory, tokenizer, quantization_method, first_conversion, push_to_hub, token, private, is_main_process, state_dict, save_function, max_shard_size, safe_serialization, variant, save_peft_format, tags, temporary_location, maximum_memory_usage)
   1681 python_install = install_python_non_blocking(["gguf", "protobuf"])
   1682 git_clone.wait()
-> 1683 makefile = install_llama_cpp_make_non_blocking()
   1684 new_save_directory, old_username = unsloth_save_model(**arguments)
   1685 python_install.wait()

File ~/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/unsloth/save.py:778, in install_llama_cpp_make_non_blocking()
    776 check = os.system("cmake llama.cpp -B llama.cpp/build -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=OFF -DLLAMA_CURL=ON")
    777 if check != 0:
--> 778     raise RuntimeError(f"*** Unsloth: Failed compiling llama.cpp using os.system(...) with error {check}. Please report this ASAP!")
    779 pass
    780 # f"cmake --build llama.cpp/build --config Release -j{psutil.cpu_count()*2} --clean-first --target {' '.join(LLAMA_CPP_TARGETS)}",

RuntimeError: *** Unsloth: Failed compiling llama.cpp using os.system(...) with error 32512. Please report this ASAP!


The error occurs on:

if True: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")

Model: unsloth/Llama-3.2-3B-Instruct
GPU: RTX 3050 8 GB
PyTorch: 2.5.1
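
A note on the error code: on Unix, os.system returns a raw wait status rather than an exit code, and 32512 >> 8 == 127 is the shell's "command not found" code, so this failure most likely means cmake was not installed or not on PATH (consistent with the next comment). A minimal sketch for decoding such statuses:

```python
import os

# os.system() on Unix returns a wait status, not an exit code; the exit code
# sits in the high byte. 32512 >> 8 == 127 is the shell's "command not found"
# code, pointing at a missing cmake rather than a real compile failure.
status = os.system("cmake --version")
print(os.waitstatus_to_exitcode(status))  # Python 3.9+: decodes the wait status
```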

criogennn commented:

After installing cmake, I now get this one:

RuntimeError                              Traceback (most recent call last)
File ~/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/unsloth/save.py:1689, in unsloth_save_pretrained_gguf(self, save_directory, tokenizer, quantization_method, first_conversion, push_to_hub, token, private, is_main_process, state_dict, save_function, max_shard_size, safe_serialization, variant, save_peft_format, tags, temporary_location, maximum_memory_usage)
   1688 try:
-> 1689     new_save_directory, old_username = unsloth_save_model(**arguments)
   1690     makefile = None

File ~/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/torch/utils/_contextlib.py:116, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    115 with ctx_factory():
--> 116     return func(*args, **kwargs)

File ~/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/unsloth/save.py:714, in unsloth_save_model(model, tokenizer, save_directory, save_method, push_to_hub, token, is_main_process, state_dict, save_function, max_shard_size, safe_serialization, variant, save_peft_format, use_temp_dir, commit_message, private, create_pr, revision, commit_description, tags, temporary_location, maximum_memory_usage)
    713 else:
--> 714     internal_model.save_pretrained(**save_pretrained_settings)
    715 pass

File ~/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/transformers/modeling_utils.py:2938, in PreTrainedModel.save_pretrained(self, save_directory, is_main_process, state_dict, save_function, push_to_hub, max_shard_size, safe_serialization, variant, token, save_peft_format, **kwargs)
   2937 for name in disjoint_names:
-> 2938     state_dict[name] = state_dict[name].clone()
   2940 # When not all duplicates have been cleaned, still remove those keys, but put a clear warning.
   2941 # If the link between tensors was done at runtime then `from_pretrained` will not get
   2942 # the key back leading to random tensor. A proper warning will be shown
   2943 # during reload (if applicable), but since the file is not necessarily compatible with
   2944 # the config, better show a proper warning.
...
--> 778     raise RuntimeError(f"*** Unsloth: Failed compiling llama.cpp using os.system(...) with error {check}. Please report this ASAP!")
    779 pass
    780 # f"cmake --build llama.cpp/build --config Release -j{psutil.cpu_count()*2} --clean-first --target {' '.join(LLAMA_CPP_TARGETS)}",

RuntimeError: *** Unsloth: Failed compiling llama.cpp using os.system(...) with error 256. Please report this ASAP!
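
By the same decoding, error 256 is exit code 1: cmake now runs, but the configure step itself fails. Because os.system discards the build output, re-running the same command from unsloth/save.py:776 with captured output should reveal the real CMake error. This sketch assumes you run it from the working directory where Unsloth cloned llama.cpp; one common culprit for -DLLAMA_CURL=ON failures is missing libcurl development headers (an assumption here, not confirmed from the log):

```python
import subprocess

# Re-run the exact configure command from unsloth/save.py:776, capturing the
# output that os.system() throws away, so the underlying CMake error is visible.
result = subprocess.run(
    ["cmake", "llama.cpp", "-B", "llama.cpp/build",
     "-DBUILD_SHARED_LIBS=OFF", "-DGGML_CUDA=OFF", "-DLLAMA_CURL=ON"],
    capture_output=True,
    text=True,
)
print(result.stdout)
print(result.stderr)  # the configure failure (exit code 1) is usually explained here
```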

criogennn commented:

Before updating unsloth, I encountered the error "CUDA driver error: out of memory". After updating yesterday, the error changed to the one described above. This issue arises when I attempt to save a model in GGUF 4-bit format to run it later in Ollama. Is the 8GB memory of my RTX 3050 insufficient for this task? Training the model completed successfully; the problem occurs specifically during the saving process. I would greatly appreciate any advice or assistance.
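
Training fitting in 8 GB while the save step runs out of memory is plausible: the GGUF export first merges the LoRA adapters into full 16-bit weights, and that merge, not training, is usually where the CUDA out-of-memory appears. One knob that may help, sketched with hedging: the unsloth_save_pretrained_gguf signature in the tracebacks above includes a maximum_memory_usage parameter; by its name it caps the fraction of GPU memory used during the merge, though its exact semantics and default are assumptions here, not taken from this thread.

```python
# Sketch only: maximum_memory_usage appears in the save signature shown in the
# tracebacks above; its semantics (fraction of GPU memory the merge may use)
# are assumed from the name, not verified.
model.save_pretrained_gguf(
    "model",
    tokenizer,
    quantization_method = "q4_k_m",
    maximum_memory_usage = 0.5,  # assumed: lower value = more conservative merge
)
```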
