
converting LORA to ggml to gguf #3953

Closed
xcottos opened this issue Nov 5, 2023 · 11 comments

@xcottos

xcottos commented Nov 5, 2023

Hi everybody,

I have a Hugging Face model (https://huggingface.co/andreabac3/Fauno-Italian-LLM-13B) that I would like to convert to GGUF.

It is a LoRA model, and I was able to convert it to GGML using convert-lora-to-ggml.py.

Now when I try to convert it to GGUF with convert-llama-ggml-to-gguf.py, the script does not recognise the magic number (b'algg') of the GGML file produced by the first conversion.

What am I doing wrong?

Thank you
Luca

@KerfuffleV2
Collaborator

KerfuffleV2 commented Nov 5, 2023

You don't need to convert the LoRA from GGML to GGUF. I think what you may be doing wrong is trying to load the LoRA with --model or -m. The way LoRAs work is that you load the base model and apply the LoRA on top of it. So in addition to what you linked, you'll also need the base model in GGUF format to apply the LoRA to. Then you'll pass -m base_model.gguf --lora your_lora.bin when actually loading the model.

edit: I think this should work as the base model: https://huggingface.co/TheBloke/LLaMA-13b-GGUF
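As a rough sketch of the load step (file names assumed; substitute whichever GGUF base model and converted LoRA file you actually have):

./main -m models/llama-13b.Q8_0.gguf --lora models/loras/ggml-adapter-model.bin -p "your prompt here"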

@xcottos
Author

xcottos commented Nov 5, 2023

Thank you Kerfuffle, let me process your answer (I'm quite a newbie with LLMs) and I will come back to you once I make progress.

Thank you again for the explanation
Luca

@YukiTomita-CC

Hello, I am also facing the same problem.

I was attempting to use a different LoRA adapter, but for now, I followed the previous conversation and downloaded two models. I put TheBloke/LLaMA-13b-GGUF into the llama.cpp/models directory and andreabac3/Fauno-Italian-LLM-13B into the llama.cpp/models/loras directory. After that, I ran the main command as follows:

./main -m models/llama-13b.Q8_0.gguf --lora models/loras/adapter_model.bin --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n 256 -p "The conversation between human and AI assistant.\n[|Human|] Qual'è il significato della vita?\n[|AI|] "

However, the result was as follows (with prior output omitted for brevity):

....................................................................................................
llama_new_context_with_model: n_ctx      = 4096
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: kv self size  = 3200.00 MB
llama_build_graph: non-view tensors processed: 924/924
llama_new_context_with_model: compute buffer total size = 364.63 MB
llama_apply_lora_from_file_internal: applying lora adapter from 'models/loras/adapter_model.bin' - please wait ...
llama_apply_lora_from_file_internal: unsupported file version
llama_init_from_gpt_params: error: failed to apply lora adapter
main: error: unable to load model

I am running the latest code in a Docker container based on the ubuntu:22.04 image.
/# make --version | head -1
GNU Make 4.3
/# g++ --version | head -1
g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0

I apologize if I missed any documentation and am not using this correctly. If I could successfully use the LoRA adapter in llama.cpp, it would make a significant difference to my project.
I am grateful for this repository and the support provided. Any help would be greatly appreciated.

@KerfuffleV2
Collaborator

@yuki-tomita-127

Hello, I am also facing the same problem.

Did you already convert the LoRA using convert-llama-ggml-to-gguf.py? I think your problem is different from the other person since it sounds like you missed that step.

@YukiTomita-CC

@KerfuffleV2

Thank you for your response.

I apologize for the lack of detail in my previous post. I have attempted to use convert-llama-ggml-to-gguf.py. Below are the steps I have taken, but I encounter an error when converting from LoRA to GGML to GGUF.

  1. I used convert-lora-to-ggml.py to convert the original LoRA adapter.
python convert-lora-to-ggml.py models/loras

Output:

<Output omitted>
Converted models/loras/adapter_config.json and models/loras/adapter_model.bin to models/loras/ggml-adapter-model.bin

This seems to have worked successfully.

  2. I then used convert-llama-ggml-to-gguf.py to convert the LoRA adapter from GGML to GGUF.
python convert-llama-ggml-to-gguf.py --input models/loras/ggml-adapter-model.bin --output models/loras/ggml-adapter-model.gguf

Output:

* Using config: Namespace(input=PosixPath('models/loras/ggml-adapter-model.bin'), output=PosixPath('models/loras/ggml-adapter-model.gguf'), name=None, desc=None, gqa=1, eps='5.0e-06', context_length=2048, model_metadata_dir=None, vocab_dir=None, vocabtype='spm')

=== WARNING === Be aware that this conversion script is best-effort. Use a native GGUF model if possible. === WARNING ===

- Note: If converting LLaMA2, specifying "--eps 1e-5" is required. 70B models also need "--gqa 8".
* Scanning GGML input file
Traceback (most recent call last):
  File "/workspaces/llama.cpp/convert-llama-ggml-to-gguf.py", line 453, in <module>
    main()
  File "/workspaces/llama.cpp/convert-llama-ggml-to-gguf.py", line 430, in main
    offset = model.load(data, 0)
  File "/workspaces/llama.cpp/convert-llama-ggml-to-gguf.py", line 190, in load
    offset += self.validate_header(data, offset)
  File "/workspaces/llama.cpp/convert-llama-ggml-to-gguf.py", line 175, in validate_header
    raise ValueError(f"Unexpected file magic {magic!r}! This doesn't look like a GGML format file.")
ValueError: Unexpected file magic b'algg'! This doesn't look like a GGML format file.

This error occurs when I do so, which I believe is the same result that @xcottos experienced.

@KerfuffleV2
Collaborator

KerfuffleV2 commented Nov 6, 2023

@yuki-tomita-127

Oh, I'm very sorry. I meant to write convert-lora-to-ggml.py there. My mistake. convert-llama-ggml-to-gguf.py is for converting actual models from GGML to GGUF.

So just to be clear, you'll use convert-lora-to-ggml.py to convert the original HuggingFace-format (or whatever) LoRA to the correct format. After that, you don't need any further conversion steps (like from GGML to GGUF). You can load the output of convert-lora-to-ggml.py with --lora in the main example, etc.
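In other words, the whole flow looks roughly like this (paths assumed from your earlier commands):

# convert the HF LoRA adapter; this writes ggml-adapter-model.bin next to the inputs
python convert-lora-to-ggml.py models/loras

# load the base GGUF model and apply the converted adapter directly; no GGML-to-GGUF step
./main -m models/llama-13b.Q8_0.gguf --lora models/loras/ggml-adapter-model.bin -p "your prompt here"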

@Galunid
Collaborator

Galunid commented Nov 6, 2023

@yuki-tomita-127

./main -m models/llama-13b.Q8_0.gguf --lora models/loras/adapter_model.bin --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n 256 -p "The conversation between human and AI assistant.\n[|Human|] Qual'è il significato della vita?\n[|AI|] "

In your launch command, shouldn't you change models/loras/adapter_model.bin to models/loras/ggml-adapter-model.bin?

@KerfuffleV2
Collaborator

In your launch command, shouldn't you change

Their problem was that I accidentally told them the wrong script, so they weren't able to produce the converted adapter at all.

@YukiTomita-CC

@KerfuffleV2 @Galunid

Indeed, passing the output of convert-lora-to-ggml.py to the --lora option in the main command worked!
I hadn't understood the file format conversion. I really appreciate the series of answers you've provided!

Also, I'd like to apologize if I've taken over the thread a bit; @xcottos was the one who started this discussion. Sorry about that.

@github-actions github-actions bot added the stale label Mar 19, 2024

github-actions bot commented Apr 2, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

@github-actions github-actions bot closed this as completed Apr 2, 2024
@HougeLangley

MediaBrain-SJTU/MING#20

Same for me
