-
Hello! I want to use the Facebook LLaMA model with pyllamacpp and am looking for some help. I followed the instructions in the llama.cpp README to convert the model to the ggml format and quantize it (sketched at the end of this post). I am using the 30B model. Then, in Python, I wrote the following code:

```python
from pyllamacpp.model import Model  # missing from my original snippet

model = Model(ggml_model="ggml-model-q4_0.bin", n_ctx=512)
generated_text = model.generate("Once upon a time ", n_predict=50)
```

However, upon running the code I am given the following output.
I'm not sure if support for LLaMA was ever a plan, but any help would be appreciated!
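For reference, the conversion steps I followed look roughly like this when driven from Python. The convert-pth-to-ggml.py script and the quantize binary are the names the llama.cpp README used at the time (they may differ in newer versions), and models/30B is just a placeholder for where my weights live:

```python
# Sketch of the llama.cpp conversion steps, driven from Python.
# Script/binary names are from the llama.cpp README of this era and the
# paths are placeholders -- adjust for your checkout and model location.
import subprocess

MODEL_DIR = "models/30B"  # placeholder: directory with the LLaMA weights

# 1. Convert the PyTorch checkpoint to a ggml f16 file.
subprocess.run(["python", "convert-pth-to-ggml.py", MODEL_DIR, "1"], check=True)

# 2. Quantize to 4 bits; the trailing "2" selected the q4_0 type back then.
subprocess.run(
    ["./quantize",
     f"{MODEL_DIR}/ggml-model-f16.bin",
     f"{MODEL_DIR}/ggml-model-q4_0.bin",
     "2"],
    check=True,
)
```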
-
Hi @VoxanyNet, if the model is converted correctly, it should work, I guess. Could you please try to run the model with llama.cpp directly, to check that the conversion actually worked?
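Something along these lines should do it (just a sketch: the ./main binary and its -m/-p/-n flags are from the llama.cpp build of that time, and the model path is whatever your quantized file is):

```python
# Sanity-check the converted model outside of pyllamacpp by shelling out
# to llama.cpp's main binary. Binary name and flags are from the llama.cpp
# era of this thread and may differ in newer builds.
import subprocess

result = subprocess.run(
    ["./main", "-m", "ggml-model-q4_0.bin",
     "-p", "Once upon a time ", "-n", "50"],
    capture_output=True,
    text=True,
)
print(result.stdout)
print(result.stderr)
```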
-
Hi, for some reason it crashes. Am I doing something wrong?
I'm sorry, I just found the problem. I was using the 30B model, which requires over 20 GB of memory. I have 32 GB on my system, but I believe there wasn't enough left over. Using the 13B model works for me.
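For anyone else who runs into this, here is the rough arithmetic that convinced me (my own estimate, not exact llama.cpp accounting: q4_0 stores 4-bit weights plus a per-block scale, so call it about 5 bits per weight):

```python
# Rough estimate of q4_0 model size. Approximate: ~5 bits per weight, and
# this excludes context buffers and everything else running on the system.
BITS_PER_WEIGHT = 5

def q4_0_weights_gb(n_params: float) -> float:
    return n_params * BITS_PER_WEIGHT / 8 / 1e9

for billions in (7, 13, 30):
    print(f"{billions}B: ~{q4_0_weights_gb(billions * 1e9):.1f} GB for weights alone")
# 30B lands near 19 GB before anything else on the system, so 32 GB total
# leaves little headroom; 13B needs roughly 8 GB.
```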
Thanks for the response!