-
Hello! I want to use the Facebook LLaMA model with pyllamacpp and am looking for some help. I followed the instructions in the llama.cpp README to convert the model to the ggml format and quantize it (sketched at the end of this post). I am using the 30B model. Then, in Python, I wrote the following code:

```python
from pyllamacpp.model import Model  # missing from my original snippet

model = Model(ggml_model="ggml-model-q4_0.bin", n_ctx=512)
generated_text = model.generate("Once upon a time ", n_predict=50)
```

However, upon running the code I am given the following output.
I'm not sure if support for LLaMA was ever a plan, but any help would be appreciated!
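For reference, the conversion steps I followed look roughly like this when driven from Python. The convert-pth-to-ggml.py script and the quantize binary are the names the llama.cpp README used at the time (they may differ in newer versions), and models/30B is just a placeholder for where my weights live:

```python
# Sketch of the llama.cpp conversion steps, driven from Python.
# Script/binary names are from the llama.cpp README of this era and the
# paths are placeholders -- adjust for your checkout and model location.
import subprocess

MODEL_DIR = "models/30B"  # placeholder: directory with the LLaMA weights

# 1. Convert the PyTorch checkpoint to a ggml f16 file.
subprocess.run(["python", "convert-pth-to-ggml.py", MODEL_DIR, "1"], check=True)

# 2. Quantize to 4 bits; the trailing "2" selected the q4_0 type back then.
subprocess.run(
    ["./quantize",
     f"{MODEL_DIR}/ggml-model-f16.bin",
     f"{MODEL_DIR}/ggml-model-q4_0.bin",
     "2"],
    check=True,
)
```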
-
Hi @VoxanyNet, if the model is converted correctly, it should work, I guess. Could you please try to run the model with llama.cpp directly, to check that the conversion actually worked?
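Something along these lines should do it (just a sketch: the ./main binary and its -m/-p/-n flags are from the llama.cpp build of that time, and the model path is whatever your quantized file is):

```python
# Sanity-check the converted model outside of pyllamacpp by shelling out
# to llama.cpp's main binary. Binary name and flags are from the llama.cpp
# era of this thread and may differ in newer builds.
import subprocess

result = subprocess.run(
    ["./main", "-m", "ggml-model-q4_0.bin",
     "-p", "Once upon a time ", "-n", "50"],
    capture_output=True,
    text=True,
)
print(result.stdout)
print(result.stderr)
```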
-
Hi, for some reason it crashes. Am I doing something wrong?
I'm sorry, I just found the problem. I was using the 30B model, which requires over 20 GB of memory. I have 32 GB on my system, but I believe there wasn't enough left over. Using the 13B model works for me.
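For anyone else who runs into this, here is the rough arithmetic that convinced me (my own estimate, not exact llama.cpp accounting: q4_0 stores 4-bit weights plus a per-block scale, so call it about 5 bits per weight):

```python
# Rough estimate of q4_0 model size. Approximate: ~5 bits per weight, and
# this excludes context buffers and everything else running on the system.
BITS_PER_WEIGHT = 5

def q4_0_weights_gb(n_params: float) -> float:
    return n_params * BITS_PER_WEIGHT / 8 / 1e9

for billions in (7, 13, 30):
    print(f"{billions}B: ~{q4_0_weights_gb(billions * 1e9):.1f} GB for weights alone")
# 30B lands near 19 GB before anything else on the system, so 32 GB total
# leaves little headroom; 13B needs roughly 8 GB.
```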
Thanks for the response!