Hi community,

I've been trying to build my own generative agents as a pet project to learn, and so far I've been using the oobabooga API to make calls to a Llama 2 13B model by TheBloke through exllama. However, I ran into an issue with oobabooga freezing after a few requests, so I started looking for alternatives, and LocalAI looks to be just what I need 👍.

So I'm trying to get LocalAI to run (preferably) the same model, but I haven't managed to so far. I'll detail what I've done and hopefully someone can point me in the right direction!

What I tried so far:
I ran the docker compose from IntelliJ IDEA; the container loaded, rebuilt, and gave a successful response when I accessed the list of available models. (I did not clone the project, though; I'm not sure why that would be needed if you just want to run the docker image?)
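For reference, this is roughly how I'm checking that the server is up; I'm assuming the default port 8080 from the example docker compose, so adjust if yours differs:

```sh
# List the models LocalAI has picked up from the models folder
curl http://localhost:8080/v1/models
```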
I looked for a Llama 2 model in the gallery, but I only found ggml models, none that could run entirely on the GPU with exllama. I tried one of those anyway (I had to download it manually because the gallery download was very slow, by the way) and ran into out-of-memory problems when trying to run it. The container has access to 16GB of RAM, but it does not seem to be offloading anything to the GPU; there already seems to be an open ticket for that.
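From what I can tell from the docs, offloading for the ggml/llama.cpp backend is supposed to be configured in the model yaml; the sketch below is my reading of it (the file name, gpu_layers value and f16 flag are guesses on my part, not something I've confirmed works):

```sh
# Hypothetical model config asking the llama backend to offload layers to the GPU
cat > models/llama2-13b-ggml.yaml <<'EOF'
name: llama2-13b-ggml
backend: llama
parameters:
  model: llama-2-13b.ggmlv3.q4_0.bin   # ggml file placed under /models (illustrative name)
f16: true
gpu_layers: 40                          # how many layers to push to the GPU
EOF
```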
I also saw that exllama support was recently added (feat: Add exllama #881), so I decided to try that as well. Following the PR instructions, I downloaded the model into a subfolder of /models/ and set up a .yaml config for it.
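The config is along these lines; I'm going from the example in the PR, so take the exact fields and the folder name as a sketch rather than a verbatim copy of what I have:

```sh
# Sketch of the exllama model config (the GPTQ folder name is illustrative)
cat > models/exllama.yaml <<'EOF'
name: exllama
backend: exllama
parameters:
  model: TheBloke_Llama-2-13B-GPTQ   # subfolder of /models containing the GPTQ model files
EOF
```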
However, when the docker image starts after rebuilding and I make a request to the completions endpoint, I get an exllama.py error on the imports. It sounds like a wrong CUDA version in the docker container? I've been approaching this top-down, so I don't know much about CUDA 😅.
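For completeness, the request that triggers it is roughly the following (the model name matches the name field from the yaml above, and the prompt is just an example):

```sh
# Completions request against LocalAI's OpenAI-compatible API
curl http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "exllama",
    "prompt": "Hello, how are you today?",
    "temperature": 0.7
  }'
```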