Very slow generation with gpt4all #50
Having the same problem over here. Mac M1, 8 GB RAM. Chat works really fast, like in the GIF in the README, but pyllamacpp is painfully slow. The output is also very different, with lower quality. Might have to do with the new ggml weights (#40)? Tried both the directly downloaded gpt4all-lora-quantized-ggml.bin and converting gpt4all-lora-quantized.bin myself. |
The gpt4all binary is based on an old commit of llama.cpp. It might be that you need to build the package yourself, because the build process takes the target CPU into account, or, as @clauslang said, it might be related to the new ggml format; people are reporting similar issues there. So what you have to do is build the package from source. |
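In practice, "build the package yourself" just means reinstalling with pip while skipping the prebuilt wheel. A minimal sketch (the `--no-binary :all:` approach is a generic pip technique, not something stated in this thread):

```python
import sys

def pip_source_build_cmd(package: str = "pyllamacpp") -> list[str]:
    """Build the pip command that forces `package` to be compiled locally.

    --no-binary :all: tells pip to ignore prebuilt wheels and build from the
    source distribution, so the compiled backend can target this machine's
    CPU instead of a generic baseline.
    """
    return [
        sys.executable, "-m", "pip", "install",
        "--force-reinstall", "--no-binary", ":all:", package,
    ]
```

Run it with, e.g., `subprocess.check_call(pip_source_build_cmd())`, or type the equivalent command in a terminal.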
Thanks @abdeladim-s. Not 100% sure if that's what you mean by building
Then I tried building
and ran the sample script:
--> very slow, with no or poor output |
I have found that decreasing the thread count from the default of 8 to 1 doubles the generation speed. No idea why, but it seems to work. I am trying to get it to go even faster. I'll let you know if I have updates. |
Are you sure it doubles it? `threads` refers to CPU cores. |
Batch size is the most important factor for speed: don't set it too high or too low. |
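One way to act on that advice is to time a few candidate batch sizes and keep the fastest. A sketch with a hypothetical `generate` wrapper (pyllamacpp's actual parameter names vary by version, so check your install):

```python
import time

def time_batch_sizes(generate, prompt, batch_sizes=(8, 64, 256)):
    """Time `generate(prompt, n_batch)` for each candidate batch size.

    `generate` is any callable wrapping your model's generation call.
    Returns a dict mapping batch size -> elapsed seconds.
    """
    timings = {}
    for n_batch in batch_sizes:
        start = time.perf_counter()
        generate(prompt, n_batch)
        timings[n_batch] = time.perf_counter() - start
    return timings
```

Then pick `min(timings, key=timings.get)` as your batch size.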
I found out what the relation is. The thread count can't be more than the number of cores shown in system info, otherwise it becomes REALLY slow. Don't know why, since this is easily preventable. |
It's still as slow as a turtle. |
Increasing the thread count may cause it to include efficiency cores. For me, changing from 8 to 6 on an M1 Pro with 6 performance cores fixed it. |
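Putting the thread advice from the last few comments together: never ask for more threads than you have cores, and on Apple Silicon prefer the performance-core count. A sketch (the `os.cpu_count()` cap is my heuristic; it counts logical cores, so on machines with efficiency cores you may still want to go lower):

```python
import os

def pick_n_threads(requested: int = 8) -> int:
    """Cap the thread count at the number of logical cores.

    Oversubscribing (more threads than cores) makes llama.cpp-style
    inference dramatically slower. On an M1 Pro with 6 performance
    cores, commenters above found 6 threads optimal, so consider
    dropping to the performance-core count on Apple Silicon.
    """
    cores = os.cpu_count() or 1
    return max(1, min(requested, cores))
```

Pass the result as the model's thread parameter (e.g. `n_threads` in llama.cpp-style APIs; the exact name depends on your pyllamacpp version).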
Works fine on mine @abdeladim-s, so I'm not of much help for that issue. But hopefully @shivam-singhai's response indicates that the package manager version is the culprit. |
No problem @mattorp. |
I am having the same problem: gpt4all-lora-quantized-OSX-m1 is very fast (< 1 sec) on my Mac, but running with pyllamacpp is very slow; typical queries take > 30 sec. Tried the couple of things suggested above, but that didn't change the response time. |
@bsbhaskartp, if it is slow then you just need to build it from source. |
Thanks. Will try it out
…On Tue, May 2, 2023 at 2:07 PM, Abdeladim Sadiki wrote: @bsbhaskartp, if it is slow then you just need to build it from source. @Naugustogi was having the same issue and he succeeded in solving it. Please take a look at abdeladim-s/pyllamacpp#3, it might help. |
Using gpt4all through the file in the attached image:
works really well and is very fast, even though I am running on a laptop with Linux Mint. About 0.2 seconds per token. But when running gpt4all through pyllamacpp, it takes up to 10 seconds to generate one token. Why is that, and how do I speed it up?
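For scale, a quick sanity check on the figures quoted above (0.2 s/token with the binary vs. up to 10 s/token through pyllamacpp):

```python
# Figures as reported in the comment above.
chat_seconds_per_token = 0.2         # gpt4all binary on Linux Mint
pyllamacpp_seconds_per_token = 10.0  # same model through pyllamacpp

slowdown = pyllamacpp_seconds_per_token / chat_seconds_per_token
print(f"pyllamacpp is roughly {slowdown:.0f}x slower per token")
```

A 50x gap is far too large to be model-related, which is why the build-from-source and thread-count suggestions above point at the compiled backend rather than the weights.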