
Errors loading ggml models #107

Open
NasonZ opened this issue May 10, 2023 · 3 comments

NasonZ commented May 10, 2023

Hello,
I'm just starting to explore the models made available by gpt4all, but I'm having trouble loading a few of them.

My environment details:

  • Ubuntu==22.04
  • Python==3.10.10
  • pygpt4all==1.1.0
  • llama-cpp-python==0.1.48

Code to reproduce error (vicuna_test.py):

from pygpt4all.models.gpt4all import GPT4All
model = GPT4All('./models/ggml-vicuna-7b-1.1-q4_2.bin')  # or any of the other models

Issue:

I can't seem to load some of the models listed at https://github.com/nomic-ai/gpt4all-chat#manual-download-of-models. The models I've failed to load are:

  • ggml-gpt4all-j.bin
  • ggml-gpt4all-j-v1.3-groovy.bin
  • ggml-vicuna-7b-1.1-q4_2.bin
  • ggml-stable-vicuna-13B.q4_2.bin

As shown below, the ggml-gpt4all-l13b-snoozy.bin model loads without issue. I also managed to load this version: https://huggingface.co/mrgaang/aira/blob/main/gpt4all-converted.bin.

# Working example - ggml-gpt4all-l13b-snoozy.bin

$ python vicuna_test.py 
llama_model_load: loading model from './models/ggml-gpt4all-l13b-snoozy.bin' - please wait ...
llama_model_load: n_vocab = 32000
...
llama_model_load: ggml ctx size = 101.25 KB
llama_model_load: mem required  = 9807.93 MB (+ 3216.00 MB per state)
llama_model_load: loading tensors from './models/ggml-gpt4all-l13b-snoozy.bin'
llama_model_load: model size =  7759.39 MB / num tensors = 363
llama_init_from_file: kv self size  =  800.00 MB
# loads without error
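
For completeness, here is a minimal generation sketch against the model that does load, following pygpt4all's documented callback API (the prompt and n_predict values here are just placeholders):

from pygpt4all.models.gpt4all import GPT4All

def new_text_callback(text):
    # stream each new token to stdout as it is generated
    print(text, end="")

model = GPT4All('./models/ggml-gpt4all-l13b-snoozy.bin')
model.generate("Once upon a time, ", n_predict=55, new_text_callback=new_text_callback)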

Errors loading the listed models:

# gpt4all-j-v1.3-groovy

$ python vicuna_test.py 
llama_model_load: loading model from './models/ggml-gpt4all-j-v1.3-groovy.bin' - please wait ...
llama_model_load: invalid model file './models/ggml-gpt4all-j-v1.3-groovy.bin' (too old, regenerate your model files or convert them with convert-unversioned-ggml-to-ggml.py!)
llama_init_from_file: failed to load model
Segmentation fault (core dumped)

# vicuna-7b

$ python vicuna_test.py 
llama_model_load: loading model from './models/ggml-vicuna-7b-1.1-q4_2.bin' - please wait ...
llama_model_load: n_vocab = 32000
...
llama_model_load: type    = 1
llama_model_load: invalid model file './models/ggml-vicuna-7b-1.1-q4_2.bin' (bad f16 value 5)
llama_init_from_file: failed to load model
Segmentation fault (core dumped)

# ggml-gpt4all-j.bin and ggml-stable-vicuna-13B.q4_2.bin produced the same error

Can anyone advise on how to resolve these issues?


NasonZ commented May 10, 2023

I'm pretty sure the issue is with GPT4All, as I can load all of the models mentioned above with:

from llama_cpp import Llama
model_path = "models/ggml-vicuna-7b-1.1-q4_2.bin"
model = Llama(model_path=model_path)
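
For comparison, completion through llama-cpp-python's call-style API looks something like this (a sketch continuing from the model loaded above; the prompt and parameters are arbitrary):

# returns an OpenAI-style completion dict with a "choices" list
output = model("Q: Name the planets in the solar system. A: ", max_tokens=64, stop=["Q:"])
print(output["choices"][0]["text"])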

Are there any major differences when loading the model through Llama instead of GPT4All?


385olt commented May 10, 2023

I have the same issue.

My environment:

  • Ubuntu 22.04
  • Python 3.10.6
  • pygpt4all 1.1.0
  • pygptj 2.0.3
  • pyllamacpp 2.1.3

Code:

from pygpt4all.models.gpt4all import GPT4All

model = GPT4All(
    './ggml-mpt-7b-chat.bin',
    prompt_context="The following is a conversation between Jim and Bob. Bob is trying to help Jim with his requests by answering the questions to the best of his abilities. If Bob cannot help Jim, then he says that he doesn't know."
)

I can load ggml-gpt4all-l13b-snoozy.bin and https://huggingface.co/mrgaang/aira/blob/main/gpt4all-converted.bin, and they work fine, but the following models fail to load:

  • ggml-mpt-7b-chat.bin
  • ggml-vicuna-7b-1.1-q4_2.bin
  • ggml-wizardLM-7b.q4_2.bin

Loading ggml-wizardLM-7b.q4_2.bin and ggml-vicuna-7b-1.1-q4_2.bin gives:

llama_model_load: invalid model file './ggml-wizardLM-7b.q4_2.bin' (bad f16 value 5)
llama_init_from_file: failed to load model

Loading ggml-mpt-7b-chat.bin gives:

./ggml-mpt-7b-chat.bin: invalid model file (bad magic [got 0x67676d6d want 0x67676a74])
	you most likely need to regenerate your ggml files
	the benefit is you'll get 10-100x faster load times
	see https://github.com/ggerganov/llama.cpp/issues/91
	use convert-pth-to-ggml.py to regenerate from original pth
	use migrate-ggml-2023-03-30-pr613.py if you deleted originals
llama_init_from_file: failed to load model
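
For reference, the "magic" is just the first four bytes of the model file, which llama.cpp reads as a little-endian uint32. A quick stdlib-only sketch to check what a file actually contains (the expected value comes from the error message above):

def read_magic(path):
    # llama.cpp reads the container magic as a little-endian uint32
    # from the first 4 bytes of the file
    with open(path, "rb") as f:
        return int.from_bytes(f.read(4), "little")

print(hex(read_magic("./ggml-mpt-7b-chat.bin")))
# current llama.cpp expects 0x67676a74 ('ggjt'); any other value means an
# older or different container format that needs converting first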

I tried using llama.cpp/migrate-ggml-2023-03-30-pr613.py on ggml-mpt-7b-chat.bin, but got the error:

File "/.../llama.cpp/migrate-ggml-2023-03-30-pr613.py", line 133, in read_tokens
    word = fin.read(length)
ValueError: read length must be non-negative or -1

I tried using llama.cpp/convert-unversioned-ggml-to-ggml.py to fix this, but got another error:

File "/.../llama.cpp/convert-unversioned-ggml-to-ggml.py", line 29, in write_header
    raise Exception('Invalid file magic. Must be an old style ggml file.')
Exception: Invalid file magic. Must be an old style ggml file.

I tried using llama.cpp/migrate-ggml-2023-03-30-pr613.py on ggml-wizardLM-7b.q4_2.bin, but got the message:

./ggml-wizardLM-7b.q4_2.bin: input ggml has already been converted to 'ggjt' magic

I have no idea how to fix this or why it happens.


emilaz commented May 11, 2023

I'm having the same issue on

  • Windows 10
  • Python 3.11.3
  • pygpt4all 1.1.0

and the models

  • gpt4all-lora-quantized.bin
  • ggml-gpt4all-j-v1.3-groovy.bin

both result in:

llama_model_load: invalid model file './gpt4all-lora-quantized.bin' (too old, regenerate your model files or convert them with convert-unversioned-ggml-to-ggml.py!)
llama_init_from_file: failed to load model
