Failed to load 4bit-128g WizardLM 7B #36

Open · lee-b opened this issue on May 24, 2023 · 3 comments
lee-b commented on May 24, 2023

Not sure if this is meant to work at present, but I got `RuntimeError: Internal: src/sentencepiece_processor.cc(1102) [model_proto->ParseFromArray(serialized.data(), serialized.size())]` when loading https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GPTQ/, after cloning it and running `ln -s https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GPTQ/blob/main/WizardLM-7B-uncensored-GPTQ-4bit-128g.compat.no-act-order.safetensors 4bit-128g.compat.no-act-order.safetensors`.

This is with:

$ git show HEAD
commit 0a7113f99780abb15e9a058a7a8501767e54940a (HEAD -> latestgptq, origin/latestgptq, origin/HEAD)
Merge: b8e8b0f 530e204
Author: 0cc4m <[email protected]>
Date:   Wed May 24 06:32:54 2023 +0200

    Merge pull request #35 from YellowRoseCx/patch-1
    
    Update README.md to GPTQ-KoboldAI 0.0.5

$ git remote -v
origin	https://github.com/0cc4m/KoboldAI (fetch)
origin	https://github.com/0cc4m/KoboldAI (push)
Colab Check: False, TPU: False
INFO       | __main__:general_startup:1312 - Running on Repo: https://github.com/0cc4m/KoboldAI Branch: latestgptq
INIT       | Starting   | Flask
INIT       | OK         | Flask
INIT       | Starting   | Webserver
INIT       | Starting   | LUA bridge
INIT       | OK         | LUA bridge
INIT       | Starting   | LUA Scripts
INIT       | OK         | LUA Scripts
Setting Seed
INIT       | OK         | Webserver
MESSAGE    | Webserver started! You may now connect with a browser at http://127.0.0.1:5000
Connection Attempt: 127.0.0.1
INFO       | __main__:do_connect:2805 - Client connected! UI_1
Connection Attempt: 127.0.0.1
INFO       | __main__:do_connect:2805 - Client connected! UI_1
ERROR      | koboldai_settings:__setattr__:1210 - __setattr__ just set model_selected to NeoCustom in koboldai_vars. That variable isn't defined!
INFO       | __main__:get_model_info:1513 - Selected: NeoCustom, /home/lb/GIT/KoboldAI/models/TheBloke_WizardLM-7B-uncensored-GPTQ
INIT       | Searching  | GPU support
INIT       | Found      | GPU support
INIT       | Starting   | Transformers
INIT       | Info       | Final device configuration:
       DEVICE ID  |  LAYERS  |  DEVICE NAME
   (primary)   0  |      32  |  NVIDIA GeForce RTX 3090
               1  |       0  |  Tesla P40
               2  |       0  |  Tesla P40
             N/A  |       0  |  (Disk cache)
             N/A  |       0  |  (CPU)
INFO       | modeling.inference_models.hf_torch_4bit:_get_model:371 - Using GPTQ file: /home/lb/GIT/KoboldAI/models/TheBloke_WizardLM-7B-uncensored-GPTQ/4bit-128g.safetensors, 4-bit model, type llama, version 2, groupsize 128
Loading model ...
Done.
Exception in thread Thread-18:
Traceback (most recent call last):
  File "/home/lb/GIT/KoboldAI/runtime/envs/koboldai/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home/lb/GIT/KoboldAI/runtime/envs/koboldai/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lb/GIT/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/socketio/server.py", line 731, in _handle_event_internal
    r = server._trigger_event(data[0], namespace, sid, *data[1:])
  File "/home/lb/GIT/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/socketio/server.py", line 756, in _trigger_event
    return self.handlers[namespace][event](*args)
  File "/home/lb/GIT/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/flask_socketio/__init__.py", line 282, in _handler
    return self._handle_event(handler, message, namespace, sid,
  File "/home/lb/GIT/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/flask_socketio/__init__.py", line 828, in _handle_event
    ret = handler(*args)
  File "aiserver.py", line 615, in g
    return f(*a, **k)
  File "aiserver.py", line 3191, in get_message
    load_model(use_gpu=msg['use_gpu'], gpu_layers=msg['gpu_layers'], disk_layers=msg['disk_layers'], online_model=msg['online_model'])
  File "aiserver.py", line 1980, in load_model
    model.load(
  File "/home/lb/GIT/KoboldAI/modeling/inference_model.py", line 177, in load
    self._load(save_model=save_model, initial_load=initial_load)
  File "/home/lb/GIT/KoboldAI/modeling/inference_models/hf_torch_4bit.py", line 199, in _load
    self.tokenizer = self._get_tokenizer(self.get_local_model_path())
  File "/home/lb/GIT/KoboldAI/modeling/inference_models/hf_torch_4bit.py", line 391, in _get_tokenizer
    tokenizer = LlamaTokenizer.from_pretrained(utils.koboldai_vars.custmodpth)
  File "aiserver.py", line 112, in new_pretrainedtokenizerbase_from_pretrained
    tokenizer = old_pretrainedtokenizerbase_from_pretrained(cls, *args, **kwargs)
  File "/home/lb/GIT/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1811, in from_pretrained
    return cls._from_pretrained(
  File "/home/lb/GIT/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1965, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/lb/GIT/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/transformers/models/llama/tokenization_llama.py", line 96, in __init__
    self.sp_model.Load(vocab_file)
  File "/home/lb/GIT/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/sentencepiece/__init__.py", line 905, in Load
    return self.LoadFromFile(model_file)
  File "/home/lb/GIT/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: src/sentencepiece_processor.cc(1102) [model_proto->ParseFromArray(serialized.data(), serialized.size())]
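
That `ParseFromArray` failure usually means the `tokenizer.model` file in the model directory is not a valid sentencepiece model; one common cause is a clone made without Git LFS, which leaves a small text pointer stub in place of the real file. A minimal diagnostic sketch, assuming the model directory path shown in the log above:

```python
# Check whether tokenizer.model is a Git LFS pointer stub or a real
# sentencepiece model. The path is taken from the log above; adjust as needed.
import os

model_dir = "/home/lb/GIT/KoboldAI/models/TheBloke_WizardLM-7B-uncensored-GPTQ"
tok_path = os.path.join(model_dir, "tokenizer.model")

print(f"size: {os.path.getsize(tok_path)} bytes")
with open(tok_path, "rb") as f:
    head = f.read(64)

if head.startswith(b"version https://git-lfs.github.com/spec"):
    # A real tokenizer is a binary protobuf, not this text stub.
    print("Git LFS pointer stub; re-fetch the file with `git lfs pull`.")
else:
    import sentencepiece as spm
    sp = spm.SentencePieceProcessor()
    sp.Load(tok_path)  # raises the same RuntimeError if the file is corrupt
    print(f"OK: vocab size {sp.GetPieceSize()}")
```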
lee-b (Author) commented on May 24, 2023

Hmm. This also fails on TheBloke_vicuna-7B-1.1-GPTQ-4bit-128g, with the same error. Yet the same download/loading process works fine for other 4-bit 128g safetensors GPTQ models, such as MetalX_GPT4-X-Alpasta-30b-4bit. Am I missing something here?

lee-b (Author) commented on May 24, 2023

Ah, I think this is the same as #11, but I'm not convinced the issue is model-side as that ticket suggests. Maybe the loading code needs to parse the model's config.json differently, for instance?

lee-b (Author) commented on May 24, 2023

OK, so at least for TheBloke_vicuna-7B-1.1-GPTQ-4bit-128g, loading only works if I download both the .pt and the .safetensors files. Shouldn't the loader try the .safetensors first, and only look for a .pt if no .safetensors is available?
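
For reference, a minimal sketch of the fallback order being suggested here; the helper name and directory layout are illustrative, not KoboldAI's actual loader code:

```python
# Prefer a .safetensors checkpoint; fall back to .pt only when no
# .safetensors file exists in the model directory.
from pathlib import Path

def pick_checkpoint(model_dir: str) -> Path:
    root = Path(model_dir)
    for pattern in ("*.safetensors", "*.pt"):
        matches = sorted(root.glob(pattern))
        if matches:
            return matches[0]
    raise FileNotFoundError(f"no .safetensors or .pt checkpoint in {root}")

# With both files present, the .safetensors wins, so downloading the .pt
# alongside it would no longer be necessary.
print(pick_checkpoint("models/TheBloke_vicuna-7B-1.1-GPTQ-4bit-128g"))
```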
