Not sure if this is meant to work at present, but I got RuntimeError: Internal: src/sentencepiece_processor.cc(1102) [model_proto->ParseFromArray(serialized.data(), serialized.size())] when loading https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GPTQ/, after cloning it and running ln -s https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GPTQ/blob/main/WizardLM-7B-uncensored-GPTQ-4bit-128g.compat.no-act-order.safetensors 4bit-128g.compat.no-act-order.safetensors.
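(Editorial aside: if the ln -s target was literally the URL shown, the resulting symlink will dangle, since ln -s only records the target string and downloads nothing. A minimal sketch to check whether the link actually resolves; the path is assumed from the model directory in the logs below:)

```python
import os

# Hypothetical path: the model directory from the logs below plus the
# symlink name created by the ln -s command above.
link = "/home/lb/GIT/KoboldAI/models/TheBloke_WizardLM-7B-uncensored-GPTQ/4bit-128g.compat.no-act-order.safetensors"

print("is symlink:", os.path.islink(link))
if os.path.islink(link):
    print("target:", os.readlink(link))   # a URL here means the link can never resolve
print("resolves:", os.path.exists(link))  # False for a dangling symlink
```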
This is with:
```
$ git show HEAD
commit 0a7113f99780abb15e9a058a7a8501767e54940a (HEAD -> latestgptq, origin/latestgptq, origin/HEAD)
Merge: b8e8b0f 530e204
Author: 0cc4m <[email protected]>
Date: Wed May 24 06:32:54 2023 +0200
Merge pull request #35 from YellowRoseCx/patch-1
Update README.md to GPTQ-KoboldAI 0.0.5
$ git remote -v
origin https://github.com/0cc4m/KoboldAI (fetch)
origin https://github.com/0cc4m/KoboldAI (push)
```
```
Colab Check: False, TPU: False
INFO | __main__:general_startup:1312 - Running on Repo: https://github.com/0cc4m/KoboldAI Branch: latestgptq
INIT | Starting | Flask
INIT | OK | Flask
INIT | Starting | Webserver
INIT | Starting | LUA bridge
INIT | OK | LUA bridge
INIT | Starting | LUA Scripts
INIT | OK | LUA Scripts
Setting Seed
INIT | OK | Webserver
MESSAGE | Webserver started! You may now connect with a browser at http://127.0.0.1:5000
Connection Attempt: 127.0.0.1
INFO | __main__:do_connect:2805 - Client connected! UI_1
Connection Attempt: 127.0.0.1
INFO | __main__:do_connect:2805 - Client connected! UI_1
ERROR | koboldai_settings:__setattr__:1210 - __setattr__ just set model_selected to NeoCustom in koboldai_vars. That variable isn't defined!
INFO | __main__:get_model_info:1513 - Selected: NeoCustom, /home/lb/GIT/KoboldAI/models/TheBloke_WizardLM-7B-uncensored-GPTQ
INIT | Searching | GPU support
INIT | Found | GPU support
INIT | Starting | Transformers
INIT | Info | Final device configuration:
DEVICE ID | LAYERS | DEVICE NAME
(primary) 0 | 32 | NVIDIA GeForce RTX 3090
1 | 0 | Tesla P40
2 | 0 | Tesla P40
N/A | 0 | (Disk cache)
N/A | 0 | (CPU)
INFO | modeling.inference_models.hf_torch_4bit:_get_model:371 - Using GPTQ file: /home/lb/GIT/KoboldAI/models/TheBloke_WizardLM-7B-uncensored-GPTQ/4bit-128g.safetensors, 4-bit model, type llama, version 2, groupsize 128
Loading model ...
Done.
Exception in thread Thread-18:
Traceback (most recent call last):
File "/home/lb/GIT/KoboldAI/runtime/envs/koboldai/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/home/lb/GIT/KoboldAI/runtime/envs/koboldai/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/home/lb/GIT/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/socketio/server.py", line 731, in _handle_event_internal
r = server._trigger_event(data[0], namespace, sid, *data[1:])
File "/home/lb/GIT/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/socketio/server.py", line 756, in _trigger_event
return self.handlers[namespace][event](*args)
File "/home/lb/GIT/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/flask_socketio/__init__.py", line 282, in _handler
return self._handle_event(handler, message, namespace, sid,
File "/home/lb/GIT/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/flask_socketio/__init__.py", line 828, in _handle_event
ret = handler(*args)
File "aiserver.py", line 615, in g
return f(*a, **k)
File "aiserver.py", line 3191, in get_message
load_model(use_gpu=msg['use_gpu'], gpu_layers=msg['gpu_layers'], disk_layers=msg['disk_layers'], online_model=msg['online_model'])
File "aiserver.py", line 1980, in load_model
model.load(
File "/home/lb/GIT/KoboldAI/modeling/inference_model.py", line 177, in load
self._load(save_model=save_model, initial_load=initial_load)
File "/home/lb/GIT/KoboldAI/modeling/inference_models/hf_torch_4bit.py", line 199, in _load
self.tokenizer = self._get_tokenizer(self.get_local_model_path())
File "/home/lb/GIT/KoboldAI/modeling/inference_models/hf_torch_4bit.py", line 391, in _get_tokenizer
tokenizer = LlamaTokenizer.from_pretrained(utils.koboldai_vars.custmodpth)
File "aiserver.py", line 112, in new_pretrainedtokenizerbase_from_pretrained
tokenizer = old_pretrainedtokenizerbase_from_pretrained(cls, *args, **kwargs)
File "/home/lb/GIT/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1811, in from_pretrained
return cls._from_pretrained(
File "/home/lb/GIT/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1965, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File "/home/lb/GIT/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/transformers/models/llama/tokenization_llama.py", line 96, in __init__
self.sp_model.Load(vocab_file)
File "/home/lb/GIT/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/sentencepiece/__init__.py", line 905, in Load
return self.LoadFromFile(model_file)
File "/home/lb/GIT/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: src/sentencepiece_processor.cc(1102) [model_proto->ParseFromArray(serialized.data(), serialized.size())]
```
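(Editorial aside: this sentencepiece error means tokenizer.model could not be parsed. In a fresh git clone of a Hugging Face repo, a common cause is that the file is a Git LFS pointer stub rather than the real binary. A minimal sketch to reproduce the failure in isolation and inspect what is actually on disk, with the path assumed from the logs above:)

```python
import os

import sentencepiece as spm

# Hypothetical path, matching the model directory from the logs above.
path = "/home/lb/GIT/KoboldAI/models/TheBloke_WizardLM-7B-uncensored-GPTQ/tokenizer.model"

# A real llama tokenizer.model is roughly 500 KB; an LFS pointer stub is ~130 bytes.
print("size:", os.path.getsize(path))
with open(path, "rb") as f:
    head = f.read(64)
print("LFS pointer stub:", head.startswith(b"version https://git-lfs"))

sp = spm.SentencePieceProcessor()
sp.Load(path)  # raises the same RuntimeError if the file is not a valid model
```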
Hmm. Also fails on TheBloke_vicuna-7B-1.1-GPTQ-4bit-128g with the same error. Yet the same download/loading process works fine on other 4-bit 128g safetensors GPTQ models, like MetalX_GPT4-X-Alpasta-30b-4bit. Am I missing something here?
Ah, I think this is the same as #11, but I'm not convinced the issue is model-side as that ticket says. Maybe something in the loading needs to parse the model's config.json differently, for instance?
OK, so at least for TheBloke_vicuna-7B-1.1-GPTQ-4bit-128g, it only works if I download both the .pt and the .safetensors. I think it should try the .safetensors first, and only look for a .pt if .safetensors aren't available, right?
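(Editorial aside: a minimal sketch of the preference order suggested above; the helper name and directory layout are hypothetical, not KoboldAI's actual loader code:)

```python
from pathlib import Path

def pick_checkpoint(model_dir: str) -> Path:
    """Hypothetical helper: prefer .safetensors, fall back to .pt."""
    directory = Path(model_dir)
    for pattern in ("*.safetensors", "*.pt"):  # try safetensors first
        matches = sorted(directory.glob(pattern))
        if matches:
            return matches[0]
    raise FileNotFoundError(f"no .safetensors or .pt checkpoint in {directory}")
```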