File "/home/ubuntu/.local/lib/python3.8/site-packages/llama_cpp/llama.py", line 506, in _create_completion
prompt_tokens: List[llama_cpp.llama_token] = self.tokenize(
File "/home/ubuntu/.local/lib/python3.8/site-packages/llama_cpp/llama.py", line 189, in tokenize
raise RuntimeError(f'Failed to tokenize: text="{text}" n_tokens={n_tokens}')
RuntimeError: Failed to tokenize: text="b" ### Human:Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\n\xe6\xa8\xaa\xe5\xba\x97\xe9\x9b\x86\xe5\x9b\xa2\xe4\xb8\x9c\xe7\xa3\x81\xe8\x82\xa1\xe4\xbb\xbd\xe6\x9c\x89\xe9\x99\x90\xe5\x85\xac\xe5\x8f\xb8 \n \n \n \n1
And when I use your PDF, it cannot generate an answer, or it is too slow to generate one.
It seems like the main problem is that the context length is being exceeded. First, try editing these lines in app.py:
Line 59: try lower values for chunk_size and chunk_overlap, e.g. 800 and 150.
If that doesn't help:
Line 78: lower the k value from 4 to 3 (this is the number of retrieved text chunks). A sketch of both edits follows below.
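For reference, here is a minimal sketch of what those two edits typically look like, assuming app.py is built on LangChain. The splitter class, the FAISS store, and the FakeEmbeddings stand-in below are illustrative only; match the names to whatever app.py actually uses.

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings.fake import FakeEmbeddings  # stand-in embedding, just for this sketch
from langchain.vectorstores import FAISS

# Around line 59: smaller chunks keep the assembled prompt under the
# model's 2048-token context window (n_ctx in the logs).
text_splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=150)
pdf_text = "..."  # text extracted from the PDF
chunks = text_splitter.split_text(pdf_text)

# Around line 78: retrieve fewer chunks, so that
# prompt template + k chunks + question still fits in the context window.
vectorstore = FAISS.from_texts(chunks, FakeEmbeddings(size=384))
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})  # was k=4
```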
From those logs I'm also assuming you are working with Chinese text. I haven't tested whether that even works; I expect the model to be even slower than usual with it, and the quality of the results will probably be poor.
Perhaps another LLM with more emphasis on multilingual or specifically Chinese text would be better suited for this.
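If you want to see how close you are to the limit, you can count tokens with the same model's tokenizer before the prompt is assembled. A small sketch follows; the model path and the sample chunk text are taken from your logs, everything else is illustrative.

```python
from llama_cpp import Llama

# Same model and context size as in the logs.
llm = Llama(model_path="./ggml-vicuna-13b-1.1-q4_2.bin", n_ctx=2048)

# Count the tokens of one retrieved chunk (example text from the traceback).
chunk = "横店集团东磁股份有限公司"
n_tokens = len(llm.tokenize(chunk.encode("utf-8")))
print(f"chunk uses {n_tokens} tokens")

# The prompt template + k retrieved chunks + the question must all fit in
# n_ctx (2048 here). Chinese text can take several tokens per character,
# so chunks are often much larger in tokens than they look on screen.
```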
File "/home/ubuntu/.local/lib/python3.8/site-packages/llama_cpp/llama.py", line 506, in _create_completion
prompt_tokens: List[llama_cpp.llama_token] = self.tokenize(
File "/home/ubuntu/.local/lib/python3.8/site-packages/llama_cpp/llama.py", line 189, in tokenize
raise RuntimeError(f'Failed to tokenize: text="{text}" n_tokens={n_tokens}')
RuntimeError: Failed to tokenize: text="b" ### Human:Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\n\xe6\xa8\xaa\xe5\xba\x97\xe9\x9b\x86\xe5\x9b\xa2\xe4\xb8\x9c\xe7\xa3\x81\xe8\x82\xa1\xe4\xbb\xbd\xe6\x9c\x89\xe9\x99\x90\xe5\x85\xac\xe5\x8f\xb8 \n \n \n \n1
When I use your PDF, it still cannot generate an answer, or it is too slow to generate one:
AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 0 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
llama.cpp: loading model from ./ggml-vicuna-13b-1.1-q4_2.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 5 (mostly Q4_2)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 85.08 KB
llama_model_load_internal: mem required = 9807.48 MB (+ 1608.00 MB per state)
llama_init_from_file: kv self size = 1600.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 0 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
Token indices sequence length is longer than the specified maximum sequence length for this model (1104 > 1024). Running this sequence through the model will result in indexing errors
It is always stuck waiting here.