
Using another PDF raises an error #5

Open
alexhmyang opened this issue Jun 7, 2023 · 1 comment

Comments

@alexhmyang

File "/home/ubuntu/.local/lib/python3.8/site-packages/llama_cpp/llama.py", line 506, in _create_completion
prompt_tokens: List[llama_cpp.llama_token] = self.tokenize(
File "/home/ubuntu/.local/lib/python3.8/site-packages/llama_cpp/llama.py", line 189, in tokenize
raise RuntimeError(f'Failed to tokenize: text="{text}" n_tokens={n_tokens}')
RuntimeError: Failed to tokenize: text="b" ### Human:Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\n\xe6\xa8\xaa\xe5\xba\x97\xe9\x9b\x86\xe5\x9b\xa2\xe4\xb8\x9c\xe7\xa3\x81\xe8\x82\xa1\xe4\xbb\xbd\xe6\x9c\x89\xe9\x99\x90\xe5\x85\xac\xe5\x8f\xb8 \n \n \n \n1

Also, when using your PDF, it either cannot generate an answer or is too slow to generate one:

AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 0 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
llama.cpp: loading model from ./ggml-vicuna-13b-1.1-q4_2.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 5 (mostly Q4_2)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 85.08 KB
llama_model_load_internal: mem required = 9807.48 MB (+ 1608.00 MB per state)
llama_init_from_file: kv self size = 1600.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 0 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
Token indices sequence length is longer than the specified maximum sequence length for this model (1104 > 1024). Running this sequence through the model will result in indexing errors

It just keeps waiting here.

@wafflecomposite
Owner

It seems like the main problem is the exceeded context length. First, try editing these lines in app.py:

Line 59: try lower values for chunk_size and chunk_overlap, for example 800 and 150.

If that doesn't help:

Line 78: lower the k value from 4 to 3 (this is the number of retrieved text chunks).
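
For reference, a minimal sketch of what both edits might look like, assuming app.py splits the PDF text with a LangChain text splitter and retrieves chunks from a vector store; the names splitter, pdf_text, vectorstore and retriever are placeholders and may not match the actual code:

# Rough sketch only; the real variable names in app.py may differ.
from langchain.text_splitter import RecursiveCharacterTextSplitter

pdf_text = "(text extracted from the PDF)"  # placeholder for the real document text

# Around line 59: smaller chunks keep the assembled prompt within the
# model's 2048-token context window (n_ctx in the log above).
splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=150)
chunks = splitter.split_text(pdf_text)

# Around line 78: if the prompt is still too long, retrieve fewer chunks
# per question; "vectorstore" stands in for whatever store app.py builds.
# retriever = vectorstore.as_retriever(search_kwargs={"k": 3})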

Judging by those logs, I'm also assuming you are working with Chinese text. I haven't tested whether that even works; I expect the model to be even slower than usual with it, and the quality of the results will probably be poor.
Perhaps some other LLM with more emphasis on multilingual or specifically Chinese text would be better suited for this.
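
As a hedged illustration of that last point: since the traceback goes through llama-cpp-python, switching models is just a matter of pointing it at a different GGML file. The file name below is hypothetical:

from llama_cpp import Llama

# Hypothetical multilingual/Chinese-focused GGML model file; any model
# quantized for llama.cpp can be loaded the same way.
llm = Llama(model_path="./some-multilingual-model-q4_0.bin", n_ctx=2048)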
