Error Occurs After Asking Consecutive Questions in LLM-Chatbot #2421
Comments
Are you talking about https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-chatbot? Can you provide more details about your system, please (SoC, amount of memory, OS, version of Python, etc.)? Can you provide example prompts, please?
Thank you for your response. I am using the model (llama-3-8b-instruct) and code from this project: https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-chatbot/llm-chatbot.ipynb

Here are my system details:
OS: Ubuntu 24.04

The prompts I am using also come from the examples in the project, such as: "hello there! How are you doing?" Please let me know if you need any further information.
Have you seen errors or warnings in the steps for conversion and compression? Do you see the same when using the INT8 or FP16 variant instead of the INT4 variant? Do you start the Jupyter notebook from within a virtual environment (with a "guaranteed" set of component versions), or "global-local" (using the components installed globally on your machine)? Do you use a specific version or branch of the OpenVINO Notebooks repo, or the latest head revision? When running under Windows 11 with the latest version, I can query multiple prompts without problems using the INT4 model... (but my laptop has 64 GB of RAM, Core Ultra 7 155H)
Thank you for your suggestions. I did not see any errors or warnings during the model conversion and compression steps. We have not yet tried using the non-INT4 variants, as the focus of our research project is primarily on INT4 models. We are running the Jupyter notebook in a Python virtual environment and following the steps outlined in the llm-chatbot.ipynb notebook. This research project requires the use of the Ubuntu 24.04 system, so we are hoping to resolve the issue within this setup. (During the execution of the chatbot, the memory usage is approximately 7 GB, so the errors are not due to insufficient memory.)
For conversion and compression I would in any case expect the operating system to start swapping memory to HDD/SSD if the system memory is not big enough... Let's see if someone else can reproduce it under a similar environment; sorry, I don't see the problems you described. Can you reproduce it with another model?
Thank you for your follow-up. I have also tried using the llama-2-7b-chatbot model with INT4, INT8, and FP16, and I encountered the same issue in all cases. Additionally, I would like to clarify that I have not made any modifications to the code or the model.
@tim102187S, it looks like this is due to a 30-second timeout. Could you try increasing the value, or removing it altogether, in this row and check:
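For reference, a minimal sketch of that change, assuming the notebook builds its streamer with transformers' TextIteratorStreamer (the tokenizer/model variables here are stand-ins for the notebook's own objects):

```python
from threading import Thread
from transformers import TextIteratorStreamer

# With timeout=30.0, queue.get() inside the streamer raises _queue.Empty
# (the traceback below) whenever no new token arrives within 30 seconds.
# timeout=None blocks until the next token instead.
streamer = TextIteratorStreamer(
    tokenizer,                # assumed: the notebook's tokenizer instance
    timeout=None,             # was timeout=30.0 -- increase it or drop it
    skip_prompt=True,
    skip_special_tokens=True,
)

inputs = tokenizer("hello there! How are you doing?", return_tensors="pt")
Thread(target=model.generate,   # assumed: the compiled OpenVINO model
       kwargs=dict(**inputs, max_new_tokens=256, streamer=streamer)).start()

for new_text in streamer:       # no longer times out between tokens
    print(new_text, end="", flush=True)
```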
I am using OpenVINO 2024.4.0 and have downloaded the llama-3-8b-instruct model for use. When I run multiple consecutive queries (usually on the third query), an error occurs. I have checked my device’s memory usage, and it has not exceeded 100%.
Here is the error report I received:
Selected model llama-3-8b-instruct
Checkbox(value=True, description='Prepare INT4 model')
Checkbox(value=False, description='Prepare INT8 model')
Checkbox(value=False, description='Prepare FP16 model')
Size of model with INT4 compressed weights is 5085.79 MB
Loading model from /home/adv/Downloads/openvino_notebooks/notebooks/llm-chatbot/llama-3-8b-instruct/INT4_compressed_weights
Compiling the model to CPU ...
Running on local URL: http://127.0.0.1:7861
To create a public link, set share=True in launch().
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:128001 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:128001 for open-end generation.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:128001 for open-end generation.
Traceback (most recent call last):
File "/home/adv/openvino-llm/lib/python3.12/site-packages/gradio/queueing.py", line 536, in process_events
response = await route_utils.call_process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/adv/openvino-llm/lib/python3.12/site-packages/gradio/route_utils.py", line 322, in call_process_api
output = await app.get_blocks().process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/adv/openvino-llm/lib/python3.12/site-packages/gradio/blocks.py", line 1935, in process_api
result = await self.call_function(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/adv/openvino-llm/lib/python3.12/site-packages/gradio/blocks.py", line 1532, in call_function
prediction = await utils.async_iteration(iterator)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/adv/openvino-llm/lib/python3.12/site-packages/gradio/utils.py", line 671, in async_iteration
return await iterator.__anext__()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/adv/openvino-llm/lib/python3.12/site-packages/gradio/utils.py", line 664, in anext
return await anyio.to_thread.run_sync(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/adv/openvino-llm/lib/python3.12/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/adv/openvino-llm/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 2405, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "/home/adv/openvino-llm/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 914, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/adv/openvino-llm/lib/python3.12/site-packages/gradio/utils.py", line 647, in run_sync_iterator_async
return next(iterator)
^^^^^^^^^^^^^^
File "/home/adv/openvino-llm/lib/python3.12/site-packages/gradio/utils.py", line 809, in gen_wrapper
response = next(iterator)
^^^^^^^^^^^^^^
File "/home/adv/Downloads/EAS_GenAI_Intel14th/docker_build/llm_chatbot/run_chatbot.py", line 532, in bot
for new_text in streamer:
File "/home/adv/openvino-llm/lib/python3.12/site-packages/transformers/generation/streamers.py", line 223, in next
value = self.text_queue.get(timeout=self.timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/queue.py", line 179, in get
raise Empty
_queue.Empty
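Side note on the log above: the final _queue.Empty is exactly what a 30-second streamer timeout produces, and the repeated attention-mask warnings are most likely unrelated to the crash. Those warnings can be silenced by passing an explicit attention_mask and pad_token_id to generate(). A minimal sketch, assuming the model was exported with optimum-intel as in the notebook (the model directory comes from the log above; the other names are illustrative):

```python
from transformers import AutoTokenizer
from optimum.intel.openvino import OVModelForCausalLM

# Path printed in the log above (INT4-compressed weights).
model_dir = "llama-3-8b-instruct/INT4_compressed_weights"
tokenizer = AutoTokenizer.from_pretrained(model_dir)   # assumes tokenizer files were saved alongside the model
model = OVModelForCausalLM.from_pretrained(model_dir)  # compiles for CPU by default

inputs = tokenizer("hello there! How are you doing?", return_tensors="pt")
outputs = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],  # addresses "The attention mask ... not set"
    pad_token_id=tokenizer.eos_token_id,      # addresses "Setting pad_token_id to eos_token_id"
    max_new_tokens=256,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```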