llava-multimodal-chatbot-genai run failed #2484

Open
Johere opened this issue Oct 29, 2024 · 11 comments

Comments


Johere commented Oct 29, 2024

Running the Jupyter notebook for the LLaVA model:
https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/llava-multimodal-chatbot/llava-multimodal-chatbot-genai.ipynb

  • Device: Arc 770 dGPU
  • Precision: INT4
  • Model: llava-hf/llava-1.5-7b-hf

Describe the bug

RuntimeError: Exception from src/inference/src/cpp/core.cpp:90:
Check 'util::directory_exists(path) || util::file_exists(path)' failed at src/frontends/common/src/frontend.cpp:113:
FrontEnd API failed with GeneralFailure:
ir: Could not open the file: "llava-1.5-7b-hf/INT4/openvino_tokenizer.xml"

Expected behavior
No code was changed; I expected it to work.
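
For context, the GenAI notebook creates the pipeline from the converted model folder roughly like this (a minimal sketch of the load step, not the exact notebook cell; the directory and device are the ones from this report):

import openvino_genai as ov_genai

# VLMPipeline expects openvino_tokenizer.xml / openvino_detokenizer.xml next to
# the model IRs, which is why the missing file aborts pipeline creation.
pipe = ov_genai.VLMPipeline("llava-1.5-7b-hf/INT4", "GPU")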

Screenshots
[screenshot of the error traceback]


brmarkus commented Oct 29, 2024

Which version of the OpenVINO notebooks are you using? There have been changes in the last few days:

https://github.com/openvinotoolkit/openvino_notebooks/commits/latest/notebooks/llava-multimodal-chatbot/llava-multimodal-chatbot-genai.ipynb

Can you temporarily remove all the -q (quiet) options from the commands in the first cell, run the first cell again, and check whether it executed successfully?

UPDATE: I needed to delete my existing virtual-env folder, create a new virtual env, resync the notebooks repo, re-install requirements.txt, and then run the Jupyter notebook again.
However, the command optimum-cli export openvino --model llava-hf/llava-1.5-7b-hf llava-1.5-7b-hf\FP16 --weight-format fp16 has been running for more than 15 minutes now...


Johere commented Oct 29, 2024

I'm using the OpenVINO notebooks at commit-id d5c6df43edebf273ccef512439aba11910aad633, branch: latest.
Yes, all the commands in the first cell were executed successfully.

I've pulled the latest changes on branch latest, and the issue still exists. Is this file (llava-1.5-7b-hf/INT4/openvino_tokenizer.xml) required to exist? The files under the INT4 folder are:

INT4
├── added_tokens.json
├── chat_template.json
├── config.json
├── generation_config.json
├── openvino_language_model.bin
├── openvino_language_model.xml
├── openvino_text_embeddings_model.bin
├── openvino_text_embeddings_model.xml
├── openvino_vision_embeddings_model.bin
├── openvino_vision_embeddings_model.xml
├── preprocessor_config.json
├── processor_config.json
├── special_tokens_map.json
├── tokenizer_config.json
├── tokenizer.json
└── tokenizer.model
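
For reference, a quick check for missing IR files (a minimal sketch; the expected names are the ones discussed in this thread):

from pathlib import Path

model_dir = Path("llava-1.5-7b-hf/INT4")
expected = [
    "openvino_language_model.xml",
    "openvino_text_embeddings_model.xml",
    "openvino_vision_embeddings_model.xml",
    "openvino_tokenizer.xml",
    "openvino_detokenizer.xml",
]
# Report any expected IR file that is not present in the converted folder.
missing = [name for name in expected if not (model_dir / name).exists()]
print("missing:", missing)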

Thanks for your help!

brmarkus commented Oct 29, 2024

Have a look into the originally downloaded FP16 folder llava-1.5-7b-hf/FP16. The pairs of XML & BIN files will be converted and compressed.
This can take VERY long and can require LOTS of RAM. Are you sure the conversion step has already finished, and finished successfully?
It has been running on my machine for more than 30 minutes now - and almost all of my 64GB of RAM is used during conversion... still running...


Johere commented Oct 29, 2024

Have a look into the originally downloaded FP16 folder llava-1.5-7b-hf/FP16. The pairs of XML & BIN files will be converted and compressed. This can take VERY long and can require LOTS of RAM. Are you sure the conversion step has already finished, and finished successfully? It has been running on my machine for more than 30 minutes now - and almost all of my 64GB of RAM is used during conversion... still running...

Yes, it works fine in my environment, and I think the sizes of the model files are as expected:

$ du -sh llava-1.5-7b-hf/*
14G     llava-1.5-7b-hf/FP16
4.1G    llava-1.5-7b-hf/INT4

brmarkus commented Oct 29, 2024

Conversion and compression has now finished on my machine.
My INT4 folder looks like this:
[screenshot of the INT4 folder contents]

=> Yes, the files openvino_tokenizer.xml and openvino_tokenizer.bin should exist...
The files openvino_detokenizer.bin and openvino_detokenizer.xml are also missing on your side.

Are the files present in the original FP16 folder?

Start the commands again and watch the CPU and RAM usage... it will take VERY long and will use LOTS of RAM and CPU.
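
A minimal way to watch that from a separate Python session (a sketch assuming the psutil package is installed; it is not part of the notebook requirements):

import psutil  # assumption: installed separately, e.g. pip install psutil

# Print CPU and RAM usage once per second while the conversion runs.
while True:
    cpu = psutil.cpu_percent(interval=1.0)
    mem = psutil.virtual_memory()
    print(f"CPU {cpu:5.1f}% | RAM {mem.percent:5.1f}% ({mem.used / 2**30:.1f} GiB used)")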


Johere commented Oct 29, 2024

Ohh, I don't have the files openvino_tokenizer.xml and openvino_tokenizer.bin in the FP16 folder. I removed this folder and regenerated it using optimum-cli, and now I can see these files.
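
(For anyone who wants to avoid a full re-export: the tokenizer pair can also be converted on its own - a minimal sketch, assuming the openvino-tokenizers package is installed and the INT4 folder already exists:)

import openvino as ov
from openvino_tokenizers import convert_tokenizer
from transformers import AutoTokenizer

# Convert the Hugging Face tokenizer into the OpenVINO tokenizer/detokenizer IR pair.
hf_tokenizer = AutoTokenizer.from_pretrained("llava-hf/llava-1.5-7b-hf")
ov_tokenizer, ov_detokenizer = convert_tokenizer(hf_tokenizer, with_detokenizer=True)
ov.save_model(ov_tokenizer, "llava-1.5-7b-hf/INT4/openvino_tokenizer.xml")
ov.save_model(ov_detokenizer, "llava-1.5-7b-hf/INT4/openvino_detokenizer.xml")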

It works well now, thank you so much for the help!

Johere closed this as completed Oct 29, 2024
Johere reopened this Nov 4, 2024

Johere commented Nov 4, 2024

Hi @brmarkus, I reopened this issue because I think I'm getting a wrong answer using this example: llava-multimodal-chatbot-genai.ipynb.

Using the branch latest, commit-id: dab21db

Screenshot attached here:
[screenshot of the unexpected answer]

Thanks!


brmarkus commented Nov 4, 2024

Hmm, download, conversion, and compression are done here:

from pathlib import Path
from cmd_helper import optimum_cli

model_id = "llava-hf/llava-1.5-7b-hf"
model_path = Path(model_id.split("/")[-1]) / "FP16"

if not model_path.exists():
    optimum_cli(model_id, model_path, additional_args={"weight-format": "fp16"})

There is no version information given - so it could happen that a newer/different/updated/modified model gets pulled from Hugging Face... and then the same query could result in a different response (besides SW-/HW-/platform-specific differences such as rounding effects; HW-driver differences could result in different optimizations).
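
One way to make the download reproducible is to pin an exact snapshot before exporting (a sketch assuming the huggingface_hub package; "<commit-sha>" is a placeholder you would pick from the model page on the Hub, and it assumes the notebook's optimum_cli helper also accepts a local directory):

from pathlib import Path
from huggingface_hub import snapshot_download
from cmd_helper import optimum_cli

# Pin an exact snapshot of the model so repeated runs convert identical weights.
local_dir = snapshot_download("llava-hf/llava-1.5-7b-hf", revision="<commit-sha>")

model_path = Path("llava-1.5-7b-hf") / "FP16"
if not model_path.exists():
    # assumption: the helper passes a local directory through to optimum-cli
    optimum_cli(local_dir, model_path, additional_args={"weight-format": "fp16"})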

The response, however, sounds "reasonable"... no "hallucination"...


Johere commented Nov 5, 2024

Hmm, download, conversion, and compression are done here:

from pathlib import Path
from cmd_helper import optimum_cli

model_id = "llava-hf/llava-1.5-7b-hf"
model_path = Path(model_id.split("/")[-1]) / "FP16"

if not model_path.exists():
    optimum_cli(model_id, model_path, additional_args={"weight-format": "fp16"})

There is no version information given - so it could happen that a newer/different/updated/modified model gets pulled from Hugging Face... and then the same query could result in a different response (besides SW-/HW-/platform-specific differences such as rounding effects; HW-driver differences could result in different optimizations).

The response, however, sounds "reasonable"... no "hallucination"...

But if I run llava-multimodal-chatbot-optimum.ipynb, I get a reasonable answer:
[screenshot of the answer from the optimum notebook]
Do you get the same answer as mine for llava-multimodal-chatbot-genai.ipynb?


brmarkus commented Nov 5, 2024

Maybe the picture showing an answer regarding a cat was captured using llava-multimodal-chatbot-optimum.ipynb instead of llava-multimodal-chatbot-genai.ipynb.
Do you get the same results when using different accelerators (CPU, GPU, NPU, AUTO, MULTI)?
Do you get different results when using the original model versus the quantized versions (INT8, INT4)?


Johere commented Nov 5, 2024

Maybe the picture showing an answer regarding a cat was captured using llava-multimodal-chatbot-optimum.ipynb instead of llava-multimodal-chatbot-genai.ipynb. Do you get the same results when using different accelerators (CPU, GPU, NPU, AUTO, MULTI)? Do you get different results when using the original model versus the quantized versions (INT8, INT4)?

Yes, all model variants give similar results. I've tried INT4 / INT8 / FP16 on GPU, and INT4 on CPU; I think the answers might not be reasonable enough...
