llava-multimodal-chatbot-genai run failed #2484

Open
Johere opened this issue Oct 29, 2024 · 11 comments

Comments


Johere commented Oct 29, 2024

Running the Jupyter notebook for the LLaVA model:
https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/llava-multimodal-chatbot/llava-multimodal-chatbot-genai.ipynb

  • Device: Arc 770 dGPU
  • Precision: INT4
  • Model: llava-hf/llava-1.5-7b-hf

Describe the bug

RuntimeError: Exception from src/inference/src/cpp/core.cpp:90:
Check 'util::directory_exists(path) || util::file_exists(path)' failed at src/frontends/common/src/frontend.cpp:113:
FrontEnd API failed with GeneralFailure:
ir: Could not open the file: "llava-1.5-7b-hf/INT4/openvino_tokenizer.xml"

Expected behavior
No code was changed; I expected it to work.
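
For context, the GenAI notebook creates the pipeline from the converted model folder roughly like this (a minimal sketch of the load step, not the exact notebook cell; the directory and device are the ones from this report):

import openvino_genai as ov_genai

# VLMPipeline expects openvino_tokenizer.xml / openvino_detokenizer.xml next to
# the model IRs, which is why the missing file aborts pipeline creation.
pipe = ov_genai.VLMPipeline("llava-1.5-7b-hf/INT4", "GPU")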

Screenshots
[screenshot of the error traceback]


brmarkus commented Oct 29, 2024

Which version of the OpenVINO notebooks are you using? There have been changes in the last few days:

https://github.com/openvinotoolkit/openvino_notebooks/commits/latest/notebooks/llava-multimodal-chatbot/llava-multimodal-chatbot-genai.ipynb

Can you temporarily remove all the -q (quiet) options from the commands in the first cell, run the first cell again, and check whether it executed successfully?

UPDATE: I needed to delete my existing virtual-env folder, create a new virtual env, resync the notebooks repo, re-install requirements.txt, and then run the Jupyter notebook again.
However, the command optimum-cli export openvino --model llava-hf/llava-1.5-7b-hf llava-1.5-7b-hf\FP16 --weight-format fp16 has been running for more than 15 minutes now...


Johere commented Oct 29, 2024

I'm using the OpenVINO notebooks at commit-id d5c6df43edebf273ccef512439aba11910aad633, branch: latest.
Yes, all the commands in the first cell were executed successfully.

I've pulled the latest changes on branch latest, and the issue still exists. Is this file (llava-1.5-7b-hf/INT4/openvino_tokenizer.xml) required to exist? The files under the INT4 folder are:

INT4
├── added_tokens.json
├── chat_template.json
├── config.json
├── generation_config.json
├── openvino_language_model.bin
├── openvino_language_model.xml
├── openvino_text_embeddings_model.bin
├── openvino_text_embeddings_model.xml
├── openvino_vision_embeddings_model.bin
├── openvino_vision_embeddings_model.xml
├── preprocessor_config.json
├── processor_config.json
├── special_tokens_map.json
├── tokenizer_config.json
├── tokenizer.json
└── tokenizer.model
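
For reference, a quick check for missing IR files (a minimal sketch; the expected names are the ones discussed in this thread):

from pathlib import Path

model_dir = Path("llava-1.5-7b-hf/INT4")
expected = [
    "openvino_language_model.xml",
    "openvino_text_embeddings_model.xml",
    "openvino_vision_embeddings_model.xml",
    "openvino_tokenizer.xml",
    "openvino_detokenizer.xml",
]
# Report any expected IR file that is not present in the converted folder.
missing = [name for name in expected if not (model_dir / name).exists()]
print("missing:", missing)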

Thanks for your help!

brmarkus commented Oct 29, 2024

Have a look into the originally downloaded FP16 folder llava-1.5-7b-hf/FP16. The pairs of XML & BIN files will be converted and compressed.
This can take VERY long and can require LOTS of RAM. Are you sure the conversion step has already finished, and finished successfully?
It has been running on my machine for more than 30 minutes now - and almost all of my 64GB of RAM is used during conversion... still running...


Johere commented Oct 29, 2024

Have a look into the originally downloaded FP16 folder llava-1.5-7b-hf/FP16. The pairs of XML & BIN files will be converted and compressed. This can take VERY long and can require LOTS of RAM. Are you sure the conversion step has already finished, and finished successfully? It has been running on my machine for more than 30 minutes now - and almost all of my 64GB of RAM is used during conversion... still running...

Yes, it works fine in my environment, and I think the sizes of the model files are as expected:

$ du -sh llava-1.5-7b-hf/*
14G     llava-1.5-7b-hf/FP16
4.1G    llava-1.5-7b-hf/INT4

brmarkus commented Oct 29, 2024

Conversion and compression has now finished on my machine.
My INT4 folder looks like this:
[screenshot of the INT4 folder contents]

=> Yes, the files openvino_tokenizer.xml and openvino_tokenizer.bin should exist...
The files openvino_detokenizer.bin and openvino_detokenizer.xml are also missing on your side.

Are the files present in the original FP16 folder?

Start the commands again and watch the CPU and RAM usage... it will take VERY long and will use LOTS of RAM and CPU.
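
A minimal way to watch that from a separate Python session (a sketch assuming the psutil package is installed; it is not part of the notebook requirements):

import psutil  # assumption: installed separately, e.g. pip install psutil

# Print CPU and RAM usage once per second while the conversion runs.
while True:
    cpu = psutil.cpu_percent(interval=1.0)
    mem = psutil.virtual_memory()
    print(f"CPU {cpu:5.1f}% | RAM {mem.percent:5.1f}% ({mem.used / 2**30:.1f} GiB used)")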


Johere commented Oct 29, 2024

Ohh, I don't have the files openvino_tokenizer.xml and openvino_tokenizer.bin in the FP16 folder. I removed this folder and regenerated it using optimum-cli, and now I can see these files.
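
(For anyone who wants to avoid a full re-export: the tokenizer pair can also be converted on its own - a minimal sketch, assuming the openvino-tokenizers package is installed and the INT4 folder already exists:)

import openvino as ov
from openvino_tokenizers import convert_tokenizer
from transformers import AutoTokenizer

# Convert the Hugging Face tokenizer into the OpenVINO tokenizer/detokenizer IR pair.
hf_tokenizer = AutoTokenizer.from_pretrained("llava-hf/llava-1.5-7b-hf")
ov_tokenizer, ov_detokenizer = convert_tokenizer(hf_tokenizer, with_detokenizer=True)
ov.save_model(ov_tokenizer, "llava-1.5-7b-hf/INT4/openvino_tokenizer.xml")
ov.save_model(ov_detokenizer, "llava-1.5-7b-hf/INT4/openvino_detokenizer.xml")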

It works well now, thank you so much for the help!

Johere closed this as completed Oct 29, 2024
Johere reopened this Nov 4, 2024

Johere commented Nov 4, 2024

Hi @brmarkus, I reopened this issue because I think I'm getting a wrong answer using this example: llava-multimodal-chatbot-genai.ipynb.

Using the branch latest, commit-id: dab21db

Screenshot attached here:
[screenshot of the unexpected answer]

Thanks!


brmarkus commented Nov 4, 2024

Hmm, download, conversion, and compression are done here:

from pathlib import Path
from cmd_helper import optimum_cli

model_id = "llava-hf/llava-1.5-7b-hf"
model_path = Path(model_id.split("/")[-1]) / "FP16"

if not model_path.exists():
    optimum_cli(model_id, model_path, additional_args={"weight-format": "fp16"})

There is no version information given - so it could happen that a newer/different/updated/modified model gets pulled from Hugging Face... and then the same query could result in a different response (besides SW-/HW-/platform-specific differences such as rounding effects; HW-driver differences could result in different optimizations).
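
One way to make the download reproducible is to pin an exact snapshot before exporting (a sketch assuming the huggingface_hub package; "<commit-sha>" is a placeholder you would pick from the model page on the Hub, and it assumes the notebook's optimum_cli helper also accepts a local directory):

from pathlib import Path
from huggingface_hub import snapshot_download
from cmd_helper import optimum_cli

# Pin an exact snapshot of the model so repeated runs convert identical weights.
local_dir = snapshot_download("llava-hf/llava-1.5-7b-hf", revision="<commit-sha>")

model_path = Path("llava-1.5-7b-hf") / "FP16"
if not model_path.exists():
    # assumption: the helper passes a local directory through to optimum-cli
    optimum_cli(local_dir, model_path, additional_args={"weight-format": "fp16"})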

The response, however, sounds "reasonable"... no "hallucination"...


Johere commented Nov 5, 2024

Hmm, download, conversion, and compression are done here:

from pathlib import Path
from cmd_helper import optimum_cli

model_id = "llava-hf/llava-1.5-7b-hf"
model_path = Path(model_id.split("/")[-1]) / "FP16"

if not model_path.exists():
    optimum_cli(model_id, model_path, additional_args={"weight-format": "fp16"})

There is no version information given - so it could happen that a newer/different/updated/modified model gets pulled from Hugging Face... and then the same query could result in a different response (besides SW-/HW-/platform-specific differences such as rounding effects; HW-driver differences could result in different optimizations).

The response, however, sounds "reasonable"... no "hallucination"...

But if I run llava-multimodal-chatbot-optimum.ipynb, I get a reasonable answer:
[screenshot of the answer from the optimum notebook]
Do you get the same answer as mine for llava-multimodal-chatbot-genai.ipynb?


brmarkus commented Nov 5, 2024

Maybe the picture showing an answer regarding a cat was captured using llava-multimodal-chatbot-optimum.ipynb instead of llava-multimodal-chatbot-genai.ipynb.
Do you get the same results when using different accelerators (CPU, GPU, NPU, AUTO, MULTI)?
Do you get different results when using the original model versus the quantized versions (INT8, INT4)?


Johere commented Nov 5, 2024

Maybe the picture showing an answer regarding a cat was captured using llava-multimodal-chatbot-optimum.ipynb instead of llava-multimodal-chatbot-genai.ipynb. Do you get the same results when using different accelerators (CPU, GPU, NPU, AUTO, MULTI)? Do you get different results when using the original model versus the quantized versions (INT8, INT4)?

Yes, all model variants give similar results. I've tried INT4 / INT8 / FP16 on GPU, and INT4 on CPU; I think the answers might not be reasonable enough...
