There has been a change in the OpenVINO GenAI Git repository. Recently, when I execute LLM models, I get a value for the first token latency, but the second token latency comes up as N/A. Is there any fix for this?
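For context on how a second token latency can come out as N/A: here is a minimal sketch, assuming the benchmark derives its metrics from a list of per-token generation latencies. The function and variable names are hypothetical, not llm_bench's actual API:

# Hypothetical sketch of first/second token latency aggregation.
# token_latencies_ms holds one wall-clock latency (ms) per generated token.
def summarize_latencies(token_latencies_ms):
    if not token_latencies_ms:
        return None, None
    first = token_latencies_ms[0]
    rest = token_latencies_ms[1:]
    # The "second token" metric is commonly the average over all tokens
    # after the first; with fewer than two generated tokens there is
    # nothing to average, which would surface as N/A in a report.
    second = sum(rest) / len(rest) if rest else None
    return first, second

print(summarize_latencies([412.7, 38.1, 37.9]))  # (412.7, 38.0)
print(summarize_latencies([412.7]))              # (412.7, None) -> N/A

So one thing worth checking is whether the run actually generated more than one output token.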
I've created a virtual environment with Python 3.10.12, cloned the openvino.genai repository, and ran the following command:
python benchmark.py -m ~/waifook_temp/openvino_notebooks/notebooks/llm-chatbot/llama-2-chat-7b/FP16 -p "how are you" -n 2 -d CPU
I'm able to obtain the second token latency with this setup.
May I know which operating system you are using on your machine? Do you encounter the issue when using prompt text instead of a prompt file? Could you please share the prompt file with us so we can replicate the issue on our end?
The issue still occurs on my end. Here are the steps to reproduce it:
STEPS:
1. Create a Conda environment with Python 3.10.11:
   conda create -n openvino_env python=3.10.11
   conda activate openvino_env
2. Clone the openvino.genai repository:
   git clone https://github.com/openvinotoolkit/openvino.genai.git
   cd openvino.genai
   PATH: https://github.com/openvinotoolkit/openvino.genai/tree/master/tools/llm_bench
3. Install dependencies from the requirements.txt file:
   pip install -r tools/llm_bench/requirements.txt
4. Download or place the model (e.g., llama-2-7b-chat) in the models/ directory.
5. Prepare the prompt file (e.g., llama-2-7b-chat_l.jsonl) and place it in the prompts/ directory (a sample line is sketched after the example command below).
6. Run the benchmark with the following command:
   python benchmark.py -m <model_path> -d <device> -r <report_csv> -f <framework> -p <prompt_text> -n <num_iters>
Example command:
python benchmark.py -m models/llama-2-7b-chat/ -pf prompts/llama-2-7b-chat_l.jsonl -n 2 -d GPU -r benchmark_results.csv -f openvino
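For reference on the prompt file format: to my knowledge, the prompt files shipped under tools/llm_bench use one JSON object per line with a "prompt" key, but please compare against an existing file in the repository before relying on this. A minimal sketch (the prompt texts here are made up):

{"prompt": "What is OpenVINO?"}
{"prompt": "Explain the difference between first and second token latency."}

Each line is treated as a separate benchmark prompt.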
After running the benchmark, the second token latency still comes up as N/A. Is there any fix for this?
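To see exactly which fields come back empty, the report written via -r can be inspected directly. A generic sketch in Python (I'm not assuming specific column names, so it just prints every row):

import csv

# Print every row of the benchmark report as a dict so the empty
# latency fields are visible without guessing the header names.
with open("benchmark_results.csv", newline="") as f:
    for row in csv.DictReader(f):
        print(row)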