
Add Llama2 Onnx Model E2E test #19417

Closed
wants to merge 13 commits into from

Conversation

@mszhanyi (Contributor) commented Feb 5, 2024

Description

  1. Leverage the Hugging Face API to verify the converted ONNX model (a sketch of the idea follows this list).
  2. config.json, tokenization.model, etc. are generated by Option 3. The test target is the model converted by this repo (Option 1), so I only copied the config files.
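
For reference, a minimal sketch of what such a verification could look like, assuming the ONNX export and the copied config/tokenizer files live together in one local directory; the paths and the greedy-decoding comparison are illustrative, not the exact test code:

from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

hf_id = "meta-llama/Llama-2-7b-hf"
onnx_dir = "./llama2-7b-fp16"  # hypothetical output dir of convert_to_onnx

tokenizer = AutoTokenizer.from_pretrained(onnx_dir)
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")

# Generate with the ONNX export and with the original PyTorch weights, then
# compare; fp16 vs fp32 can cause small divergences in practice.
ort_model = ORTModelForCausalLM.from_pretrained(onnx_dir)
ort_text = tokenizer.decode(ort_model.generate(**inputs, max_new_tokens=16)[0])

pt_model = AutoModelForCausalLM.from_pretrained(hf_id)
pt_text = tokenizer.decode(pt_model.generate(**inputs, max_new_tokens=16)[0])

assert ort_text == pt_text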

Motivation and Context

The Llama-related code doesn't have a pipeline to run an E2E test like https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/models/stable_diffusion/pipeline_stable_diffusion.py

@mszhanyi requested a review from a team as a code owner on February 5, 2024 12:38
@kunal-vaishnavi (Contributor) commented:
A couple of questions:

  • Can you store the extra files in the same location that the saved PyTorch model is in before running convert_to_onnx? In other words, can you store the extra files in /meta-llama2 instead? This will put the model and its related files in the same location.

python3 -m models.llama.convert_to_onnx -m meta-llama/Llama-2-7b-hf --output llama2-7b-fp16 --precision fp16 --execution_provider cuda --input /meta-llama2 --small_gpu ;\

  • There is already an E2E test using Hugging Face's Optimum at the end of the E2E notebook. Can we use/modify that instead, since it shows batch inference? This will allow us to ensure the notebook is up-to-date as well. (A sketch of the batch-inference flow is below.)
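
For context, batch inference through Optimum might look roughly like this (a sketch assuming the hypothetical export directory and padding setup below, not the notebook's exact code):

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

onnx_dir = "./llama2-7b-fp16"  # hypothetical export directory
tokenizer = AutoTokenizer.from_pretrained(onnx_dir)
tokenizer.pad_token = tokenizer.eos_token  # Llama defines no pad token by default
tokenizer.padding_side = "left"  # left padding keeps continuations adjacent to prompts
model = ORTModelForCausalLM.from_pretrained(onnx_dir)

prompts = ["Hello, my dog is cute", "The capital of France is"]
inputs = tokenizer(prompts, return_tensors="pt", padding=True)
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))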

@mszhanyi (Contributor, Author) commented Feb 7, 2024

Thank you, I'll leverage your existing example as much as possible.
By the way, is there a way to run the E2E inference with onnxruntime only? (A rough sketch is at the end of this comment.)

Maybe we could hold this PR, since parts of your example, like the tokenizer generation, would be updated to use the GenAI API.
It'd be best to cover our own code rather than Optimum in CI. @snnn

import onnxruntime_genai as og

# From the onnxruntime-genai README: load the model, then create its tokenizer.
model = og.Model('models/microsoft/phi-2', device_type)
tokenizer = model.CreateTokenizer()
...

https://github.com/microsoft/onnxruntime-genai?tab=readme-ov-file#sample-code-for-phi-2-in-python:~:text=import%20onnxruntime_genai%20as%20og%0A%0Amodel%3Dog.Model(f%27models/microsoft/phi%2D2%27%2C%20device_type)%0A%0Atokenizer%20%3D%20model.CreateTokenizer()
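
On the "onnxruntime only" question, a rough sketch of driving the exported decoder directly with an onnxruntime InferenceSession. The model path and input names are assumptions based on this repo's export, and the position_ids/past_key_values inputs a full decode loop needs are omitted for brevity:

import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

onnx_path = "llama2-7b-fp16/model.onnx"  # hypothetical export path
sess = ort.InferenceSession(onnx_path, providers=["CUDAExecutionProvider"])
tokenizer = AutoTokenizer.from_pretrained("llama2-7b-fp16")

enc = tokenizer("Hello, my dog is cute", return_tensors="np")
feeds = {
    "input_ids": enc["input_ids"].astype(np.int64),
    "attention_mask": enc["attention_mask"].astype(np.int64),
}
# Run one forward pass and greedily pick the next token from the last logits.
logits = sess.run(None, feeds)[0]
next_token = int(np.argmax(logits[0, -1]))
print(tokenizer.decode([next_token]))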

@mszhanyi closed this Mar 21, 2024