
Add Llama2 Onnx Model E2E test #19417

Closed
wants to merge 13 commits into from

Conversation

@mszhanyi (Contributor) commented Feb 5, 2024

Description

  1. Leverage the Hugging Face API to verify the converted ONNX model (a sketch of the idea follows this list).
  2. config.json, tokenization.model, etc. are generated by Option 3. The test target is the model converted by this repo (Option 1), so I only copied the config files.
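
For reference, a minimal sketch of what such a verification could look like, assuming the ONNX export and the copied config/tokenizer files live together in one local directory; the paths and the greedy-decoding comparison are illustrative, not the exact test code:

from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

hf_id = "meta-llama/Llama-2-7b-hf"
onnx_dir = "./llama2-7b-fp16"  # hypothetical output dir of convert_to_onnx

tokenizer = AutoTokenizer.from_pretrained(onnx_dir)
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")

# Generate with the ONNX export and with the original PyTorch weights, then
# compare; fp16 vs fp32 can cause small divergences in practice.
ort_model = ORTModelForCausalLM.from_pretrained(onnx_dir)
ort_text = tokenizer.decode(ort_model.generate(**inputs, max_new_tokens=16)[0])

pt_model = AutoModelForCausalLM.from_pretrained(hf_id)
pt_text = tokenizer.decode(pt_model.generate(**inputs, max_new_tokens=16)[0])

assert ort_text == pt_text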

Motivation and Context

The Llama-related code doesn't have a pipeline to run an E2E test like https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/models/stable_diffusion/pipeline_stable_diffusion.py

@mszhanyi requested a review from a team as a code owner on February 5, 2024 12:38
@kunal-vaishnavi (Contributor) commented:
A couple of questions:

  • Can you store the extra files in the same location that the saved PyTorch model is in before running convert_to_onnx? In other words, can you store the extra files in /meta-llama2 instead? This will put the model and its related files in the same location.

python3 -m models.llama.convert_to_onnx -m meta-llama/Llama-2-7b-hf --output llama2-7b-fp16 --precision fp16 --execution_provider cuda --input /meta-llama2 --small_gpu ;\

  • There is already an E2E test using Hugging Face's Optimum at the end of the E2E notebook. Can we use/modify that instead, since it shows batch inference? This will allow us to ensure the notebook is up-to-date as well. (A sketch of the batch-inference flow is below.)
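
For context, batch inference through Optimum might look roughly like this (a sketch assuming the hypothetical export directory and padding setup below, not the notebook's exact code):

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

onnx_dir = "./llama2-7b-fp16"  # hypothetical export directory
tokenizer = AutoTokenizer.from_pretrained(onnx_dir)
tokenizer.pad_token = tokenizer.eos_token  # Llama defines no pad token by default
tokenizer.padding_side = "left"  # left padding keeps continuations adjacent to prompts
model = ORTModelForCausalLM.from_pretrained(onnx_dir)

prompts = ["Hello, my dog is cute", "The capital of France is"]
inputs = tokenizer(prompts, return_tensors="pt", padding=True)
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))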

@mszhanyi (Contributor, Author) commented Feb 7, 2024

Thank you, I'll leverage your existing example as much as possible.
By the way, is there a way to run the E2E inference with onnxruntime only? (A rough sketch is at the end of this comment.)

Maybe we could hold this PR, since parts of your example, like the tokenizer generation, would be updated to use the GenAI API.
It'd be best to cover our own code rather than Optimum in CI. @snnn

import onnxruntime_genai as og

# From the onnxruntime-genai README: load the model, then create its tokenizer.
model = og.Model('models/microsoft/phi-2', device_type)
tokenizer = model.CreateTokenizer()
...

https://github.com/microsoft/onnxruntime-genai?tab=readme-ov-file#sample-code-for-phi-2-in-python:~:text=import%20onnxruntime_genai%20as%20og%0A%0Amodel%3Dog.Model(f%27models/microsoft/phi%2D2%27%2C%20device_type)%0A%0Atokenizer%20%3D%20model.CreateTokenizer()
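
On the "onnxruntime only" question, a rough sketch of driving the exported decoder directly with an onnxruntime InferenceSession. The model path and input names are assumptions based on this repo's export, and the position_ids/past_key_values inputs a full decode loop needs are omitted for brevity:

import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

onnx_path = "llama2-7b-fp16/model.onnx"  # hypothetical export path
sess = ort.InferenceSession(onnx_path, providers=["CUDAExecutionProvider"])
tokenizer = AutoTokenizer.from_pretrained("llama2-7b-fp16")

enc = tokenizer("Hello, my dog is cute", return_tensors="np")
feeds = {
    "input_ids": enc["input_ids"].astype(np.int64),
    "attention_mask": enc["attention_mask"].astype(np.int64),
}
# Run one forward pass and greedily pick the next token from the last logits.
logits = sess.run(None, feeds)[0]
next_token = int(np.argmax(logits[0, -1]))
print(tokenizer.decode([next_token]))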

@mszhanyi closed this Mar 21, 2024