Enabled Dynamo exporter #21713
Conversation
[Review comments on onnxruntime/python/tools/transformers/models/llama/convert_to_onnx.py, llama_inputs.py, llama_parity.py, and llama_torch.py were marked outdated and resolved. The branch was force-pushed from 033f6ee to 0f0ef37, from ef421df to 174888c, and from 70f06de to f1472d8.]
/azp run Big Models, Linux Android Emulator QNN CI Pipeline, Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, Linux QNN CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline
Commenter does not have sufficient privileges for PR 21713 in repo microsoft/onnxruntime
/azp run Big Models, Linux Android Emulator QNN CI Pipeline, Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, Linux QNN CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline
Azure Pipelines successfully started running 10 pipeline(s).
/azp run Windows ARM64 QNN CI Pipeline, Windows CPU CI Pipeline, Windows GPU CUDA CI Pipeline, Windows GPU DML CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows x64 QNN CI Pipeline
Azure Pipelines successfully started running 7 pipeline(s).
/azp run onnxruntime-binary-size-checks-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed
Azure Pipelines successfully started running 4 pipeline(s).
Description
This PR modifies the run_dynamo_export function so that it mirrors the behavior of run_torchscript_merged_export rather than run_torchscript_separate_export. It also adjusts the main function so that run_dynamo is invoked correctly.
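For context, below is a minimal, standalone sketch of what a merged-style Dynamo export can look like. It is not the code from convert_to_onnx.py or llama_inputs.py; the model name, shapes, and save path are illustrative assumptions, and it presumes a PyTorch version (2.1+) where torch.onnx.dynamo_export is available.

```python
# Illustrative sketch only (not the PR's convert_to_onnx.py): export a decoder-style
# model with the Dynamo exporter, passing past_key_values so the exported graph covers
# the merged (prompt + token-generation) use case. Model ID and shapes are assumptions.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", torch_dtype=torch.float32)
model.eval()

config = model.config
batch, past_len, seq_len = 2, 8, 1
head_dim = config.hidden_size // config.num_attention_heads

input_ids = torch.randint(0, config.vocab_size, (batch, seq_len))
attention_mask = torch.ones(batch, past_len + seq_len, dtype=torch.int64)
# One (key, value) pair per layer, each shaped (batch, num_kv_heads, past_len, head_dim).
past_key_values = [
    (
        torch.rand(batch, config.num_key_value_heads, past_len, head_dim),
        torch.rand(batch, config.num_key_value_heads, past_len, head_dim),
    )
    for _ in range(config.num_hidden_layers)
]

onnx_program = torch.onnx.dynamo_export(
    model,
    input_ids,
    attention_mask=attention_mask,
    past_key_values=past_key_values,
    export_options=torch.onnx.ExportOptions(dynamic_shapes=True),
)
onnx_program.save("llama2_dynamo.onnx")
```

The actual script drives this through its own helpers and argument parsing; the sketch only shows the shape of the call.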
Motivation and Context
The main motivation for this change is to enable successful export of LLaMA-2 and LLaMA-3 models to ONNX using the Dynamo exporter. Previously, the exporter saved two copies of the weights, which is inefficient. The modified approach saves only one copy of the weights, and the exported model supports both scenarios (prompt processing and token generation with past key/values); a shape-level illustration follows below. These changes improve the exporter's compatibility with LLaMA models, and by extension other models, and streamline the export process.
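To illustrate the "both scenarios" point, the sketch below shows, purely at the input-shape level, how a single merged model covers prefill and token generation. The names, vocabulary size, and shapes are assumptions for a hypothetical configuration, not the exact inputs produced by llama_inputs.py.

```python
# Illustrative only: the two call patterns a single merged decoder model must serve.
import torch

batch, num_layers, num_kv_heads, head_dim = 1, 32, 32, 128

# Scenario 1: prompt processing (prefill) - full prompt, empty KV cache.
prompt_len = 16
prefill_inputs = {
    "input_ids": torch.randint(0, 32000, (batch, prompt_len)),
    "attention_mask": torch.ones(batch, prompt_len, dtype=torch.int64),
    "past_key_values": [
        (torch.zeros(batch, num_kv_heads, 0, head_dim),
         torch.zeros(batch, num_kv_heads, 0, head_dim))
        for _ in range(num_layers)
    ],
}

# Scenario 2: token generation (decode) - one new token, populated KV cache.
past_len = 16
decode_inputs = {
    "input_ids": torch.randint(0, 32000, (batch, 1)),
    "attention_mask": torch.ones(batch, past_len + 1, dtype=torch.int64),
    "past_key_values": [
        (torch.rand(batch, num_kv_heads, past_len, head_dim),
         torch.rand(batch, num_kv_heads, past_len, head_dim))
        for _ in range(num_layers)
    ],
}
# One exported model with dynamic sequence/past lengths serves both input sets,
# so only one copy of the weights needs to be saved.
```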