llama 70b model fusion and sharding #18175
Conversation
lintrunner found more than 10 potential problems in the proposed changes. Check the Files changed tab for more details.
onnxruntime/python/tools/transformers/models/llama/benchmark.py (review thread, outdated, resolved)
Lint/python format pipeline failed. Please run …
onnxruntime/python/tools/transformers/fusion_rotary_attention.py (5 review threads, outdated, resolved)
onnxruntime/python/tools/transformers/models/llama/benchmark.py (3 review threads, outdated, resolved)
onnxruntime/python/tools/transformers/models/llama/convert_to_onnx.py (1 review thread, dismissed)
Since onnxruntime/python/tools/transformers/models/llama/benchmark_all.py lines 103 to 108 (in 9e8ad39) …
It would also be useful in the README to show an example command with …
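(Purely as an illustration of the kind of README command the reviewer is asking for, with every flag name an assumption rather than something taken from this PR, a sharded export might be launched as `CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 convert_to_onnx.py -m meta-llama/Llama-2-70b-hf --output llama2-70b --precision fp16 --execution_provider cuda`.)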
/azp run Windows GPU CI Pipeline
Azure Pipelines successfully started running 1 pipeline(s).
/azp run Windows GPU CI Pipeline
Azure Pipelines successfully started running 1 pipeline(s).
/azp run Windows GPU TensorRT CI Pipeline
Azure Pipelines successfully started running 1 pipeline(s).
Description
Support llama-70b model fusion and sharding.
Motivation and Context
This change enables sharding the llama-70b model and exporting it to ONNX, since the model is too large for a single GPU.
It also adds fusion support for llama-70b, whose repeat_kv pattern differs from that of llama-7b and llama-13b.
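For context (this sketch is not code from the PR's diff): llama-70b uses grouped-query attention, where a small number of key/value heads is shared across many query heads, so its graph contains a repeat_kv expand/reshape pattern that llama-7b and llama-13b do not have. The minimal sketch below is modeled on the Hugging Face Transformers implementation of repeat_kv; the function name and shapes follow that implementation, not this PR.

```python
import torch

def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
    # Expand KV heads so each is shared by n_rep query heads:
    # (batch, num_kv_heads, seq_len, head_dim) -> (batch, num_kv_heads * n_rep, seq_len, head_dim).
    # Equivalent to torch.repeat_interleave(hidden_states, repeats=n_rep, dim=1).
    batch, num_kv_heads, seq_len, head_dim = hidden_states.shape
    if n_rep == 1:
        return hidden_states
    hidden_states = hidden_states[:, :, None, :, :].expand(
        batch, num_kv_heads, n_rep, seq_len, head_dim
    )
    return hidden_states.reshape(batch, num_kv_heads * n_rep, seq_len, head_dim)

# llama-70b uses 64 query heads and 8 KV heads, so n_rep = 8;
# llama-7b/13b have as many KV heads as query heads, so n_rep = 1.
kv = torch.randn(1, 8, 16, 128)   # (batch, num_kv_heads, seq_len, head_dim)
print(repeat_kv(kv, 8).shape)     # torch.Size([1, 64, 16, 128])
```

When such a model is exported, this expand/reshape presumably surfaces as an Unsqueeze/Expand/Reshape subgraph feeding the attention, which is the extra pattern the updated fusion_rotary_attention.py has to recognize for llama-70b.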