Fix llama.convert_onnx to make it runnable in CI #19372

mszhanyi · 2024-02-01T15:54:54Z

Description

Make parity_check use a local model so that no HF token is needed.
Deleting the model with del did not work because it tried to delete an object defined outside the function's scope, so the model was never actually freed.
This caused out-of-memory errors on the A10.
In fact, 16G of GPU memory (one T4) is enough, but the conversion process is always killed on the T4 machine, while it works on the A10/24G machine.
Standard_NC4as_T4_v3 has 28G of CPU memory.
Standard_NV36ads_A10_v5 has 440G of memory.
It appears that the model conversion needs a very large amount of memory.
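
The del problem described above can be illustrated with a minimal sketch. It is not the code from llama_parity.py: FakeModel, run_and_del and is_alive are hypothetical names, and a weak reference is used here only to observe whether the object is still alive. The point is that del inside a function removes only the local name, so the object stays in memory as long as the caller still holds a reference.

```python
import gc
import weakref


class FakeModel:
    """Hypothetical stand-in for a large model object."""

    def __init__(self):
        self.weights = bytearray(1024)


def run_and_del(model):
    # ... the model would be used here ...
    # This deletes only the local name `model`; the caller's
    # reference still keeps the object alive.
    del model


def is_alive(ref):
    # Force a collection, then check whether the weak reference still resolves.
    gc.collect()
    return ref() is not None


model = FakeModel()
ref = weakref.ref(model)

run_and_del(model)
alive_after_inner_del = is_alive(ref)  # object survives the del inside the function

del model  # drop the reference in the scope that owns the object
alive_after_outer_del = is_alive(ref)  # object can now be reclaimed
```

Under these assumptions, memory is released only when the reference is dropped in the scope that created the object, for example by creating and deleting the model in the same function.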

Motivation and Context

Last time, I ran into some issues in convert_to_onnx.py, so I used the ONNX model from https://github.com/microsoft/Llama-2-Onnx for testing.
Now these issues can be fixed, so I use the ONNX model generated by this repo, and the CI also covers the model conversion.

onnxruntime/python/tools/transformers/models/llama/llama_parity.py

kunal-vaishnavi · 2024-02-01T19:15:55Z

Now that torch v2.2.0 has been released in stable, can you also update the below lines to say torch>=2.2.0?

onnxruntime/onnxruntime/python/tools/transformers/models/llama/requirements.txt

Line 3 in eb0ce86

torch>=2.2.0.dev20230920

onnxruntime/onnxruntime/python/tools/transformers/models/llama/requirements-cuda.txt

Line 2 in eb0ce86

    
# Please manually install torch>=2.2.0.dev20230920 with CUDA enabled for the CUDA version installed in your system.

onnxruntime/python/tools/transformers/models/llama/llama_parity.py

mszhanyi added 6 commits February 1, 2024 13:49

fix bugs in parirty

d8bbd62

update big models pipeline

bd0a3a4

typo

9d9bf36

24G

57f11f4

rm old test case

9f68837

update

67661b9

mszhanyi requested a review from a team as a code owner February 1, 2024 15:54

mszhanyi requested review from kunal-vaishnavi and frank-dong-ms February 1, 2024 15:56

github-advanced-security bot found potential problems Feb 1, 2024

onnxruntime/python/tools/transformers/models/llama/llama_parity.py

kunal-vaishnavi reviewed Feb 1, 2024

onnxruntime/python/tools/transformers/models/llama/llama_parity.py

mszhanyi added 2 commits February 2, 2024 07:35

update

444386a

improve loading memory

e78372e

github-advanced-security bot found potential problems Feb 2, 2024

onnxruntime/python/tools/transformers/models/llama/llama_parity.py

mszhanyi added 2 commits February 2, 2024 11:02

lint

894b0be

update

b59e690

kunal-vaishnavi reviewed Feb 2, 2024

onnxruntime/python/tools/transformers/models/llama/llama_parity.py

kunal-vaishnavi reviewed Feb 2, 2024

onnxruntime/python/tools/transformers/models/llama/llama_parity.py

kunal-vaishnavi reviewed Feb 2, 2024

onnxruntime/python/tools/transformers/models/llama/llama_parity.py