I am trying to get an ONNX training graph for the Llama2_7b model. I can export the forward graph without any problem, but the issue occurs when I call generate_artifacts.
I don't get this error when I run a single-layer Transformer block (attention + MLP) with similar dimensions. What is causing the issue here?
Additionally, the loss function is throwing an error too: expected 2 but got 66 arguments. Please explain. Thank you!
Hi, do you have the full script for exporting & generating artifacts? Or could you provide the forward graph ONNX file?
Generally, we see these errors when the generated forward graph is incorrect, especially for the loss function, which expects a certain number of graph outputs. For LLMs in particular, unless the base Torch model being exported is in training mode, it usually uses a key-value cache (an inference-only optimization that adds extra inputs and outputs to the graph).
The Torch model passed to torch.onnx.export (the base_model) must be in training mode (i.e., you should be able to train with it directly), and the input and output names passed to the export function should correspond to the actual input and output names of the Torch model.
If you have a working PyTorch training script for Llama2_7b, you can use that to determine the correct input names and output names, and what inputs you need to pass in for it to be in training mode.
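For reference, here is a rough, untested sketch of that flow, assuming a Hugging Face LlamaForCausalLM checkpoint. The model name, sequence length, opset version, and trainable-parameter selection below are placeholders you would adapt to your setup:

```python
# Untested sketch: export a Hugging Face Llama-2 model in training mode, then
# generate ONNX Runtime training artifacts from the resulting forward graph.
import onnx
import torch
from onnxruntime.training import artifacts
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # placeholder checkpoint
model.train()                       # training mode, so dropout etc. stay in the graph
model.config.use_cache = False      # drop the KV cache: no past_key_values inputs/outputs
model.config.return_dict = False    # forward returns a plain tuple -> a single "logits" output

batch, seq_len = 1, 128             # illustrative shapes
input_ids = torch.randint(0, model.config.vocab_size, (batch, seq_len), dtype=torch.long)
attention_mask = torch.ones(batch, seq_len, dtype=torch.long)

# A 7B model exceeds the 2 GB protobuf limit, so the exporter should spill the
# weights to external data files next to the .onnx file.
torch.onnx.export(
    model,
    (input_ids, attention_mask),
    "llama2_7b_forward.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "logits": {0: "batch", 1: "sequence"},
    },
    do_constant_folding=False,
    training=torch.onnx.TrainingMode.TRAINING,
    opset_version=17,
)

# Exported initializer names generally match the Torch parameter names.
requires_grad = [name for name, p in model.named_parameters() if p.requires_grad]
frozen_params = [name for name, p in model.named_parameters() if not p.requires_grad]

onnx_model = onnx.load("llama2_7b_forward.onnx")

artifacts.generate_artifacts(
    onnx_model,
    requires_grad=requires_grad,
    frozen_params=frozen_params,
    loss=artifacts.LossType.CrossEntropyLoss,
    optimizer=artifacts.OptimType.AdamW,
    artifact_directory="llama2_7b_artifacts",
)
```

Note that the loss block gets wired to the forward graph's outputs, so if the exported graph still carries the per-layer present key/value outputs from the cache, the loss hookup can fail with an argument-count mismatch like the "expected 2 but got 66" error above.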
Hi @carzh. Thanks a lot for the comment. Yes, that is exactly what I did, and I was able to resolve the issue. There was a mismatch in the input dimensions that was generating the error.
To reproduce
Urgency
Urgent
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.19.0
PyTorch Version
2.4.0
Execution Provider
CUDA
Execution Provider Library Version
CUDA 12.4