[Training] This training fails for Gemm node #18344
Comments
Is it possible to run the following code?
    from onnxruntime import InferenceSession
    sess = InferenceSession("Walker2.onnx", providers=["CPUExecutionProvider"])
If that fails, the issue probably comes from your model. If that succeeds, the issue probably comes from
Yes, InferenceSession works perfectly:
This works as expected. BTW, here is the relevant part of the model:
I also tried replacing the RandomNormal node with an Identity node and still got the same problem. The model was created with a reinforcement learning algorithm.
I tried simplifying the model with onnx-modifier to just:
But I am still getting the error:
Here is the ONNX file: https://www.dropbox.com/scl/fi/w9t0yeihnlul1ep1ufly8/Walker4.onnx?rlkey=ktcsl4qi7eqoueefxhl3649jb&dl=0
Thanks for the additional information. I'll have a look tomorrow.
Thank you! Greatly appreciated. The opset is v6. I tried to convert it to opset v7 with onnx.version_converter, but now I get the error:
Perhaps it's just a very old ONNX file? (When I create a Linear layer in torch it works fine, so it is strange that this example doesn't work.)
I just resolved my error by updating the opset from 10 to 17:
Yes, I thought that was the problem. I think the ONNX file I was using was just too old to work. I'll have to try to recreate it from the Python code. It would be nice if it worked with opset 10, though.
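The fix reported above (moving from opset 10 to opset 17) is consistent with how the ONNX Gemm operator changed over time: before opset 11 the third input C is required, and from opset 11 onward it is optional. A two-input Gemm, like the one in the error message, is therefore only valid in a model that declares opset 11 or later. Below is a minimal sketch of that rule. The helper names `gemm_input_range` and `gemm_node_is_valid` are hypothetical and not part of onnx or onnxruntime, and the sketch covers only the input count, not the other Gemm changes between opsets.

```python
from typing import Tuple

# Hypothetical helpers for illustration; not part of onnx or onnxruntime.
# Assumption: Gemm's third input (C) is required before opset 11 and
# optional from opset 11 onward.
GEMM_OPTIONAL_C_SINCE = 11


def gemm_input_range(opset: int) -> Tuple[int, int]:
    """Return (min_inputs, max_inputs) that Gemm accepts at a given opset."""
    if opset < 1:
        raise ValueError("opset must be a positive integer")
    if opset >= GEMM_OPTIONAL_C_SINCE:
        return (2, 3)
    return (3, 3)


def gemm_node_is_valid(num_inputs: int, opset: int) -> bool:
    """Check whether a Gemm node with num_inputs inputs fits the opset."""
    low, high = gemm_input_range(opset)
    return low <= num_inputs <= high


# The generated gradient node in this issue has two inputs.
print(gemm_node_is_valid(2, opset=10))
print(gemm_node_is_valid(2, opset=17))
```

To apply this to a real file, the declared opset can be read from the `opset_import` field of the model returned by `onnx.load`.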
Closing this issue and marking it as resolved. Please reopen if there are other questions.
Describe the issue
I am trying to create a training ONNX model from this file:
https://www.dropbox.com/scl/fi/uveuro68v8d2epb2awijt/Walker2.onnx?rlkey=xx8oxntdo5dh8c3lhyby3bx9q&dl=0
Using Python:
Background: this is an ONNX model created by Unity ML-Agents, and I'm attempting to turn it into a train-on-device example. I deleted some extra nodes that weren't connected to the main tree.
Any idea what's causing this error?
RuntimeError: C:\a\_work\1\s\orttraining\orttraining\python\orttraining_pybind_state.cc:841 onnxruntime::python::addObjectMethodsForTraining::<lambda_ac677f721119089b105e2d6a6620788a>::operator () [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. In Node, ("Gemm_13_Grad/Gemm_1", Gemm, "", -1) : ("37_grad": tensor(float),"36": tensor(float),) -> ("action_model._continuous_distribution.mu.weight_grad": tensor(float),) , Error Node (Gemm_13_Grad/Gemm_1) has input size 2 not in range [min=3, max=3].
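For anyone reading a message like this, it may help to pull out the node name, the operator type, and the input counts programmatically. Below is a minimal sketch using only the standard library. The function name `parse_input_size_error` is hypothetical, the regular expressions are fitted to this one message rather than to any documented format, and treating `_Grad/` in the node name as a sign of a generated gradient node is an assumption based on the name seen here.

```python
import re
from typing import Dict, Optional

# Patterns fitted to the message in this issue; the format is not a
# documented, stable interface, so treat these as assumptions.
NODE_RE = re.compile(r'In Node, \("(?P<name>[^"]+)", (?P<op_type>\w+),')
SIZE_RE = re.compile(
    r"has input size (?P<got>\d+) not in range "
    r"\[min=(?P<min>\d+), max=(?P<max>\d+)\]"
)


def parse_input_size_error(message: str) -> Optional[Dict[str, object]]:
    """Extract node name, op type, and input counts, or None if no match."""
    node = NODE_RE.search(message)
    size = SIZE_RE.search(message)
    if node is None or size is None:
        return None
    name = node.group("name")
    return {
        "node": name,
        "op_type": node.group("op_type"),
        "got": int(size.group("got")),
        "min": int(size.group("min")),
        "max": int(size.group("max")),
        # Assumption: generated gradient nodes carry "_Grad/" in their name.
        "looks_generated": "_Grad/" in name,
    }


# Tail of the error message quoted in this issue.
MESSAGE = (
    'In Node, ("Gemm_13_Grad/Gemm_1", Gemm, "", -1) : '
    '("37_grad": tensor(float),"36": tensor(float),) -> '
    '("action_model._continuous_distribution.mu.weight_grad": tensor(float),) , '
    "Error Node (Gemm_13_Grad/Gemm_1) has input size 2 not in range [min=3, max=3]."
)

info = parse_input_size_error(MESSAGE)
print(info)
```

Read this way, the failing node is a Gemm with two inputs where exactly three are expected, and its name suggests it was produced while building the gradient graph rather than being a node of the original model.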
To reproduce
as above
Urgency
No response
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.15.1
PyTorch Version
1.13.1
Execution Provider
CUDA
Execution Provider Library Version
No response