Describe the issue
Hello,
I'm trying to retrain a YOLOv8n model on a custom dataset collected directly on an arm64 device running Linux. I'm using onnxruntime to generate the training artifacts, and for now I'm struggling a bit to define a loss function for my model. I have the PyTorch model generated by Ultralytics.
I tried following the suggestion made by @baijumeswani on a similar issue:
import onnx
import torch
from onnxruntime.training import artifacts

class MyPTModelWithLoss(torch.nn.Module):
    def __init__(self):
        super().__init__()
        ...

    def forward(self, ...):
        # Compute the detection logits, then fold the loss into the graph.
        p, q, r = compute_logits()
        loss = loss1(p) + loss2(q) + loss3(r)
        return loss

pt_model = MyPTModelWithLoss(...)
torch.onnx.export(pt_model, ...)
onnx_model = onnx.load(<exported_onnx_model_path>)
artifacts.generate_artifacts(onnx_model, requires_grad=[...], frozen_params=[...], loss=None, optimizer=...)
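For context, generate_artifacts can instead append one of its stock losses when the model ends at raw logits; a minimal sketch below (all names are hypothetical), which is not an option here since the YOLOv8 detection loss has no stock equivalent:

# Sketch: letting generate_artifacts append a built-in loss instead of
# exporting the loss inside the model. All names here are hypothetical.
from onnxruntime.training import artifacts

artifacts.generate_artifacts(
    onnx_model,                          # model exported WITHOUT the loss head
    requires_grad=["head.weight"],       # hypothetical trainable parameter
    frozen_params=["backbone.weight"],   # hypothetical frozen parameter
    loss=artifacts.LossType.MSELoss,     # one of the stock losses
    optimizer=artifacts.OptimType.AdamW,
)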
The suggested approach adds the loss function at the end of the model's forward pass and passes None as the loss when generating the artifacts. The problem with that approach is that the gradient builder then tries to build gradients for the operations used by the loss, such as ReduceMin, ReduceMax, ... However, there is no gradient definition for these operations, and it is not correct practice to compute gradients for the loss.
I was wondering if there is a way to cut the graph into two subgraphs, building the gradient only for the forward pass and not for the loss function too? If not, what would be the best approach to generating the training artifacts in this case?
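For concreteness, this is the kind of split I have in mind, sketched with onnx.utils.extract_model (assuming, hypothetically, that the logits p, q, r were also exported as named graph outputs):

import onnx.utils

# Subgraph 1: images -> logits; the part we would build gradients for.
onnx.utils.extract_model(
    "model_with_loss.onnx",       # hypothetical exported path
    "backbone.onnx",
    input_names=["images"],       # hypothetical input name
    output_names=["p", "q", "r"],
)

# Subgraph 2: logits -> loss; would run forward-only.
onnx.utils.extract_model(
    "model_with_loss.onnx",
    "loss_head.onnx",
    input_names=["p", "q", "r"],
    output_names=["loss"],
)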
Thank you for your support,
To reproduce
See the code snippet above.
Urgency
This is really urgent; we are trying to deploy a retrainable YOLOv8 model on the device using the onnxruntime-training framework.
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.17.1
PyTorch Version
Execution Provider
Default CPU
Execution Provider Library Version
No response
OAHLSTM added the training label on Apr 4, 2024
However, there is no gradient definition for these operations, and it is not correct practice to compute gradients for the loss.
Gradient computation should always start at the loss. What is being computed during backpropagation is the gradient of the loss with respect to the inputs at each node.
The goal of the training phase is to minimize the loss, so we want to find the changes to the weight parameters that minimize it. During backpropagation, we start with 1 (the gradient of the loss w.r.t. itself), and as we visit each node of the forward graph in reverse order, we compute the gradient of the loss with respect to that node's inputs.
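A toy PyTorch example of that seeding (just an illustration, not ORT internals):

import torch

# Toy graph: loss = sum(x * w). Backprop is seeded with
# d(loss)/d(loss) = 1; passing it explicitly makes the start
# of the chain rule visible.
x = torch.tensor([1.0, 2.0, 3.0])
w = torch.tensor([0.5, 0.5, 0.5], requires_grad=True)
loss = (x * w).sum()
loss.backward(gradient=torch.tensor(1.0))
print(w.grad)  # d(loss)/dw = x -> tensor([1., 2., 3.])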
The problem right now is that, for the loss defined in your model, we don't have the necessary gradient operator kernels (i.e., ReduceMinGrad and ReduceMaxGrad). It might take some time for us to get to this work. Would you like to contribute and write the CPU kernels for these operators?
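For reference, the rule such a kernel would implement is straightforward: the incoming gradient flows only to the input element(s) that attained the max. A NumPy sketch (full reduction only; splitting ties evenly is one convention):

import numpy as np

# Sketch of a ReduceMaxGrad rule for a full reduction: route the
# incoming gradient to the argmax position(s), zero elsewhere.
def reduce_max_grad(x, grad_out):
    mask = (x == x.max()).astype(x.dtype)  # 1 at the max position(s)
    mask /= mask.sum()                     # split ties evenly
    return grad_out * mask

x = np.array([1.0, 5.0, 3.0])
print(reduce_max_grad(x, np.array(1.0)))   # [0. 1. 0.]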