[Training] [C# API] TrainingSession EvalStep() doesn't work #18816

Closed
peratrepic opened this issue Dec 14, 2023 · 3 comments
Labels
api:CSharp: issues related to the C# API
documentation: improvements or additions to documentation; typically submitted using template
training: issues related to ONNX Runtime training; typically submitted using template

Comments

@peratrepic

Describe the issue

I am trying to figure out how to use the TrainingSession.EvalStep() function, and it seems impossible. It accepts a list of input values and a list of output values, instead of just returning the output values like the C++ function does. So in order to call it, you have to allocate the list of output values yourself, but there is zero documentation about how to create it, with what dimensions, and what the expected output values are.

I have tried this:

Tensor<float> evalOutputsTensor = new DenseTensor<float>(new[] { 1 });
var evalOutputs = new List<FixedBufferOnnxValue> {
    FixedBufferOnnxValue.CreateFromTensor(evalOutputsTensor)
};
trainingSession.EvalStep(inputs, evalOutputs);

and I am getting this error, no matter what dimensions I use to create the output tensor with:

Microsoft.ML.OnnxRuntime.OnnxRuntimeException: '[ErrorCode:RuntimeException]
C:\a\_work\1\s\orttraining\orttraining\training_api\module.cc:553
onnxruntime::training::api::Module::EvalStep
[ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running SoftmaxCrossEntropyLoss node. Name:'onnx::SoftmaxCrossEntropyLoss::4'
Status Message: C:\a\_work\1\s\onnxruntime\core\framework\execution_frame.cc:173
onnxruntime::IExecutionFrame::GetOrCreateNodeOutputMLValue shape && tensor.Shape() == *shape was false. OrtValue shape verification failed. Current shape:{1} Requested shape:{}

Notice the "Current shape:{1} Requested shape:{}" part in the error message. I have tried all kinds of shapes for the output tensor: 0, 1, (1,1). Nothing works, because it somehow expects the empty shape, and I don't know how to create such a tensor.
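
For readers unfamiliar with the distinction in that message: `{}` denotes a rank-0 (scalar) tensor, which still holds exactly one element, whereas `{1}` is a rank-1 tensor of length one. The two are different shapes even though both carry a single value. A minimal sketch in plain C#, independent of ONNX Runtime; the helper name `ElementCount` is hypothetical, and how a rank-0 buffer is handed to EvalStep is not shown here:

```csharp
using System;
using System.Linq;

// Hypothetical helper (not part of ONNX Runtime): number of elements implied by a shape.
// The product over zero dimensions is 1, so a rank-0 tensor holds one value.
static long ElementCount(long[] shape) => shape.Aggregate(1L, (acc, dim) => acc * dim);

long[] scalarShape = Array.Empty<long>(); // printed as {} in the error message
long[] vectorShape = { 1 };               // printed as {1} in the error message

// Same element count, different rank: shape verification compares the shapes, not the counts.
Console.WriteLine($"rank {scalarShape.Length}, elements {ElementCount(scalarShape)}");
Console.WriteLine($"rank {vectorShape.Length}, elements {ElementCount(vectorShape)}");
```

This prints `rank 0, elements 1` followed by `rank 1, elements 1`, which is why no choice of non-empty dimensions can satisfy a request for shape `{}`.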

Side note: you guys should really document this API better. There are zero clues about what EvalStep should return. Should it return a tensor with one value, the accuracy? Should it return the accuracy and the probabilities for each output class? These kinds of things are very important for users, and your documentation is so generic and bland; it just says:

<param name="outputValues">Specify a collection of <see cref="FixedBufferOnnxValue"/> that indicates the output values of the eval model.</param>

Not to mention the C# API docs webpage; it's just a skeleton with zero info.

Sorry for the rant, but it's kinda deserved :) I am wasting so much time just trying to figure out how to set up the parameters in this API.

To reproduce

I am attaching the simple MNIST model, converted from PyTorch to ONNX, that I am using to test on-device training: training_artifacts.zip

I am using this code to run EvalStep:

using (TrainingSession trainingSession = new (checkpointState, TRAINING_MODEL_PATH, EVAL_MODEL_PATH, OPTIMIZER_MODEL_PATH))
{
    // Input buffers: one batch of images and the matching class labels.
    Tensor<float> batchFeatures = new DenseTensor<float>(new[] { BATCH_SIZE, IMG_SIZE, IMG_SIZE });
    Tensor<long> batchLabels = new DenseTensor<long>(new[] { BATCH_SIZE });

    var inputs = new List<FixedBufferOnnxValue> {
        FixedBufferOnnxValue.CreateFromTensor(batchFeatures),
        FixedBufferOnnxValue.CreateFromTensor(batchLabels)
    };

    // Pre-allocated output buffer of shape {1}. The eval model requests shape {},
    // so the call below fails shape verification with the error shown above.
    Tensor<float> evalOutputsTensor = new DenseTensor<float>(new[] { 1 });
    var evalOutputs = new List<FixedBufferOnnxValue> {
        FixedBufferOnnxValue.CreateFromTensor(evalOutputsTensor)
    };
    trainingSession.EvalStep(inputs, evalOutputs);
}

Urgency

No response

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.16.3

PyTorch Version

2.01

Execution Provider

Default CPU

Execution Provider Library Version

No response

@peratrepic peratrepic added the training issues related to ONNX Runtime training; typically submitted using template label Dec 14, 2023
@yuslepukhin yuslepukhin added documentation improvements or additions to documentation; typically submitted using template api:CSharp issues related to the C# API labels Dec 14, 2023
@askhade
Contributor

askhade commented Dec 14, 2023

Thanks for reporting the issue and documentation gaps. I will update once I have a resolution.

@baijumeswani
Contributor

@peratrepic Sorry for the bad experience with the C# API.

I tried to address the EvalStep issue you reported; the fix is in #19048.

@baijumeswani
Contributor

The code is now merged into main. Please try the new eval function as described in #19048.
