[Training] [C# API] TrainingSession EvalStep() doesn't work #18816

peratrepic · 2023-12-14T02:47:48Z

Describe the issue

I am trying to figure out how to use the TrainingSession.EvalStep() function and it seems impossible. It accepts a list of input values and a list of output values, instead of just returning the output values like the C++ function does. So in order to call it, you have to allocate the list of output values yourself, but there is zero documentation on how to create it, what dimensions it needs, or what the expected output values are.

I have tried this:

Tensor<float> evalOutputsTensor = new DenseTensor<float>(new[] { 1 });
var evalOutputs = new List<FixedBufferOnnxValue> {
    FixedBufferOnnxValue.CreateFromTensor(evalOutputsTensor)
};
trainingSession.EvalStep(inputs, evalOutputs);

and I am getting this error, no matter what dimensions I use to create the output tensor:

Microsoft.ML.OnnxRuntime.OnnxRuntimeException: '[ErrorCode:RuntimeException]
C:\a\_work\1\s\orttraining\orttraining\training_api\module.cc:553
onnxruntime::training::api::Module::EvalStep
[ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running SoftmaxCrossEntropyLoss node. Name:'onnx::SoftmaxCrossEntropyLoss::4'
Status Message: C:\a\_work\1\s\onnxruntime\core\framework\execution_frame.cc:173
onnxruntime::IExecutionFrame::GetOrCreateNodeOutputMLValue shape && tensor.Shape() == *shape was false. OrtValue shape verification failed. Current shape:{1} Requested shape:{}

Notice the "Current shape:{1} Requested shape:{}" part in the error message. I have tried all kinds of shapes for the output tensor: 0, 1, (1,1). Nothing works, because it somehow expects an empty shape and I don't know how to create such a tensor.
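For readers hitting the same message: a shape printed as {} is a rank-0 (scalar) shape, which is what the ONNX SoftmaxCrossEntropyLoss operator produces when its reduction is mean or sum. It still holds exactly one value, but it is not the same shape as {1}. The sketch below illustrates that distinction in plain C# without any ONNX Runtime calls. ShapeDemo, Rank, ElementCount and SameShape are hypothetical helper names invented for this illustration. It does not show how to build such a tensor with the C# API.

```csharp
using System;
using System.Linq;

public static class ShapeDemo
{
    // Rank is the number of dimensions; a scalar has rank 0.
    public static int Rank(long[] shape) => shape.Length;

    // Element count is the product of all dimensions; the empty product is 1.
    public static long ElementCount(long[] shape) =>
        shape.Aggregate(1L, (acc, dim) => acc * dim);

    // Two shapes match only if the rank and every dimension agree.
    public static bool SameShape(long[] a, long[] b) => a.SequenceEqual(b);

    public static void Main()
    {
        long[] scalar = Array.Empty<long>(); // printed as {} in the error message
        long[] vector = { 1 };               // printed as {1} in the error message

        Console.WriteLine($"{{}}: rank {Rank(scalar)}, elements {ElementCount(scalar)}");
        Console.WriteLine($"{{1}}: rank {Rank(vector)}, elements {ElementCount(vector)}");
        Console.WriteLine($"same shape: {SameShape(scalar, vector)}");
    }
}
```

Both shapes describe a single float, which is why the element count alone does not satisfy the shape check: the check compares the dimension lists themselves.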

Side note: you guys should really document this API better. There are zero clues about what EvalStep should return. Should it return a tensor with one value, the accuracy? Should it return the accuracy and the probabilities for each output class? These kinds of things are very important for users, and your documentation is so generic and bland that it just says:

<param name="outputValues">Specify a collection of <see cref="FixedBufferOnnxValue"/> that indicates the output values of the eval model.</param>

Not to mention the C# API docs webpage, which is just a skeleton with zero info.

Sorry for the rant, but it's kinda deserved :) I am wasting so much time just trying to figure out how to set up parameters in this API.

To reproduce

I am attaching the simple MNIST model, converted from PyTorch to ONNX, that I am using to test on-device training: training_artifacts.zip

I am using this code to run EvalStep:

using (TrainingSession trainingSession = new (checkpointState, TRAINING_MODEL_PATH, EVAL_MODEL_PATH, OPTIMIZER_MODEL_PATH))
{
    Tensor<float> batchFeatures = new DenseTensor<float>(new[] { BATCH_SIZE, IMG_SIZE, IMG_SIZE });
    Tensor<long> batchLabels = new DenseTensor<long>(new[] { BATCH_SIZE });

    var inputs = new List<FixedBufferOnnxValue> {
        FixedBufferOnnxValue.CreateFromTensor(batchFeatures),
        FixedBufferOnnxValue.CreateFromTensor(batchLabels)
    };

    Tensor<float> evalOutputsTensor = new DenseTensor<float>(new[] { 1 });
    var evalOutputs = new List<FixedBufferOnnxValue> {
        FixedBufferOnnxValue.CreateFromTensor(evalOutputsTensor)
    };
    trainingSession.EvalStep(inputs, evalOutputs);
}

Urgency

No response

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.16.3

PyTorch Version

2.01

Execution Provider

Default CPU

Execution Provider Library Version

No response


askhade · 2023-12-14T22:00:55Z

Thanks for reporting the issue and documentation gaps. I will update once I have a resolution.

baijumeswani · 2024-01-08T19:53:22Z

@peratrepic Sorry for the bad experience with the C# API.

I have tried to address the EvalStep issue you reported; the change is in #19048.

baijumeswani · 2024-01-26T18:05:02Z

The code is now merged into main. Please try using the new eval function as described in #19048.

peratrepic added the "training" label (issues related to ONNX Runtime training; typically submitted using template) on Dec 14, 2023

yuslepukhin added the "documentation" (improvements or additions to documentation; typically submitted using template) and "api:CSharp" (issues related to the C# API) labels on Dec 14, 2023

baijumeswani mentioned this issue Jan 8, 2024

Add support for a collection of OrtValue as inputs and outputs to C# TrainingSession #19048

Merged

baijumeswani closed this as completed Jan 26, 2024
