
Unmanaged memory growth when calling ONNX from C# with C++ API #22992

Open
arvindsouza opened this issue Dec 3, 2024 · 3 comments

Labels: api:CSharp · .NET · performance

Comments


arvindsouza commented Dec 3, 2024

Describe the issue

Hello, I have an application written in .NET that calls a managed C++ class, which in turn wraps native C++ code containing the ONNX Runtime calls.

On startup, the C# class calls the LoadModel function like so:

mImageInferencer.LoadModel(AppDomain.CurrentDomain.BaseDirectory + ModelPath, enableGpu, counter);

Multiple instances of the inferencer class may be created within Tasks in .NET.

This initializes the ONNX Session like so:

void ImageInferencer::LoadModel(std::string modelPath, bool enableGpu, int counter) {

	try {

		wchar_t* wideStringModelPath = new wchar_t[modelPath.length() + 1];
		std::copy(modelPath.begin(), modelPath.end(), wideStringModelPath);
		wideStringModelPath[modelPath.length()] = 0;
		ID = counter;

		std::string instanceName{ "image-classification-inference" + to_string(counter) };

		onnxSessionEnv = new Env(OrtLoggingLevel::ORT_LOGGING_LEVEL_WARNING,
			instanceName.c_str());

		SessionOptions sessionOptions;
		sessionOptions.SetIntraOpNumThreads(4);

		sessionOptions.SetExecutionMode(ORT_PARALLEL);
		sessionOptions.SetInterOpNumThreads(4);
		sessionOptions.SetGraphOptimizationLevel(
			GraphOptimizationLevel::ORT_ENABLE_BASIC);
	
		if (enableGpu)
		{

			OrtCUDAProviderOptions cudaOptions;
			cudaOptions.arena_extend_strategy = 1;

			cudaOptions.device_id = 0;
			//cudaOptions.gpu_mem_limit = 2 * 1024 * 1024 * 1024;
			cudaOptions.cudnn_conv_algo_search = OrtCudnnConvAlgoSearchDefault;
			//cudaOptions.do_copy_in_default_stream = 0;
			sessionOptions.AppendExecutionProvider_CUDA(cudaOptions);


		}
		mOrtSession = new Session(*onnxSessionEnv, wideStringModelPath, sessionOptions);

		delete[] wideStringModelPath;
		wideStringModelPath = nullptr;

		sessionOptions.release();

	}
	catch (exception& e) {
		PLOGD << e.what();
	}
}

Both mOrtSession and onnxSessionEnv are class-level variables.
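
For context, here is a rough sketch of the implied class layout; the member types are inferred from the snippets and are an assumption, not the actual header:

#include <onnxruntime_cxx_api.h>

// Sketch only: the members implied by the code above.
class ImageInferencer {
	Ort::Env* onnxSessionEnv = nullptr;   // created with new in LoadModel
	Ort::Session* mOrtSession = nullptr;  // created with new in LoadModel
	void* input_data = nullptr;           // GPU input buffer, allocated on the first Run
	int runCounter = 0;
	int ID = 0;
	// ...
};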

The .NET class then listens for new images arriving on a message queue and, upon receiving an image, passes it to the model for processing like so:

int numDetections = mImageInferencer.ProcessImageV10(frame, frame.Length, (float)options.ConfidenceThreshold, options.EnableTracking, inputKey);

This process runs indefinitely, as long as we receive frames from the source (cameras).

This process then calls Run on the ONNX session as below:

vector<InferencerDetection> ImageInferencer::ProcessImageV10(uchar* image, int size, float threshold, bool enableTracking, std::string trackerKey) {
	std::vector<InferencerDetection> result;

	try {

		RunOptions runOptions = Ort::RunOptions();
		runOptions.AddConfigEntry("memory.enable_memory_arena_shrinkage", "cpu:0;gpu:0");

		AllocatorWithDefaultOptions ort_alloc;

		MemoryInfo memory_info = Ort::MemoryInfo("Cuda", OrtArenaAllocator, 0, OrtMemTypeDefault);
		MemoryInfo memory_info_cpu = MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
		PLOGD << "allocated memory";

		cv::Mat imageBGR = imdecode(_InputArray(static_cast<uchar*>(image), size), cv::IMREAD_COLOR);
		if (imageBGR.empty()) {
			std::cerr << "Error: Could not decode image" << std::endl;
			return result;
		}

		float* blob = nullptr;

		std::vector<int64_t> inputDims{ 1, 3, -1, -1 };

		this->PreProcessing(imageBGR, blob, inputDims);

		size_t inputTensorSize = this->VectorProduct(inputDims);
		std::vector<float> inputTensorValues(blob, blob + inputTensorSize);

		std::vector<Ort::Value> inputTensors;
		Ort::Allocator gpu_allocator(*mOrtSession, memory_info);
		AllocatorWithDefaultOptions defaultAllocator;

		if (runCounter == 0) {
			input_data = gpu_allocator.Alloc(4 * inputTensorValues.size());
			runCounter++;
		}

		Ort::IoBinding io_binding{ *mOrtSession };


		cudaMemcpy(input_data, inputTensorValues.data(), 4 * inputTensorValues.size(), cudaMemcpyHostToDevice);

		inputTensors.push_back(Value::CreateTensor<float>(gpu_allocator.GetInfo(), reinterpret_cast<float*>(input_data), inputTensorValues.size(), inputDims.data(), inputDims.size()));

		AllocatedStringPtr inputName = mOrtSession->GetInputNameAllocated(0, ort_alloc);
		AllocatedStringPtr outputName = mOrtSession->GetOutputNameAllocated(0, ort_alloc);

		const std::array<const char*, 1> inputNames = { inputName.get() };
		const std::array<const char*, 1> outputNames = { outputName.get() };

		io_binding.BindInput(inputNames[0], inputTensors[0]);

		io_binding.BindOutput(outputNames[0], memory_info_cpu);

		mOrtSession->Run(runOptions, io_binding);
		auto output_tensors = io_binding.GetOutputValues();

		std::vector<float> outputTensorValues(output_tensors[0].GetTensorMutableData<float>(), output_tensors[0].GetTensorMutableData<float>() + output_tensors[0].GetTensorTypeAndShapeInfo().GetElementCount());
		//PLOGD << "output tensor values length" << outputTensorValues.size();

		cv::Size shape = cv::Size(width, height);

		result = PostProcessingV10(shape, imageBGR.size(), outputTensorValues, threshold, 0.6);

		delete[] blob;  // blob is assumed to be allocated with new[] inside PreProcessing
		blob = nullptr;
		imageBGR.release();
		gpu_allocator.release();

		inputDims.clear();
		inputTensorValues.clear();
		outputTensorValues.clear();
		inputTensors.clear();
		output_tensors.clear();
		memory_info.release();
		memory_info_cpu.release();
		runOptions.release();
		inputName.release();
		outputName.release();

	}
	catch (exception& ex) {
		PLOGE << ex.what();
	}

	return result;
}

The problem is that we are seeing constant growth in unmanaged memory while running this process function. I have tried disabling the CPU arena and memory arena to no avail, and even releasing and re-initializing the session does not help.
I have also run this with the TensorRT and CPU execution providers and am still seeing the unmanaged memory grow.
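
For reference, disabling the arenas was attempted along these lines (a sketch, not the exact code; only the run-option line also appears in ProcessImageV10 above):

#include <onnxruntime_cxx_api.h>

// Sketch: turn off the CPU memory arena and memory-pattern planning on the session options.
Ort::SessionOptions sessionOptions;
sessionOptions.DisableCpuMemArena();  // allocate per request instead of pooling in an arena
sessionOptions.DisableMemPattern();   // disable memory-pattern based pre-allocation

// Per-run arena shrinkage, as already used in ProcessImageV10 above:
Ort::RunOptions runOptions;
runOptions.AddConfigEntry("memory.enable_memory_arena_shrinkage", "cpu:0;gpu:0");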

To reproduce

Running the model repeatedly on the same ONNX session causes seemingly unbounded growth in unmanaged memory.

Urgency

We have an upcoming production deployment in a week that requires this to be fixed.

Platform

Windows

OS Version

Windows Server 2019

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.18.0

ONNX Runtime API

C++

Architecture

X64

Execution Provider

CUDA, TensorRT

Execution Provider Library Version

No response

Model File

No response

Is this a quantized model?

No

@yuslepukhin (Member) commented:

Nearly all of the exposed classes are IDisposable.
Unmanaged memory cannot be reclaimed by GC, so those classes must be disposed of properly.

If you use the Ort:: native classes, those are smart pointers; the Ort::Env class will automatically destroy the underlying object. In the code above you are defeating the purpose of Ort::Env by calling new on it.

In native C++, using new and delete is a sure sign of a potential memory leak. In general, any time you have to write delete, it is a sign of a potential leak.

If your managed class wraps native resources, then it has to implement IDisposable and be disposed of properly.
I do not write managed C++ every day, so you would need to find out what the syntactic equivalent of IDisposable is there.
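
A minimal sketch of the value-semantics (RAII) style described above, assuming member names as in the earlier snippets (illustrative, not the poster's actual code):

#include <string>
#include <onnxruntime_cxx_api.h>

// Sketch: hold Env and Session by value so they are destroyed automatically (no new/delete).
class ImageInferencer {
	Ort::Env onnxSessionEnv{ORT_LOGGING_LEVEL_WARNING, "image-classification-inference"};
	Ort::Session mOrtSession{nullptr};  // assigned in LoadModel

public:
	void LoadModel(const std::wstring& modelPath, bool enableGpu) {
		Ort::SessionOptions sessionOptions;
		sessionOptions.SetIntraOpNumThreads(4);
		if (enableGpu) {
			OrtCUDAProviderOptions cudaOptions;
			cudaOptions.device_id = 0;
			sessionOptions.AppendExecutionProvider_CUDA(cudaOptions);
		}
		// The Session owns its native handle; it is released when the object is destroyed or reassigned.
		mOrtSession = Ort::Session(onnxSessionEnv, modelPath.c_str(), sessionOptions);
	}
};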

@arvindsouza (Author) commented Dec 3, 2024

The managed class creates a single instance of the native class, which in turn contains a single ONNX session object. Session.Run is then called on the same class and session instance for every request.

What I have noticed is that every call to Run increases the unmanaged memory, and even clearing the session with .release() and re-instantiating it does not free up the memory.

Is this expected behaviour?

I've also tried creating the Env like so:

Env session = Env(parameters)

However, this does not stabilize the memory

I don't think IDisposable would help in this case, since I want to keep the class instance in memory for the lifetime of the process; it would only be disposed on exit.
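
For reference, my understanding of the C++/CLI equivalent that was suggested is the destructor/finalizer pair, sketched below (the wrapper class name is illustrative; ImageInferencer is the native class from the snippets above):

// Sketch (C++/CLI): ~ClassName compiles to Dispose(), !ClassName is the GC finalizer fallback.
public ref class ManagedInferencer {
	ImageInferencer* mNative;

public:
	ManagedInferencer() : mNative(new ImageInferencer()) {}

	// Called by a C# using block or an explicit Dispose() call.
	~ManagedInferencer() { this->!ManagedInferencer(); }

	// Called by the GC if Dispose() was never invoked.
	!ManagedInferencer() {
		delete mNative;
		mNative = nullptr;
	}
};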

@arvindsouza (Author) commented:

Just an update on this: I did add a destructor to the wrapper class to destroy the native instance with .Dispose(), and I placed the call to Session.Run in a using block. However, I'm still seeing the issue.
I would really appreciate some support on this, @yuslepukhin. Does onnxruntime retain allocated memory even after calling release() on the session object and deleting the containing class?
