
Unmanaged memory growth when calling ONNX from C# with C++ API #22992

Open
arvindsouza opened this issue Dec 3, 2024 · 3 comments

Labels: api:CSharp · .NET · performance

Comments


arvindsouza commented Dec 3, 2024

Describe the issue

Hello, I have an application written in .NET that calls a managed C++ class, which in turn wraps native C++ code containing the ONNX Runtime calls.

On startup, the C# class calls the LoadModel function like so:

mImageInferencer.LoadModel(AppDomain.CurrentDomain.BaseDirectory + ModelPath, enableGpu, counter);

Multiple instances of the inferencer class may be created within Tasks in .NET.

This initializes the ONNX Session like so:

void ImageInferencer::LoadModel(std::string modelPath, bool enableGpu, int counter) {

	try {

		wchar_t* wideStringModelPath = new wchar_t[modelPath.length() + 1];
		std::copy(modelPath.begin(), modelPath.end(), wideStringModelPath);
		wideStringModelPath[modelPath.length()] = 0;
		ID = counter;

		std::string instanceName{ "image-classification-inference" + to_string(counter) };

		onnxSessionEnv = new Env(OrtLoggingLevel::ORT_LOGGING_LEVEL_WARNING,
			instanceName.c_str());

		SessionOptions sessionOptions;
		sessionOptions.SetIntraOpNumThreads(4);

		sessionOptions.SetExecutionMode(ORT_PARALLEL);
		sessionOptions.SetInterOpNumThreads(4);
		sessionOptions.SetGraphOptimizationLevel(
			GraphOptimizationLevel::ORT_ENABLE_BASIC);
	
		if (enableGpu)
		{

			OrtCUDAProviderOptions cudaOptions;
			cudaOptions.arena_extend_strategy = 1;

			cudaOptions.device_id = 0;
			//cudaOptions.gpu_mem_limit = 2 * 1024 * 1024 * 1024;
			cudaOptions.cudnn_conv_algo_search = OrtCudnnConvAlgoSearchDefault;
			//cudaOptions.do_copy_in_default_stream = 0;
			sessionOptions.AppendExecutionProvider_CUDA(cudaOptions);


		}
		mOrtSession = new Session(*onnxSessionEnv, wideStringModelPath, sessionOptions);

		delete[] wideStringModelPath;
		wideStringModelPath = nullptr;

		sessionOptions.release();

	}
	catch (exception& e) {
		PLOGD << e.what();
	}
}

Both mOrtSession and onnxSessionEnv are class-level variables.
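
For context, here is a rough sketch of the implied class layout; the member types are inferred from the snippets and are an assumption, not the actual header:

#include <onnxruntime_cxx_api.h>

// Sketch only: the members implied by the code above.
class ImageInferencer {
	Ort::Env* onnxSessionEnv = nullptr;   // created with new in LoadModel
	Ort::Session* mOrtSession = nullptr;  // created with new in LoadModel
	void* input_data = nullptr;           // GPU input buffer, allocated on the first Run
	int runCounter = 0;
	int ID = 0;
	// ...
};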

The .NET class then listens for new images arriving on a message queue and, upon receiving an image, passes it to the model for processing like so:

int numDetections = mImageInferencer.ProcessImageV10(frame, frame.Length, (float)options.ConfidenceThreshold, options.EnableTracking, inputKey);

This process runs indefinitely, as long as we receive frames from the source (cameras).

This process then calls Run on the ONNX session as below:

vector<InferencerDetection> ImageInferencer::ProcessImageV10(uchar* image, int size, float threshold, bool enableTracking, std::string trackerKey) {
	std::vector<InferencerDetection> result;

	try {

		RunOptions runOptions = Ort::RunOptions();
		runOptions.AddConfigEntry("memory.enable_memory_arena_shrinkage", "cpu:0;gpu:0");

		AllocatorWithDefaultOptions ort_alloc;

		MemoryInfo memory_info = Ort::MemoryInfo("Cuda", OrtArenaAllocator, 0, OrtMemTypeDefault);
		MemoryInfo memory_info_cpu = MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
		PLOGD << "allocated memory";

		cv::Mat imageBGR = imdecode(_InputArray(static_cast<uchar*>(image), size), cv::IMREAD_COLOR);
		if (imageBGR.empty()) {
			std::cerr << "Error: Could not decode image" << std::endl;
			return result;
		}

		float* blob = nullptr;

		std::vector<int64_t> inputDims{ 1, 3, -1, -1 };

		this->PreProcessing(imageBGR, blob, inputDims);

		size_t inputTensorSize = this->VectorProduct(inputDims);
		std::vector<float> inputTensorValues(blob, blob + inputTensorSize);

		std::vector<Ort::Value> inputTensors;
		Ort::Allocator gpu_allocator(*mOrtSession, memory_info);
		AllocatorWithDefaultOptions defaultAllocator;

		if (runCounter == 0) {
			input_data = gpu_allocator.Alloc(4 * inputTensorValues.size());
			runCounter++;
		}

		Ort::IoBinding io_binding{ *mOrtSession };


		cudaMemcpy(input_data, inputTensorValues.data(), 4 * inputTensorValues.size(), cudaMemcpyHostToDevice);

		inputTensors.push_back(Value::CreateTensor<float>(gpu_allocator.GetInfo(), reinterpret_cast<float*>(input_data), inputTensorValues.size(), inputDims.data(), inputDims.size()));

		AllocatedStringPtr inputName = mOrtSession->GetInputNameAllocated(0, ort_alloc);
		AllocatedStringPtr outputName = mOrtSession->GetOutputNameAllocated(0, ort_alloc);

		const std::array<const char*, 1> inputNames = { inputName.get() };
		const std::array<const char*, 1> outputNames = { outputName.get() };

		io_binding.BindInput(inputNames[0], inputTensors[0]);

		io_binding.BindOutput(outputNames[0], memory_info_cpu);

		mOrtSession->Run(runOptions, io_binding);
		auto output_tensors = io_binding.GetOutputValues();

		std::vector<float> outputTensorValues(output_tensors[0].GetTensorMutableData<float>(), output_tensors[0].GetTensorMutableData<float>() + output_tensors[0].GetTensorTypeAndShapeInfo().GetElementCount());
		//PLOGD << "output tensor values length" << outputTensorValues.size();

		cv::Size shape = cv::Size(width, height);

		result = PostProcessingV10(shape, imageBGR.size(), outputTensorValues, threshold, 0.6);

		delete[] blob;  // blob is assumed to be allocated with new[] inside PreProcessing
		blob = nullptr;
		imageBGR.release();
		gpu_allocator.release();

		inputDims.clear();
		inputTensorValues.clear();
		outputTensorValues.clear();
		inputTensors.clear();
		output_tensors.clear();
		memory_info.release();
		memory_info_cpu.release();
		runOptions.release();
		inputName.release();
		outputName.release();

	}
	catch (exception& ex) {
		PLOGE << ex.what();
	}

	return result;
}

The problem is that we are seeing constant growth in unmanaged memory while running this process function. I have tried disabling the CPU arena and memory arena to no avail, and even releasing and re-initializing the session does not help.
I have also run this with the TensorRT and CPU execution providers and am still seeing the unmanaged memory grow.
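
For reference, disabling the arenas was attempted along these lines (a sketch, not the exact code; only the run-option line also appears in ProcessImageV10 above):

#include <onnxruntime_cxx_api.h>

// Sketch: turn off the CPU memory arena and memory-pattern planning on the session options.
Ort::SessionOptions sessionOptions;
sessionOptions.DisableCpuMemArena();  // allocate per request instead of pooling in an arena
sessionOptions.DisableMemPattern();   // disable memory-pattern based pre-allocation

// Per-run arena shrinkage, as already used in ProcessImageV10 above:
Ort::RunOptions runOptions;
runOptions.AddConfigEntry("memory.enable_memory_arena_shrinkage", "cpu:0;gpu:0");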

To reproduce

Running the model repeatedly on the same ONNX session causes seemingly unbounded growth in unmanaged memory.

Urgency

We have an upcoming production deployment in a week that requires this to be fixed.

Platform

Windows

OS Version

Windows Server 2019

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.18.0

ONNX Runtime API

C++

Architecture

X64

Execution Provider

CUDA, TensorRT

Execution Provider Library Version

No response

Model File

No response

Is this a quantized model?

No

@yuslepukhin (Member) commented:

Nearly all of the exposed classes are IDisposable.
Unmanaged memory cannot be reclaimed by GC, so those classes must be disposed of properly.

If you use the Ort:: native classes, those are smart pointers; the Ort::Env class will automatically destroy the underlying object. In the code above you are defeating the purpose of Ort::Env by calling new on it.

In native C++, using new and delete is a sure sign of a potential memory leak. In general, any time you have to write delete, it is a sign of a potential leak.

If your managed class wraps native resources, then it has to implement IDisposable and be disposed of properly.
I do not write managed C++ every day, so you would need to find out what the syntactic equivalent of IDisposable is there.
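
A minimal sketch of the value-semantics (RAII) style described above, assuming member names as in the earlier snippets (illustrative, not the poster's actual code):

#include <string>
#include <onnxruntime_cxx_api.h>

// Sketch: hold Env and Session by value so they are destroyed automatically (no new/delete).
class ImageInferencer {
	Ort::Env onnxSessionEnv{ORT_LOGGING_LEVEL_WARNING, "image-classification-inference"};
	Ort::Session mOrtSession{nullptr};  // assigned in LoadModel

public:
	void LoadModel(const std::wstring& modelPath, bool enableGpu) {
		Ort::SessionOptions sessionOptions;
		sessionOptions.SetIntraOpNumThreads(4);
		if (enableGpu) {
			OrtCUDAProviderOptions cudaOptions;
			cudaOptions.device_id = 0;
			sessionOptions.AppendExecutionProvider_CUDA(cudaOptions);
		}
		// The Session owns its native handle; it is released when the object is destroyed or reassigned.
		mOrtSession = Ort::Session(onnxSessionEnv, modelPath.c_str(), sessionOptions);
	}
};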

@arvindsouza (Author) commented Dec 3, 2024

The managed class creates a single instance of the native class, which in turn contains a single ONNX session object. Session.Run is then called on the same class and session instance for every request.

What I have noticed is that every call to Run increases the unmanaged memory, and even clearing the session with .release() and re-instantiating it does not free up the memory.

Is this expected behaviour?

I've also tried creating the Env like so:

Env session = Env(parameters)

However, this does not stabilize the memory

I don't think IDisposable would help in this case, since I want to keep the class instance in memory for the lifetime of the process; it would only be disposed on exit.
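
For reference, my understanding of the C++/CLI equivalent that was suggested is the destructor/finalizer pair, sketched below (the wrapper class name is illustrative; ImageInferencer is the native class from the snippets above):

// Sketch (C++/CLI): ~ClassName compiles to Dispose(), !ClassName is the GC finalizer fallback.
public ref class ManagedInferencer {
	ImageInferencer* mNative;

public:
	ManagedInferencer() : mNative(new ImageInferencer()) {}

	// Called by a C# using block or an explicit Dispose() call.
	~ManagedInferencer() { this->!ManagedInferencer(); }

	// Called by the GC if Dispose() was never invoked.
	!ManagedInferencer() {
		delete mNative;
		mNative = nullptr;
	}
};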

@arvindsouza (Author) commented:

Just an update on this: I did add a destructor to the wrapper class to destroy the native instance with .Dispose(), and I placed the call to Session.Run in a using block. However, I'm still seeing the issue.
I would really appreciate some support on this, @yuslepukhin. Does onnxruntime retain allocated memory even after calling release() on the session object and deleting the containing class?
