
[Performance] Increased memory usage when loading from bytes #21165

Open
ignogueiras opened this issue Jun 25, 2024 · 8 comments
Labels
performance issues related to performance regressions quantization issues related to quantization

Comments

@ignogueiras

Describe the issue

Until now we were creating our Ort::Session object by passing it the path of our model (.onnx file).
Now we are trying to create the Session object from the bytes already read into a std::vector. Although everything seems to work correctly, we have detected higher memory consumption, approximately the size of the model.
We are reasonably sure that the vector is being released correctly, so we have the impression that creating the Session makes a copy that is not released afterwards.
Is this expected? Or are we doing something incorrectly?

To reproduce

We observe a much bigger memory usage when doing this:

std::vector<unsigned char> model_bytes;
std::ifstream file("model.onnx", std::ios::binary);  // binary mode: avoid any newline translation
if (file)
{
    file.seekg(0, std::ios_base::end);
    std::streampos fileSize = file.tellg();
    model_bytes.resize(static_cast<std::size_t>(fileSize));

    file.seekg(0, std::ios_base::beg);
    // ifstream::read expects char*, so cast the unsigned char buffer
    file.read(reinterpret_cast<char*>(model_bytes.data()), fileSize);
}
session = std::make_shared<Ort::Session>(env, model_bytes.data(), model_bytes.size(), session_options);

rather than this:
session = std::make_shared<Ort::Session>(env, "/path/to/model/file.onnx", session_options);

Urgency

Not really urgent, just curious about this case as we want to load the models from memory eventually.

Platform

Linux

OS Version

Ubuntu 22.04

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.15.1

ONNX Runtime API

C++

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

No response

Is this a quantized model?

Yes

@github-actions github-actions bot added the quantization issues related to quantization label Jun 25, 2024
@cbourjau
Contributor

Version 1.15.1 is rather old. Is this still an issue with the latest release?

@ignogueiras
Author

Hello again, I have just taken some memory profiles of different executions for each case.
This is the memory usage if I load the model from the file path directly:
[Memory profile screenshot: loading from file path, v1.15.1]

And this one is loading from the vector of bytes:
[Memory profile screenshot: loading from byte vector, v1.15.1]

The only change in the code is that I call:
session = std::make_shared<Ort::Session>(env, model_bytes.data(), model_bytes.size(), session_options);
instead of
session = std::make_shared<Ort::Session>(env, "/path/to/model.onnx", session_options);
The rest of the code is the same: I am still reading the file, creating the vector, etc., just calling the other constructor.

As you suggested, I tried it again with the newer 1.18.0 release, simply by replacing the release files in my deps folder. When loading from the file path I get the same behaviour, but when loading from the vector it performs even worse:
[Memory profile screenshot: loading from byte vector, v1.18.0]

@cbourjau
Contributor

That is a pretty sizable regression in terms of memory usage in any case! Was there a particular version between 1.15.1 and 1.18.0 that caused the even worse memory usage?

@ignogueiras
Author

Well, we jumped directly from the 1.15.1 we were using to 1.18.0 for this test, but I just did a quick check and I can already see this increased memory usage with 1.16.1.
As an aside, I was unable to compile with 1.16.0 due to some missing headers in the includes/ folder.

@sophies927 sophies927 added the performance issues related to performance regressions label Jun 27, 2024
@ignogueiras
Author

We have tested it with version 1.18.1, but it shows the same memory profile.


github-actions bot commented Aug 1, 2024

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

@github-actions github-actions bot added the stale issues that have not been addressed in a while; categorized by a bot label Aug 1, 2024
@cbourjau
Contributor

cbourjau commented Aug 1, 2024

Thanks for all the additional information @ignogueiras! I'm afraid I don't have a good guess as to what the origin of your problem might be. But maybe you can try it again with the latest release from today?

@github-actions github-actions bot removed the stale issues that have not been addressed in a while; categorized by a bot label Aug 2, 2024
@ignogueiras
Author

Hello again @cbourjau
Sorry for the late response, I was out of the office for the last few weeks.

I did some more tests today and I am starting to doubt my previous results, as I am unable to reproduce them now. I keep seeing the same memory profile loading from bytes and from the file path. I am using a different machine right now, so could it be related to the different hardware?

I'll keep doing some more tests, maybe I am forgetting some steps of my old runs.

What I can still see is a regression in the latest versions with respect to v1.15.1.

v1.15.1
[Memory profile screenshot: loading from file path, v1.15.1]

v1.19.0
[Memory profile screenshot: loading from file path, v1.19.0]

As you can see, the profiles have an almost identical shape. First there is a resource-loading phase, and then some memory is released, approximately the size of the model. But in the new version, before this release happens, additional memory is allocated, again roughly the size of the model, negating the subsequent release.
It looks like some kind of copy of the model data was added there and is not being freed afterwards.
