
[Performance] Increased memory usage when loading from bytes #21165

Open
ignogueiras opened this issue Jun 25, 2024 · 8 comments
Labels
performance issues related to performance regressions quantization issues related to quantization

Comments

@ignogueiras

Describe the issue

Until now we were creating our Ort::Session object by passing it the path of our model (.onnx file).
Now we are trying to create the Session object from the bytes already read into a std::vector. Although everything seems to work correctly, we have detected higher memory consumption, approximately the size of the model.
We are reasonably sure that the vector is being released correctly, so we have the impression that creating the Session makes a copy that is not released afterwards.
Is this expected? Or are we doing something incorrectly?

To reproduce

We observe a much bigger memory usage when doing this:

std::vector<unsigned char> model_bytes;
std::ifstream file("model.onnx", std::ios::binary);  // binary mode: avoid any newline translation
if (file)
{
    file.seekg(0, std::ios_base::end);
    std::streampos fileSize = file.tellg();
    model_bytes.resize(static_cast<std::size_t>(fileSize));

    file.seekg(0, std::ios_base::beg);
    // ifstream::read expects char*, so cast the unsigned char buffer
    file.read(reinterpret_cast<char*>(model_bytes.data()), fileSize);
}
session = std::make_shared<Ort::Session>(env, model_bytes.data(), model_bytes.size(), session_options);

rather than this:
session = std::make_shared<Ort::Session>(env, "/path/to/model/file.onnx", session_options);

Urgency

Not really urgent, just curious about this case as we want to load the models from memory eventually.

Platform

Linux

OS Version

Ubuntu 22.04

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.15.1

ONNX Runtime API

C++

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

No response

Is this a quantized model?

Yes

@github-actions github-actions bot added the quantization issues related to quantization label Jun 25, 2024
@cbourjau
Contributor

Version 1.15.1 is rather old. Is this still an issue with the latest release?

@ignogueiras
Author

Hello again, I have just taken some memory profiles of different executions for each case.
This is the memory usage if I load the model from the file path directly:
[Memory profile screenshot: loading from file path, v1.15.1]

And this one is loading from the vector of bytes:
[Memory profile screenshot: loading from byte vector, v1.15.1]

The only change in the code is that I call:
session = std::make_shared<Ort::Session>(env, model_bytes.data(), model_bytes.size(), session_options);
instead of
session = std::make_shared<Ort::Session>(env, "/path/to/model.onnx", session_options);
The rest of the code is the same: I am still reading the file, creating the vector, etc., just calling the other constructor.

As you suggested, I tried it again with the newer 1.18.0 release, simply by replacing the release files in my deps folder. When loading from the file path I get the same behaviour, but when loading from the vector it performs even worse:
[Memory profile screenshot: loading from byte vector, v1.18.0]

@cbourjau
Contributor

That is a pretty sizable regression in terms of memory usage in any case! Was there a particular version between 1.15.1 and 1.18.0 that caused the even worse memory usage?

@ignogueiras
Author

Well, we jumped directly from the 1.15.1 we were using to 1.18.0 for this test, but I just did a quick check and I can already see this increased memory usage with 1.16.1.
As an aside, I was unable to compile with 1.16.0 due to some missing headers in the includes/ folder.

@sophies927 sophies927 added the performance issues related to performance regressions label Jun 27, 2024
@ignogueiras
Author

We have tested it with version 1.18.1, but it shows the same memory profile.


github-actions bot commented Aug 1, 2024

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

@github-actions github-actions bot added the stale issues that have not been addressed in a while; categorized by a bot label Aug 1, 2024
@cbourjau
Contributor

cbourjau commented Aug 1, 2024

Thanks for all the additional information @ignogueiras! I'm afraid I don't have a good guess as to what the origin of your problem might be. But maybe you can try it again with the latest release from today?

@github-actions github-actions bot removed the stale issues that have not been addressed in a while; categorized by a bot label Aug 2, 2024
@ignogueiras
Author

Hello again @cbourjau
Sorry for the late response, I was out of the office for the last few weeks.

I did some more tests today and I am starting to doubt my previous results, as I am unable to reproduce them now. I keep seeing the same memory profile loading from bytes and from the file path. I am using a different machine right now, so could it be related to the different hardware?

I'll keep doing some more tests, maybe I am forgetting some steps of my old runs.

What I can still see is a regression in the latest versions with respect to v1.15.1.

v1.15.1
[Memory profile screenshot: loading from file path, v1.15.1]

v1.19.0
[Memory profile screenshot: loading from file path, v1.19.0]

As you can see, the profiles have an almost identical shape. First there is a resource-loading phase, and then some memory is released, approximately the size of the model. But in the new version, before this release happens, additional memory is allocated, again roughly the size of the model, negating the subsequent release.
It looks like some kind of copy of the model data was added there and is not being freed afterwards.
