
Onnxruntime Memory [Web] #18165

Open · prithivi1 opened this issue Oct 30, 2023 · 11 comments
Labels
platform:web (issues related to ONNX Runtime web; typically submitted using template)
stale (issues that have not been addressed in a while; categorized by a bot)

Comments


prithivi1 commented Oct 30, 2023

Describe the issue

While my Onnx model functions well in Onnxruntime Web, I've encountered an issue where creating an InferenceSession results in a substantial memory usage increase of approximately 300 MB, and this memory is not released. I'm monitoring this memory usage through Chrome's task manager.

This behavior persists even with a simple Onnx model that's just 2 KB in size. I'm curious about the reason behind this memory increase and am looking for a way to clear this memory or release the session object once I've completed my predictions.

To reproduce

<!DOCTYPE html>
<html>
<head>
    <title>Onnx Example</title>
</head>
<body>
    <button id="runModelButton">Run Model</button>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/onnxruntime-web/1.15.1/ort.min.js"></script>

    <script>
        document.getElementById('runModelButton').addEventListener('click', async () => {
            load_predict();
        });

        async function load_predict() {
            let onnx_path = "../onnx/random_forest_model.onnx";
            let session1 = await ort.InferenceSession.create(onnx_path);
            console.log("Model Loaded ::: ", session1);
            // ort.Tensor takes (type, data, dims); the constructor is synchronous
            let inputs = new ort.Tensor('float32', new Float32Array([0, 395, 67, 0, 2, 96, 10]), [1, 7]);
            let outputMap = await session1.run({ 'X': inputs }, ['probabilities']);
            let predictions = outputMap['probabilities'].data;
            console.log('Predictions', predictions);
            return null;
        }
    </script>
</body>
</html>

Urgency

High Priority. Need to fix this issue!

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.15.1

Execution Provider

'wasm'/'cpu' (WebAssembly CPU)

prithivi1 added the platform:web label on Oct 30, 2023

carzh commented Oct 31, 2023

Hello, there is an InferenceSession release method which you can call once you've completed the predictions.
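
For example, a minimal sketch reusing the names from the repro above (onnx_path and inputs as defined there):

let session1 = await ort.InferenceSession.create(onnx_path);
let outputMap = await session1.run({ 'X': inputs }, ['probabilities']);
// ... consume the results ...
await session1.release(); // frees the native resources held by the session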


prithivi1 commented Oct 31, 2023

Hi @carzh,
Thanks for the suggestion. I have tried the release method, but it doesn't release any memory.

Also, the entire 300 MB is allocated during session creation in wasm.

I ran a memory profile for this process, and the overall profile size is only around 20 MB, which is the same as my ONNX model size (20 MB).

In Chrome's task manager, I can see this tab occupying >300 MB of memory. So I debugged the code and found that the memory shot up at e._OrtCreateSession = (a,b,c)=>(e._OrtCreateSession = I.T)(a, b, c); in the ort.js file.


carzh commented Oct 31, 2023

That's strange -- does the issue persist when using ORT 1.16.1?

prithivi1 commented:

Yes, this happens in ORT 1.16.1 too.

[Screenshot 2023-11-01 at 11:11:57 AM]


carzh commented Nov 4, 2023

Cc'ing @fs-eire and @guschmue


fs-eire commented Nov 6, 2023

If this issue happens inside "_OrtCreateSession", it indicates that the memory growth happens inside the ONNX Runtime model initialization step.

ONNX Runtime does a lot of things during initialization: loading the model graph, applying graph optimizers and transformers, allocating tensors, prepacking weights, and initializing kernels.

There might be a few possible ways to reduce the memory consumption:

  • disable the memory arena and memory pattern (see the sketch below)
  • disable a part of the graph optimizations (some use more memory to get faster inference)
  • avoid using some traditional ML operators (e.g. ZipMap)
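
For the first two items, a minimal sketch of the relevant session options (the option names come from the onnxruntime-web SessionOptions API; whether they actually reduce memory for this model is untested):

let session = await ort.InferenceSession.create(onnx_path, {
    enableCpuMemArena: false,        // do not reserve a CPU memory arena up front
    enableMemPattern: false,         // do not pre-plan/reuse allocation patterns
    graphOptimizationLevel: 'basic'  // or 'disabled' to skip graph optimizations entirely
});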

prithivi1 commented:

Hello @fs-eire,

I appreciate your valuable suggestions, but unfortunately none of the steps I tried reduced the memory usage. Additionally, I encountered an issue while making predictions without ZipMap:

ort.js:17884 Uncaught (in promise) Error: Can't access output tensor data on index 1. ERROR_CODE: 9, ERROR_MESSAGE: Reading data from non-tensor typed value is not supported.

I'm seeking further guidance to address this problem.


fs-eire commented Nov 11, 2023

Could you share your 2 KB model (the one that consumes 300 MB of memory at runtime)?

github-actions bot commented Jan 3, 2024

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

github-actions bot added the stale label on Jan 3, 2024
prithivi1 commented:

Hi @fs-eire,
I just checked on my end, and the memory usage is high only for random forest models.

But the memory is not released after the prediction in any model. I have also tried removing the model instance in JS. Is there a way to free up this memory once my prediction process is complete?


fs-eire commented Jan 24, 2024

WebAssembly memory cannot shrink. This means the wasm memory will keep its peak size, even if the code marks parts of it as "free".
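
To illustrate with plain WebAssembly, independent of ONNX Runtime (a standalone sketch):

let mem = new WebAssembly.Memory({ initial: 1, maximum: 16384 }); // 64 KiB pages
console.log(mem.buffer.byteLength); // 65536 (1 page)
mem.grow(4800);                     // grow by roughly 300 MiB worth of pages
console.log(mem.buffer.byteLength); // ~300 MiB; the API offers grow() but no way to shrink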

Regarding removing "ZipMap": I am not an expert on ONNX exporters, but I assume there are options that control the model export so that it contains no ZipMap operator and no non-tensor inputs/outputs. You can check whether the exporter you use has such options.
