
[Performance] Java API lacks functionality to control allocator settings. #18845

Open

ivanthewebber opened this issue Dec 15, 2023 · 10 comments

Labels: api:Java (issues related to the Java API)

Comments

@ivanthewebber

Describe the issue

The Java API provides no way to control the arena allocator settings (e.g. setting "arena_extend_strategy" to "kSameAsRequested", or configuring "max_mem", "max_dead_bytes_per_chunk", and "initial_chunk_size_bytes").

This means memory is wasted and startup cannot be tuned. It also means that a memory leak gets the entire container OOMKilled instead of producing a reasonable error message (as it would with a sensible "max_mem").

I've looked for a way to configure this and found nothing. It seems like it would be straightforward to forward these settings to the underlying C implementation.
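
For context, these knobs exist in the C API as an OrtArenaCfg built from key/value pairs via CreateArenaCfgV2. Below is a purely hypothetical sketch of what forwarding them through Java might look like; the OrtArenaCfg class and setCpuArenaCfg method are imagined here and are not part of ai.onnxruntime:

```java
import java.util.Map;

public final class ArenaConfigSketch {
    public static void main(String[] args) {
        // The key names and the integer encoding of the extend strategy mirror
        // the C API's CreateArenaCfgV2 (0 == kNextPowerOfTwo, 1 == kSameAsRequested).
        Map<String, Long> arenaConfig = Map.of(
                "arena_extend_strategy", 1L,           // kSameAsRequested
                "max_mem", 512L * 1024 * 1024,         // hard cap: fail cleanly instead of an OOMKill
                "initial_chunk_size_bytes", 1L << 20,  // smaller first chunk for faster startup
                "max_dead_bytes_per_chunk", 128L << 10);

        // HYPOTHETICAL surface, loosely following the Python OrtArenaCfg pattern;
        // neither OrtArenaCfg nor setCpuArenaCfg exists in the Java API today:
        // OrtArenaCfg cfg = new OrtArenaCfg(arenaConfig);
        // sessionOptions.setCpuArenaCfg(cfg);
        System.out.println(arenaConfig);
    }
}
```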

To reproduce

Use the Java API.

Urgency

It's causing problems for me at work.

Platform

Linux

OS Version

AKS Docker image based on Mariner

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.16.2

ONNX Runtime API

Java

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

No response

Is this a quantized model?

No

github-actions bot added the api:Java label on Dec 15, 2023
@ivanthewebber (Author)

I'm trying to use ONNX Runtime for stream processing with Apache Flink in a low-latency, high-throughput, and memory-constrained setting.

See this paper comparing ONNX Runtime and alternatives for this use case; its source code is similar to my own usage.

@Craigacp (Contributor)

I think a bunch of those are possible for CUDA, as we expose an add method on the CUDA EP options, but you're right that we don't expose memory allocators at all for CPUs.

It's not straightforward to design an API which exposes the allocators. At the moment there's a single default allocator used everywhere, and it's not exposed in any of the value construction methods, so it would be a substantial effort to build an API around that, OrtMemoryInfo, and OrtArenaCfg. It's on the todo list, as it will enable direct allocation of GPU memory, which can be useful, but it needs careful design.
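
For the CUDA path mentioned above, a rough sketch of that existing pass-through follows; the model path and memory values are placeholders, the key names follow the CUDA EP's documented options, and a CUDA-enabled ORT build is assumed:

```java
import ai.onnxruntime.OrtEnvironment;
import ai.onnxruntime.OrtException;
import ai.onnxruntime.OrtSession;
import ai.onnxruntime.providers.OrtCUDAProviderOptions;

public final class CudaArenaSketch {
    public static void main(String[] args) throws OrtException {
        OrtEnvironment env = OrtEnvironment.getEnvironment();

        // The CUDA EP options expose a generic add(key, value) pass-through,
        // so its memory settings can be reached even without a typed API.
        OrtCUDAProviderOptions cudaOpts = new OrtCUDAProviderOptions(0); // device 0
        cudaOpts.add("arena_extend_strategy", "kSameAsRequested");
        cudaOpts.add("gpu_mem_limit", String.valueOf(2L * 1024 * 1024 * 1024));

        try (OrtSession.SessionOptions opts = new OrtSession.SessionOptions()) {
            opts.addCUDA(cudaOpts);
            try (OrtSession session = env.createSession("model.onnx", opts)) {
                // run inference...
            }
        }
    }
}
```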

@ivanthewebber (Author) commented Dec 18, 2023

It seems like you could follow the same patterns as the Python API and just translate some of the implementation. Let me know if you're able to add this to your backlog and what the timeline would be. Otherwise I'll look for a workaround or an alternative like onnx-scala.

@Craigacp (Contributor)

Python is a little easier as it doesn't have to deal with concurrency, so they can get away with a laxer API. I'll scope out the amount of work in the new year.

@github-actions (bot)

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

github-actions bot added the stale label (issues that have not been addressed in a while; categorized by a bot) on Jan 18, 2024
@Craigacp (Contributor)

Keep this issue open; it can track CPU allocator settings.

github-actions bot removed the stale label on Jan 21, 2024
@ivanthewebber (Author)

Any updates? Also, if I set the number of inter-op and intra-op threads to 1 and share a session object across many threads, would each thread calling run be able to execute in parallel, or would the ONNX thread's affinity be tied to a single CPU?

@Craigacp (Contributor)

No updates. I'm waiting for this PR (#18556) to be merged before starting on more memory-management-related issues.

I believe the thread you send into ORT is used for compute, so if you have concurrent requesting threads then those threads will concurrently execute the model.
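
A minimal sketch of that sharing pattern, assuming a placeholder model path and input name: with intra-op and inter-op threads set to 1, each caller thread drives its own compute, and run may be invoked concurrently on the shared session.

```java
import ai.onnxruntime.OnnxTensor;
import ai.onnxruntime.OrtEnvironment;
import ai.onnxruntime.OrtSession;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public final class SharedSessionSketch {
    public static void main(String[] args) throws Exception {
        OrtEnvironment env = OrtEnvironment.getEnvironment();
        try (OrtSession.SessionOptions opts = new OrtSession.SessionOptions()) {
            opts.setIntraOpNumThreads(1); // compute runs on the calling thread
            opts.setInterOpNumThreads(1);
            try (OrtSession session = env.createSession("model.onnx", opts)) {
                ExecutorService pool = Executors.newFixedThreadPool(4);
                for (int i = 0; i < 4; i++) {
                    pool.submit(() -> {
                        // run() is safe to call concurrently on a shared session;
                        // "input" is a placeholder for the model's input name.
                        try (OnnxTensor input = OnnxTensor.createTensor(env, new float[][] {{1f, 2f, 3f}});
                             OrtSession.Result result = session.run(Map.of("input", input))) {
                            // consume result...
                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                    });
                }
                pool.shutdown();
                pool.awaitTermination(1, TimeUnit.MINUTES);
            }
        }
    }
}
```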

@ivanthewebber (Author)

Any updates? I have my fingers crossed that some work on this will get planned.

@Craigacp (Contributor)

Not yet.
