Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix JNI wrapper memory leak #2178

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

gongsu832
Copy link
Collaborator

Fix JNI wrapper memory leak due to NewDirectByteBuffer not subjecting to Java GC. Use the Cleaner API requiring Java 9+ to avoid copying tensor data

to Java GC. Use the Cleaner API requiring Java 9+ to avoid copying
tensor data

Signed-off-by: Gong Su <[email protected]>
Signed-off-by: Gong Su <[email protected]>
(*env)->GetDirectBufferAddress(env, java_data), jni_data != NULL,
jecpt_cls, "jni_data=%p", jni_data);

free(jni_data);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran into this during testing of my older finalize()-based PR, but jni_data points to the aligned allocation, not to the start of the allocation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you are saying but I'm a bit surprised why this would happen. Basically, when you call NewDirectByteBuffer, you give it the pointer to the memory block. Now when you call GetDirectBufferAddress, it's going to give you a different (aligned) pointer? I mean I gave it the pointer, now I want it back, why would it change it behind my back?!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK maybe you mean onnx-mlir might give you an aligned pointer that's not the start of the allocation? OK in that case we need fiddle a bit like the offset thing you did. Itchy.

Copy link
Collaborator Author

@gongsu832 gongsu832 Apr 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I looked at the conversion on your finalize PR and you said:

I thought about doing that but I wanted to avoid a limitation with java - all of the integral based arguments are signed. With an offset, which can be signed, I can keep the semantics more or less the same. With a Cleaner based approach I can create another DirectByteBuffer holding the actual allocation variable which the cleaner can use to clean up - no offset calculation required.

So I think instead of passing in the offset, we could just pass in the aligned pointer as jlong, right? As you said later, it doesn't really matter if it's signed or not. 64-bit is 64-bit. It only matters how you interpret it. Doing that would save us from having to keep another flag and doing the math.

@cjvolzka
Copy link
Collaborator

I tried this in our build container but I got errors trying to build onnx-mlir

src/com/ibm/onnxmlir/OMTensor.java:5: error: cannot find symbol
import java.lang.ref.Cleaner;
                    ^
  symbol:   class Cleaner
  location: package java.lang.ref
src/com/ibm/onnxmlir/OMTensor.java:113: error: cannot find symbol
    private static final Cleaner _cleaner = Cleaner.create();
                         ^
  symbol:   class Cleaner
  location: class OMTensor
src/com/ibm/onnxmlir/OMTensor.java:114: error: package Cleaner does not exist
    private final Cleaner.Cleanable _cleanable;
                         ^
src/com/ibm/onnxmlir/OMTensor.java:113: error: cannot find symbol
    private static final Cleaner _cleaner = Cleaner.create();
                                            ^
  symbol:   variable Cleaner
  location: class OMTensor

4 errors
gmake[2]: *** [src/Runtime/jni/CMakeFiles/javaruntime.dir/build.make:90: src/Runtime/jni/CMakeFiles/javaruntime.dir/java_compiled_javaruntime] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:5928: src/Runtime/jni/CMakeFiles/javaruntime.dir/all] Error 2
gmake[1]: *** Waiting for unfinished jobs....

I'm guessing it's because in our builder image we only install Java 1.8 since that's what we ultimately target running the models on. I can move that to Java 11. However, will that cause any issues then with the compiled model .jar files being compatible to run on Java 8?

@gongsu832
Copy link
Collaborator Author

I'm guessing it's because in our builder image we only install Java 1.8 since that's what we ultimately target running the models on. I can move that to Java 11. However, will that cause any issues then with the compiled model .jar files being compatible to run on Java 8?

I think you should keep your 1.8 and use finalize if that's what you are targeting. The finalize code is actually very simple. You can replace OMTensor.java with this.

OMTensor.java.txt

@cjvolzka
Copy link
Collaborator

I'm guessing it's because in our builder image we only install Java 1.8 since that's what we ultimately target running the models on. I can move that to Java 11. However, will that cause any issues then with the compiled model .jar files being compatible to run on Java 8?

I think you should keep your 1.8 and use finalize if that's what you are targeting. The finalize code is actually very simple. You can replace OMTensor.java with this.

OMTensor.java.txt

I'd greatly prefer to not have to force source code changes into our build that's not in the community. It creates tech debt and becomes a maintenance nightmare in the long term.

If that's the only way to do have the code "support" both, then I'd greatly lean towards either only supporting finalize() and not doing Cleaner until Java 1.8 support is no longer needed; or take the extra copy approach which avoids both finalize and Cleaner.

@gongsu832
Copy link
Collaborator Author

I'm guessing it's because in our builder image we only install Java 1.8 since that's what we ultimately target running the models on. I can move that to Java 11. However, will that cause any issues then with the compiled model .jar files being compatible to run on Java 8?

I think you should keep your 1.8 and use finalize if that's what you are targeting. The finalize code is actually very simple. You can replace OMTensor.java with this.
OMTensor.java.txt

I'd greatly prefer to not have to force source code changes into our build that's not in the community. It creates tech debt and becomes a maintenance nightmare in the long term.

If that's the only way to do have the code "support" both, then I'd greatly lean towards either only supporting finalize() and not doing Cleaner until Java 1.8 support is no longer needed; or take the extra copy approach which avoids both finalize and Cleaner.

I thought you just want to do a quick test on 1.8 yourself. That's certainly not the only way to support both. We can have cmake pick the right version depending on which Java version you are using.

As I mentioned in another thread, finalize does not work on Java 11 (compile gives warning, runtime has no effect) so we can't really have only finalize.

@AlexandreEichenberger
Copy link
Collaborator

@gongsu832 do you need help from MS folks for the Window failure; or maybe just re-update to the latest and see if this now works?

@AlexandreEichenberger
Copy link
Collaborator

@gongsu832 @cjvolzka is there a resolution for this PR?

@cjvolzka
Copy link
Collaborator

A fix was already merged to address this issue in a way that works with Java 1.8. I believe this PR is to replace that with something that aligns better with later Java 17 standards. However, one of our users (and as far as I know they are the only/primary user of the Java interface) will still need Java 1.8 support for at least another year, if not two. Given the long time before this could merge, I'd say it can be closed, but I leave that up to @gongsu832

@gongsu832 gongsu832 marked this pull request as draft June 14, 2023 15:28
@gongsu832
Copy link
Collaborator Author

I'd like to keep it open just so we don't forget and also I can rebase the patch to the latest code. So I converted it draft.

@AlexandreEichenberger
Copy link
Collaborator

@gongsu832 Still desirable? If so can we reactualizes it? Or should we close?

@gongsu832
Copy link
Collaborator Author

gongsu832 commented Sep 13, 2024

@gongsu832 Still desirable? If so can we reactualizes it? Or should we close?

This PR serves as a reminder that when Java 1.8 goes out of service we should revisit the issue and converge on a single and simpler direct byte buffer cleaner. So I'd keep it open unless there is some problem with doing that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants