[Java] JNI refactor for OrtJniUtil #12516
Conversation
After a sequence of changes, the exception handling is now correct. However, before closing the issue we need to add some test cases (which intentionally trigger exceptions) and check that the exception handling works as expected. As we are making changes in different PRs (#12013, #12281, #12496 and this one), you can do it separately, but we need the tests to make sure the code changes work and to ensure they are not broken by future changes.
I would if I could replicate the issue in Java, but I can't, and there are already tests which supply the wrong type or number of inputs. I ran a quick test in jshell on both Linux and macOS x86_64 using:
import ai.onnxruntime.*;
var env = OrtEnvironment.getEnvironment();
var session = env.createSession("<path-to-an-mnist-cnn-model>");
//jshell> session.getInputInfo()
//==> {input_image=NodeInfo(name=input_image,info=TensorInfo(javaType=FLOAT,onnxType=ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT,shape=[-1, 1, 28, 28]))}
float[] input = new float[56];
var tensor = OnnxTensor.createTensor(env,input);
var outputs = session.run(Map.of("input_image",tensor));
I get the expected exception:
and if I change the tensor so it's the right rank, but still has 56 elements rather than 784 I get:
again, as expected.
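For reference, the element-count arithmetic behind that exception can be sketched in plain Java (this mirrors the check conceptually; it is not the actual ai.onnxruntime validation code):

```java
public class ShapeCheck {
    // Elements implied by a shape; a -1 (dynamic) dimension is counted
    // as 1 here, i.e. the per-item element count for a batch of one.
    static long elementCount(long[] shape) {
        long count = 1;
        for (long dim : shape) {
            count *= (dim == -1) ? 1 : dim;
        }
        return count;
    }

    public static void main(String[] args) {
        long[] expectedShape = {-1, 1, 28, 28}; // from the MNIST model above
        long supplied = 56;                     // the float[56] in the jshell session

        long expected = elementCount(expectedShape); // 1 * 1 * 28 * 28 = 784
        if (supplied != expected) {
            System.out.println("Shape mismatch: expected " + expected
                    + " elements per item, got " + supplied);
        }
    }
}
```

A float[56] is both the wrong rank and the wrong size, which is why both variants of the jshell experiment above raise an exception.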
Could you please help to add test cases in react_native\android\src\androidTest\java\ai\onnxruntime\reactnative\OnnxruntimeModuleTest.java? If you don't have the Android environment, we can leverage the CI to test it.
Sure. The Java code in there looks a little odd; is there some guidance on which bits of the JDK I can expect to work? It looks like it's using different
There are some types imported from the React Native data bridge:
import com.facebook.react.bridge.Arguments;
import com.facebook.react.bridge.JavaOnlyArray;
import com.facebook.react.bridge.JavaOnlyMap;
import com.facebook.react.bridge.ReactApplicationContext;
import com.facebook.react.bridge.ReadableArray;
import com.facebook.react.bridge.ReadableMap;
Those are defined in React Native for inter-op between Java and JavaScript. I am using OpenJDK 11 + Android Studio Bumblebee.
Ok, once this has been merged in I'll work up a PR to mirror the existing exception tests over to react, and add a new test checking specifically for input size & shape to both Java & react. I might need some help with the react one, but we can discuss that in that PR.
I made a PR with some new tests for the react-native portion #12659.
2e5df6d to 15a1044
@yuslepukhin please could you review this PR? It's the last one to finish off the JNI refactor.
java/src/main/native/OrtJniUtil.c
// length + 1 as we need to write out the final offset
size_t * offsets;
checkOrtStatus(jniEnv,api,api->AllocatorAlloc(allocator,sizeof(size_t)*(length+1),(void**)&offsets));
OrtErrorCode copyStringTensorToArray(JNIEnv *jniEnv, const OrtApi * api, OrtAllocator* allocator, OrtValue* tensor, size_t length, jobjectArray outputArray) {
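The offsets layout that snippet allocates can be illustrated in plain Java (a sketch of the data layout only, not the ORT API): with length + 1 offsets, entries i and i + 1 bound the i-th string in the flat buffer, which is why the final offset must also be written out.

```java
import java.nio.charset.StandardCharsets;

public class StringTensorSlice {
    // Slice a flat UTF-8 buffer into strings using length + 1 offsets:
    // the i-th string occupies bytes offsets[i] (inclusive) to
    // offsets[i + 1] (exclusive), so the extra final offset marks the
    // end of the last string.
    static String[] slice(byte[] data, int[] offsets) {
        int length = offsets.length - 1;
        String[] out = new String[length];
        for (int i = 0; i < length; i++) {
            out[i] = new String(data, offsets[i], offsets[i + 1] - offsets[i],
                    StandardCharsets.UTF_8);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] flat = "foobarbaz".getBytes(StandardCharsets.UTF_8);
        int[] offsets = {0, 3, 6, 9}; // 3 strings -> 4 offsets
        System.out.println(String.join(",", slice(flat, offsets))); // foo,bar,baz
    }
}
```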
Fixed.
/azp run MacOS CI Pipeline, Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-python-checks-ci-pipeline
Azure Pipelines successfully started running 6 pipeline(s).
/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline
Azure Pipelines successfully started running 6 pipeline(s).
/azp run orttraining-amd-gpu-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed
Azure Pipelines successfully started running 4 pipeline(s).
/azp run onnxruntime-binary-size-checks-ci-pipeline
Azure Pipelines successfully started running 1 pipeline(s).
/azp run Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux Nuphar CI Pipeline,Linux OpenVINO CI Pipeline
/azp run MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-python-checks-ci-pipeline,onnxruntime-binary-size-checks-ci-pipeline
Azure Pipelines successfully started running 7 pipeline(s).
Azure Pipelines successfully started running 9 pipeline(s).
I can replicate that Windows failure locally; I get an error out of the JVM.
Our default CPU allocator uses aligned allocation.
Found it. In two places I was using the wrong deallocation call; fixing that makes the crash disappear on my Windows box.
It would be wrong everywhere. One needs to destroy the objects first and only then deallocate memory. Simply deallocating the memory is not sufficient.
I agree it was wrong everywhere; I'm surprised it didn't crash on the other platforms too. I've checked through the code and I don't think I'm using it incorrectly elsewhere.
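That destroy-before-free ordering has a familiar Java-side analogue (a plain-JDK sketch with hypothetical resource names, unrelated to the native code): try-with-resources closes resources in reverse declaration order, so declaring the allocator before the tensor guarantees the object is released before the memory backing it.

```java
import java.util.ArrayList;
import java.util.List;

public class CloseOrder {
    static final List<String> closed = new ArrayList<>();

    static class Resource implements AutoCloseable {
        final String name;
        Resource(String name) { this.name = name; }
        @Override public void close() { closed.add(name); }
    }

    public static void main(String[] args) {
        // try-with-resources closes in reverse declaration order, so the
        // tensor (the object) is released before the allocator (the memory),
        // never the other way round.
        try (Resource allocator = new Resource("allocator");
             Resource tensor = new Resource("tensor")) {
            // ... use the tensor ...
        }
        System.out.println(closed); // [tensor, allocator]
    }
}
```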
/azp run Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux Nuphar CI Pipeline,Linux OpenVINO CI Pipeline
/azp run MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-python-checks-ci-pipeline,onnxruntime-binary-size-checks-ci-pipeline
Azure Pipelines successfully started running 7 pipeline(s).
Azure Pipelines successfully started running 9 pipeline(s).
That TensorRT test failure looks independent of all the changes in this PR; it's in the core library tests rather than any of the Java ones.
I will see to it.
Description:
Following on from #12013, #12281 and #12496, this PR fixes the JNI error handling in OrtJniUtil. The refactor of all the JNI code should be complete now. I'll revise the sparse tensor PR (#10653) after this has been merged, as it touches many of the same parts of the code.
This change is independent of #12496.
Motivation and Context