void dumpHeap(String, boolean) is called via CallObjectMethodA and not CallVoidMethodA #18

Open
schmelter-sap opened this issue Jan 26, 2021 · 5 comments

Comments

@schmelter-sap

Hi,

I'm currently investigating a case where the heap dump was not written by the jvmkill agent. The problem is that the dumpHeap() method fails and I get an error of the form:

HeapDump action failed: JNI call failed: call to method_id 0x0... on object 0x0... with variable arguments "<path>", 1 failed

At first I thought that this would be caused by an exception during the call to dumpHeap(), but then the exception message should be included too. But as I inspected the code more closely, I found that the void method dumpHeap() is called via CallObjectMethodA, which is reserved for methods that return objects:

jni_env.call_object_method_with_cstring_jboolean(hotspot_diagnostic_mxbean, dump_heap_method_id, resolved_heap_dump_path_cstring.clone(), ::jvmti::JNI_TRUE as u8)?;

And since you check whether the 'result' of the CallObjectMethodA call is null (and treat that as an error), the error may just be caused by a random value in the register used to return values?
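The suspected failure mode can be modelled without a JVM. The following is a minimal sketch, not the agent's actual code: `RawJni`, `FakeJvm` and both `dump_heap_via_*` functions are hypothetical names invented for illustration, and the "leftover" value stands in for whatever an object-returning call hands back when the Java method is really void.

```rust
/// Hypothetical stand-in for the raw JNI calls; not the agent's real API.
trait RawJni {
    /// Models CallObjectMethodA. For a void Java method there is no return
    /// value, so the pointer handed back is whatever was left behind.
    fn call_object_method_a(&self) -> usize;
    /// Models CallVoidMethodA, which returns nothing.
    fn call_void_method_a(&self);
    /// Models ExceptionCheck: true if a Java exception is pending.
    fn exception_check(&self) -> bool;
}

/// Fake JVM used only for this illustration.
struct FakeJvm {
    leftover_return_value: usize,
    exception_pending: bool,
}

impl RawJni for FakeJvm {
    fn call_object_method_a(&self) -> usize {
        self.leftover_return_value
    }
    fn call_void_method_a(&self) {}
    fn exception_check(&self) -> bool {
        self.exception_pending
    }
}

/// Pattern described in this issue: a null "result" is treated as failure.
fn dump_heap_via_object_call(jni: &impl RawJni) -> Result<(), String> {
    if jni.call_object_method_a() == 0 {
        return Err("JNI call failed".to_string());
    }
    Ok(())
}

/// Void variant: success depends only on whether an exception is pending.
fn dump_heap_via_void_call(jni: &impl RawJni) -> Result<(), String> {
    jni.call_void_method_a();
    if jni.exception_check() {
        return Err("Java exception pending".to_string());
    }
    Ok(())
}
```

With a fake JVM whose leftover value happens to be zero and no pending exception, the object-style check reports a failure even though nothing went wrong, while the void-style check succeeds. That would match an error message with no exception text attached.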

Best regards,
Ralf

@dmikusa

dmikusa commented Mar 16, 2021

@schmelter-sap Do you have more context you can share here? A couple of questions come to mind...

  1. Can you include the JVM start-up command used, in particular the parameters passed to the jvmkill agent?

  2. What is the version of the jvmkill agent you're using? What are the JVM vendor and version?

  3. How often are you seeing this happen? Was it a one-off occurrence, or does it continue to occur? If so, at what frequency? Do you have any metrics from the container before it fails, such as how big the heap is? Where is the heap dump being written to? Is there enough free disk space in that location to store the heap dump?

I hear what you're saying here...

But as I inspected the code more closely, I found that the void method dumpHeap() is called via CallObjectMethodA, which is reserved for methods that return objects:

and here

And since you check whether the 'result' of the CallObjectMethodA call is null (and treat that as an error), the error may just be caused by a random value in the register used to return values?

It seems plausible. If I can get the info above, I'll try to dig in a little more & reproduce. If I can't reproduce, perhaps we could look at adding a bit more logging to capture more details about what is happening during the failure.

@schmelter-sap
Author

Hi,

as can be seen in

let dump_heap_method_id = jni_env.get_method_id(hotspot_diagnostic_mxbean_class, "dumpHeap", "(Ljava/lang/String;Z)V")?;

the signature of the method is "(Ljava/lang/String;Z)V", so you have to call it with a CallVoidMethod* variant as per the JNI spec (https://docs.oracle.com/javase/7/docs/technotes/guides/jni/spec/functions.html#wp16656). I don't think there can be any discussion that the current code is wrong and the fix is to use the correct JNI call.
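To make the rule concrete, here is a small sketch that derives the JNI call family from the return type in a method descriptor. The enum `CallFamily` and the functions `call_family` and `jni_function_name` are hypothetical helpers written for this illustration and are not part of the jvmkill code base; only the `Call<Type>MethodA` names come from the JNI specification.

```rust
/// JNI call families, one per possible return type in a method descriptor.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum CallFamily {
    Void, Object, Boolean, Byte, Char, Short, Int, Long, Float, Double,
}

/// Returns the call family for a JNI method descriptor such as
/// "(Ljava/lang/String;Z)V", or None if the descriptor is malformed.
fn call_family(descriptor: &str) -> Option<CallFamily> {
    if !descriptor.starts_with('(') {
        return None;
    }
    // The return type is everything after the closing parenthesis.
    let ret = &descriptor[descriptor.rfind(')')? + 1..];
    match *ret.as_bytes().first()? {
        b'V' => Some(CallFamily::Void),
        // Class types and arrays are both returned as object references.
        b'L' | b'[' => Some(CallFamily::Object),
        b'Z' => Some(CallFamily::Boolean),
        b'B' => Some(CallFamily::Byte),
        b'C' => Some(CallFamily::Char),
        b'S' => Some(CallFamily::Short),
        b'I' => Some(CallFamily::Int),
        b'J' => Some(CallFamily::Long),
        b'F' => Some(CallFamily::Float),
        b'D' => Some(CallFamily::Double),
        _ => None,
    }
}

/// Name of the matching JNI function in its jvalue-array ("A") form.
fn jni_function_name(family: CallFamily) -> &'static str {
    match family {
        CallFamily::Void => "CallVoidMethodA",
        CallFamily::Object => "CallObjectMethodA",
        CallFamily::Boolean => "CallBooleanMethodA",
        CallFamily::Byte => "CallByteMethodA",
        CallFamily::Char => "CallCharMethodA",
        CallFamily::Short => "CallShortMethodA",
        CallFamily::Int => "CallIntMethodA",
        CallFamily::Long => "CallLongMethodA",
        CallFamily::Float => "CallFloatMethodA",
        CallFamily::Double => "CallDoubleMethodA",
    }
}
```

For the dumpHeap descriptor above, `call_family` yields `CallFamily::Void`, which maps to `CallVoidMethodA`. A check like this could serve as a debug assertion inside a call wrapper, assuming the wrapper has access to the descriptor string.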

I've only seen the problem itself once, so it would be hard to reproduce. The version was jvmkill-1.16.0.RELEASE-trusty.

Best regards,
Ralf

@dmikusa

dmikusa commented May 20, 2021

We can certainly fix that call so it's following the spec. Hopefully, that will prevent the issue from occurring again. Not being able to reproduce it, we can't really guarantee anything beyond making the code follow the spec more closely.

Thanks for the follow-up and additional details.

@ansteiner

Hi @dmikusa-pivotal,

as we see this more often in CF@SAP, is there a date when this will be fixed?

Kind regards,
Andreas

@dmikusa

dmikusa commented Mar 22, 2022

We don't have this scheduled at the moment. The code changes are small but CI for this project is not functional, so testing and cutting a release will likely require a lot of effort.

The jvmkill agent isn't really necessary anymore either. The JVM has flags for most of its functionality, like terminating on OOME and writing heap dumps. I'd be curious to get your thoughts on deprecating the jvmkill agent and eventually removing it from the buildpack. The cloud-native Java buildpacks stopped using the jvmkill agent a while ago and it has been a smooth transition.

Roughly speaking, a plan like:

  1. Add support to opt-in to using new JVM flags instead of jvmkill agent. Wait a while.
  2. Default to using JVM flags instead of jvmkill agent, but allow reverting to use the agent. Wait a while.
  3. Remove the jvmkill agent and put this project in the attic.
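
The three phases above could be sketched roughly as follows. This is only an illustration: the comment does not name specific flags, so the HotSpot options used here (`-XX:+ExitOnOutOfMemoryError`, `-XX:+HeapDumpOnOutOfMemoryError`, `-XX:HeapDumpPath`) are an assumption about which ones are meant, and `Strategy`, `Phase`, `resolve` and `oom_jvm_flags` are hypothetical names that do not exist in the buildpack.

```rust
/// How out-of-memory handling is provided (hypothetical model).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Strategy {
    Agent,
    JvmFlags,
}

/// The three rollout phases from the plan above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Phase {
    /// Step 1: agent by default, JVM flags on request.
    OptIn,
    /// Step 2: JVM flags by default, agent on request.
    FlagsByDefault,
    /// Step 3: agent removed, so any request for it is ignored.
    AgentRemoved,
}

/// Picks the strategy for a phase, honouring an explicit user request
/// whenever the phase still allows a choice.
fn resolve(phase: Phase, requested: Option<Strategy>) -> Strategy {
    match phase {
        Phase::OptIn => requested.unwrap_or(Strategy::Agent),
        Phase::FlagsByDefault => requested.unwrap_or(Strategy::JvmFlags),
        Phase::AgentRemoved => Strategy::JvmFlags,
    }
}

/// HotSpot options assumed to cover "terminate on OOME" and
/// "write a heap dump"; the dump location is supplied by the caller.
fn oom_jvm_flags(heap_dump_path: &str) -> Vec<String> {
    vec![
        "-XX:+ExitOnOutOfMemoryError".to_string(),
        "-XX:+HeapDumpOnOutOfMemoryError".to_string(),
        format!("-XX:HeapDumpPath={}", heap_dump_path),
    ]
}
```

Keeping the phase separate from the user's request means moving from step 1 to step 2 only changes the default, so existing opt-ins and opt-outs keep working until the agent is removed.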
