False positive error 500 by java agent #3454
Thanks for reporting this @metabsd. Here we need a bit of clarification:
What is likely happening here is that a transaction is captured by the agent and the HTTP client closes the connection earlier than expected; then, on the server side, a […] If that is the case, the challenge is to make the agent aware of those "expected exceptions", if we can. Another challenge is that this kind of exception could also be the symptom of a misbehaving HTTP client, so it is hard to always mark those as "not really errors". Currently the agent assumes that when there is an exception the HTTP response code is a 500, but here the response has already been sent and it might be fine even from the client side. That is an implementation detail of the Tomcat server, so it might need a dedicated instrumentation if we want to handle it automatically. We already had a similar report in #979.
Hi @SylvainJuge, I'm reaching out for your insights on how the agent interprets HTTP 500 errors. We have observed something curious: when we send our requests with Postman, we receive an HTTP 200 response, which indicates that everything is functioning correctly. Our JMeter load tests corroborate this: 100% of the requests returned without any errors. Given this context, could the agent be misinterpreting the normal termination of threads in a multi-threaded environment as errors? We suspect that the agent attaches itself to a thread and then mistakenly interprets its natural termination as an error. Could that be causing the discrepancy we are observing? I'm also curious about the Java agent's design in relation to multi-threaded environments. Is the agent designed to handle thread lifecycle management in such environments without misidentifying routine thread terminations as errors? Could you shed some light on how the agent handles thread lifecycles and error detection, and on whether this aspect needs further investigation to resolve the discrepancy between the agent's error logs and the results from testing tools like Postman and JMeter? Thank you in advance for your guidance on this matter.
Hi, From your description it seems that the application behavior is not altered, and only what the agent reports is not what you'd expect. You mentioned JMeter, which means you are probably load-testing the application; that might also explain those unexpected […] The agent is not attached to the thread lifecycle: the fact that a thread is started or terminated does not automatically create or stop a span/transaction. The agent usually instruments one or more methods and wraps their content in a big try/catch/finally block, which defines the scope of transactions and spans as the time it takes to execute the body of the instrumented methods. When an exception occurs, for example when the thread is interrupted (but it could be any kind of exception), the catch/finally part of the instrumentation is executed, and this is where the agent tries to guess the status of the HTTP response when it has not already been explicitly set. So I would first suggest checking whether you can reproduce this when using the application like a human would (not with JMeter), then try the […]
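The wrapping described in this comment can be sketched roughly as follows. This is a minimal illustration and not the agent's actual code: `InstrumentationSketch`, `Handler`, `FakeResponse` and `Transaction` are hypothetical names, and the rule "no explicit status plus an exception is reported as 500" is an assumption based only on the description in this thread.

```java
import java.io.IOException;

// Hypothetical sketch: NOT the Elastic APM agent's implementation.
final class InstrumentationSketch {

    /** Stand-in for the body of an instrumented method (for example a servlet's service method). */
    interface Handler {
        void handle(FakeResponse response) throws Exception;
    }

    /** Minimal stand-in for an HTTP response; status 0 means "never explicitly set". */
    static final class FakeResponse {
        int status = 0;
    }

    /** What this sketch would report once the transaction ends. */
    static final class Transaction {
        int reportedStatus;
        Throwable capturedException;
        boolean ended;
    }

    /** Wraps the handler in try/catch/finally, the way the comment describes instrumentation. */
    static Transaction instrument(Handler handler) {
        Transaction tx = new Transaction();
        FakeResponse response = new FakeResponse();
        try {
            handler.handle(response);
        } catch (Exception e) {
            // A real instrumentation would rethrow; the sketch swallows the
            // exception so that the resulting transaction can be inspected.
            tx.capturedException = e;
        } finally {
            if (response.status != 0) {
                tx.reportedStatus = response.status; // status was explicitly set
            } else if (tx.capturedException != null) {
                tx.reportedStatus = 500;             // assumed guess: exception means server error
            } else {
                tx.reportedStatus = 200;             // assumed default when nothing went wrong
            }
            tx.ended = true;
        }
        return tx;
    }

    public static void main(String[] args) {
        Transaction tx = instrument(response -> {
            throw new IOException("The current thread was interrupted");
        });
        System.out.println("reported status: " + tx.reportedStatus);
    }
}
```

In this sketch the thread lifecycle plays no role: the transaction starts and ends with the wrapped method body, and only an exception escaping that body influences the guessed status.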
Hi, Yes, we are currently conducting load tests. I have also reviewed our APM logs since November 21, and this type of error was already present. We also encounter HTTP 500 errors when making calls with Postman while JMeter is not running. Thank you for the recommendation regarding the ignore-exceptions parameter; I will try that out. I concluded that the exception might be misinterpreted by the Java agent because, in the transaction details, I noticed that the agent's activation_method is programmatic-self-attach. This leads me to believe that the agent attaches itself to a thread. Then, in the exception reported by the agent, I noticed java.io.IOException: The current thread was interrupted. I am going to ask the devs to evaluate this in depth to make sure we are not overlooking a real problem. Once again, thank you!!
The "activation method" refers to how the agent is set up in the JVM; in the case of […] If this exception can also be triggered through Postman, then the question is whether this issue is visible in the usual Tomcat logs:
We have found references in the code to […]
What do you mean here: in the code of the application or in the code of the agent? Also, do you know whether those references are the ones that trigger this behavior? To me the relevant question that needs to be answered is rather "why is the thread being interrupted while processing an HTTP request/response?".
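As a rough illustration of that question, the sketch below shows one way an interrupted thread can surface as an `IOException` while a response is being written. It is an assumption-laden stand-in, not Tomcat's or the application's code: `InterruptSketch` and `blockingWrite` are hypothetical names, `Thread.sleep` plays the role of a blocking write, and the exception message is simply copied from this report.

```java
import java.io.IOException;

// Hypothetical sketch: shows how an interrupt flag set by other code
// can turn a blocking call into an IOException.
final class InterruptSketch {

    /** Stand-in for a blocking response write; Thread.sleep models the blocking I/O. */
    static void blockingWrite(long millis) throws IOException {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can still observe it.
            Thread.currentThread().interrupt();
            throw new IOException("The current thread was interrupted", e);
        }
    }

    public static void main(String[] args) {
        // Something else in the application interrupts the request thread...
        Thread.currentThread().interrupt();
        try {
            // ...so the next blocking call fails even though nothing is wrong with the I/O itself.
            blockingWrite(10);
            System.out.println("write completed");
        } catch (IOException e) {
            System.out.println("write failed: " + e.getMessage());
        } finally {
            Thread.interrupted(); // clear the flag before the thread is reused
        }
    }
}
```

If the application follows a similar pattern, finding the code that calls `Thread.interrupt()` on request threads (or cancels the task running them) would be the place to start.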
Leaving this open a little longer to allow for an update, but without any further updates here we will likely close it soon.
Source of the problem: the issue we're discussing is indeed in our application's code, not in the Java agent code. This clarification matters because it narrows our troubleshooting down to the internal workings of our own application.
So are you okay to close this issue?
Describe the bug
We are experiencing a 100% fail rate on Java microservices, but the service is working as expected. We believe that the APM agent is misinterpreting the termination of a thread during execution. Maybe we need to adjust things on our end, but everything seems to be functioning well.
I have noticed that the agent attaches to the thread, and I believe that at the end of processing, the thread is killed. Therefore, the agent misinterprets this and thinks there is a problem.
agent.activation_method == programmatic-self-attach
Exception message:
java.io.IOException: The current thread was interrupted
Full stack trace:
Steps to reproduce
We make a normal call to our web service and receive the expected response. However, the same call is reported as failed in APM. We do not see any issue with the call itself; the problem is only visible from the APM perspective.
Expected behavior
APM should not report an HTTP 500 error for the requests in question.