Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[8.16] Instrumentation: fix log trace inconsistent status code with timeout check when writing the response (backport #15123) #15164

Merged
merged 1 commit into from
Jan 7, 2025

Conversation

mergify[bot]
Copy link
Contributor

@mergify mergify bot commented Jan 7, 2025

Motivation/summary

This fix is branched from #15117.

This PR replaces TimeoutMiddleware with proactively checking for context timeout before writing the response result. The rationale behind this change is that request timeout only ever matters if it happened before the response was written. Otherwise, the client won't care about the response anyway and it's logical for the server to emit either of two error signals consistently in self instrumentation at this stage.

As a result of this changes depending if request timeout happened before or after response was written the self instrumentation will either emit original error transaction with an error log or timeout error transaction with an error log.

This is alternative to errors chaining PR #15122 which will preserve both error logs. Update: After some brief discussion we agreed to move forward with this option for the fix instead of more complex error chaining.

Checklist

For functional changes, consider:

  • Is it observable through the addition of either logging or metrics?
  • Is its use being published in telemetry to enable product improvement?
  • Have system tests been added to avoid regression?

How to test these changes

This PR includes a unit test that encapsulates the condition to simulate the issue. In order to reproduce against a real instance of APM Server follow the recipe from this comment #14232 (comment).

Related issues

#15122
#15117
Fixes #14232


This is an automatic backport of pull request #15123 done by Mergify.

…hen dealing with request timeout (#15123)

Replace TimeoutMiddleware with direct check for request cancelation when writing the response. To prevent events inconsistency in self instrumentation.

---------

Co-authored-by: Carson Ip <[email protected]>
Co-authored-by: Carson Ip <[email protected]>
(cherry picked from commit f2b3894)
@mergify mergify bot requested a review from a team as a code owner January 7, 2025 18:40
@mergify mergify bot added the backport label Jan 7, 2025
@mergify mergify bot assigned 1pkg Jan 7, 2025
@mergify mergify bot merged commit 9e9276d into 8.16 Jan 7, 2025
13 checks passed
@mergify mergify bot deleted the mergify/bp/8.16/pr-15123 branch January 7, 2025 19:14
@kruskall
Copy link
Member

tested locally with tilt by reducing the timeout and following the steps in the issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants