Use Kill Query for Non-Transaction Query Execution and Update Query Timeout / Cancelled Error Message #15694

Merged
merged 9 commits into from
Apr 23, 2024

Conversation

harshit-gangal
Member

@harshit-gangal harshit-gangal commented Apr 10, 2024

Description

This PR changes the error message returned when query execution fails because of a context error.
On context timeout: ERROR 3024 (HY000): Query execution was interrupted, maximum statement execution time exceeded
On context cancellation: ERROR 1317 (70100): Query execution was interrupted
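
As a rough illustration of the mapping above, the sketch below translates Go context errors into the two MySQL errors quoted in the description. The `sqlError` type and the `fromContextError`, `timedOutError`, and `cancelledError` helpers are hypothetical names made up for this example; they are not the types or functions used in this PR.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// sqlError is a hypothetical stand-in for a MySQL-style error value.
// It is NOT the error type used by Vitess.
type sqlError struct {
	Num     int
	State   string
	Message string
}

func (e *sqlError) Error() string {
	return fmt.Sprintf("ERROR %d (%s): %s", e.Num, e.State, e.Message)
}

// fromContextError maps a context error to the MySQL error quoted in the
// PR description. Any other error is returned unchanged.
func fromContextError(err error) error {
	switch {
	case errors.Is(err, context.DeadlineExceeded):
		return &sqlError{
			Num:     3024,
			State:   "HY000",
			Message: "Query execution was interrupted, maximum statement execution time exceeded",
		}
	case errors.Is(err, context.Canceled):
		return &sqlError{Num: 1317, State: "70100", Message: "Query execution was interrupted"}
	default:
		return err
	}
}

// timedOutError lets a context expire and converts the resulting error.
func timedOutError() error {
	ctx, cancel := context.WithTimeout(context.Background(), time.Millisecond)
	defer cancel()
	<-ctx.Done()
	return fromContextError(ctx.Err())
}

// cancelledError cancels a context explicitly and converts the resulting error.
func cancelledError() error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return fromContextError(ctx.Err())
}

func main() {
	fmt.Println(timedOutError())
	fmt.Println(cancelledError())
}
```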

This PR also executes a kill query instead of a kill connection on a context error, to reduce connection churn. This is done only for non-transactional queries.
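
The choice between the two kill variants might be sketched as follows. `killStatement` is a hypothetical helper invented for this example; it only builds the SQL text, using the `kill query %d` form that appears in the reviewed diff and MySQL's plain `KILL <id>` form for killing the whole connection.

```go
package main

import "fmt"

// killStatement is a hypothetical helper (not part of this PR) that picks
// the statement to send on a side connection when a query must be stopped.
func killStatement(connID int64, inTransaction bool) string {
	if inTransaction {
		// Per the PR, a query inside a transaction cannot be killed safely,
		// so the whole connection is killed instead.
		return fmt.Sprintf("kill %d", connID)
	}
	// Outside a transaction only the running statement is killed, so the
	// connection survives and can be reused, which reduces churn.
	return fmt.Sprintf("kill query %d", connID)
}

func main() {
	fmt.Println(killStatement(42, false))
	fmt.Println(killStatement(42, true))
}
```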

Related Issue(s)

Checklist

  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

@harshit-gangal harshit-gangal added Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: Query Serving labels Apr 10, 2024
Contributor

vitess-bot bot commented Apr 10, 2024

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes), new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test, enhancement and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added NeedsBackportReason If backport labels have been applied to a PR, a justification is required NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsIssue A linked issue is missing for this Pull Request NeedsWebsiteDocsUpdate What it says labels Apr 10, 2024
@github-actions github-actions bot added this to the v20.0.0 milestone Apr 10, 2024
@harshit-gangal harshit-gangal force-pushed the dbconn-kill branch 3 times, most recently from 9bee3bd to 7843728 Compare April 12, 2024 06:42
@harshit-gangal harshit-gangal removed NeedsWebsiteDocsUpdate What it says NeedsIssue A linked issue is missing for this Pull Request NeedsBackportReason If backport labels have been applied to a PR, a justification is required NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work labels Apr 12, 2024

codecov bot commented Apr 12, 2024

Codecov Report

Attention: Patch coverage is 89.53488%, with 9 lines in your changes missing coverage. Please review.

Project coverage is 68.42%. Comparing base (f118ba2) to head (bd27474).
Report is 22 commits behind head on main.

Files                                            Patch %   Lines
go/vt/vttablet/tabletserver/connpool/dbconn.go   94.80%    4 Missing ⚠️
go/vt/vttablet/onlineddl/executor.go             0.00%     3 Missing ⚠️
go/vt/vttablet/tabletserver/query_list.go        66.66%    2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #15694      +/-   ##
==========================================
+ Coverage   68.40%   68.42%   +0.01%     
==========================================
  Files        1556     1556              
  Lines      195121   195490     +369     
==========================================
+ Hits       133479   133764     +285     
- Misses      61642    61726      +84     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@harshit-gangal harshit-gangal force-pushed the dbconn-kill branch 3 times, most recently from 15b76f1 to a43cb4c Compare April 18, 2024 14:47
Signed-off-by: Harshit Gangal <[email protected]>
@harshit-gangal harshit-gangal marked this pull request as ready for review April 19, 2024 07:53
@harshit-gangal harshit-gangal requested a review from systay as a code owner April 19, 2024 07:53
@harshit-gangal harshit-gangal changed the title Use of Kill Query instead of Kill Connection on Timeouts Use Kill Query for Non-Transaction Query Execution and Update Query Timeout / Cancelled Error Message Apr 19, 2024
Contributor

@mattlord mattlord left a comment

I had some minor nits/comments/suggestions. I'll approve and you can address those as you feel is best. Thanks!

go/vt/vttablet/onlineddl/executor.go (outdated, resolved)
Comment on lines +215 to +218
	// we can't safely kill a query in a transaction, we need to kill the connection
	_ = dbc.Kill(errMsg, time.Since(now))
} else {
	_ = dbc.KillQuery(errMsg, time.Since(now))
Contributor

@mattlord mattlord Apr 22, 2024

This means that we attempt to kill the connection/query. The comment says that we do, but without checking the error we don't know whether it was successful, do we? IMO we should return the error here and check it at the call sites:

	return dbc.Kill(errMsg, time.Since(now))
} else {
	return dbc.KillQuery(errMsg, time.Since(now))

It looks like dbc.KillWithContext() can fail for a number of reasons and we shouldn't assume the kill succeeded, should we?

Member Author

If the kill fails for any reason, we log the error message. The kill itself is not directly in the query execution path. But it kills any other running query, which will receive the error message if the kill succeeds.
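
The two positions in this thread can be put side by side in a small sketch. Everything here is hypothetical and written only for illustration: the `killer` interface mirrors the two calls in the quoted diff, `terminate` returns the kill error as the reviewer suggests, and `terminateAndLog` shows the log-and-continue handling described in the reply (it returns the message that would be logged so the behaviour is easy to inspect).

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// killer is a hypothetical interface mirroring the two calls in the diff.
type killer interface {
	Kill(reason string, elapsed time.Duration) error
	KillQuery(reason string, elapsed time.Duration) error
}

// terminate returns the kill error instead of discarding it with `_ =`.
func terminate(k killer, inTransaction bool, reason string, started time.Time) error {
	if inTransaction {
		return k.Kill(reason, time.Since(started))
	}
	return k.KillQuery(reason, time.Since(started))
}

// terminateAndLog treats the kill as best effort: a failure is reported
// (here: returned as the text to log) but does not change the query result.
func terminateAndLog(k killer, inTransaction bool, reason string) string {
	if err := terminate(k, inTransaction, reason, time.Now()); err != nil {
		return fmt.Sprintf("could not kill: %v", err)
	}
	return ""
}

// fakeConn is a test double that records the last call and can be told to fail.
type fakeConn struct {
	fail bool
	last string
}

func (f *fakeConn) do(name string) error {
	f.last = name
	if f.fail {
		return errors.New(name + " failed")
	}
	return nil
}

func (f *fakeConn) Kill(string, time.Duration) error      { return f.do("Kill") }
func (f *fakeConn) KillQuery(string, time.Duration) error { return f.do("KillQuery") }

func main() {
	c := &fakeConn{fail: true}
	fmt.Println(terminateAndLog(c, true, "context timeout"))
}
```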

go/vt/vttablet/tabletserver/connpool/dbconn.go (outdated, resolved)
sql := fmt.Sprintf("kill query %d", dbc.conn.ID())
go func() {
	_, err := killConn.Conn.ExecuteFetch(sql, -1, false)
	ch <- err
Contributor

Why not close the channel here as we did elsewhere?

Member Author

I added it in exec and streamexec; there is no reason we cannot add it here.
Added now.
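
The pattern discussed in this thread, running the kill in a goroutine, sending its result on a channel, and closing that channel when done, could look roughly like this. `runKill`, `killCompletes`, and `killTimesOut` are hypothetical names for this sketch, and the `kill` callback stands in for the `ExecuteFetch` call in the diff.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// runKill runs kill in a goroutine and waits for either its result or the
// context to end, whichever comes first.
func runKill(ctx context.Context, kill func() error) error {
	// Buffered so the goroutine can always send, even if the caller has
	// already returned because the context ended.
	ch := make(chan error, 1)
	go func() {
		// Closing the channel signals that no further value will arrive.
		defer close(ch)
		ch <- kill()
	}()

	select {
	case <-ctx.Done():
		return ctx.Err()
	case err := <-ch:
		return err
	}
}

// killCompletes shows the normal path: the kill finishes and its error
// (nil here) is returned.
func killCompletes() error {
	return runKill(context.Background(), func() error { return nil })
}

// killTimesOut shows the path where the context expires before the kill returns.
func killTimesOut() bool {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Millisecond)
	defer cancel()
	err := runKill(ctx, func() error {
		time.Sleep(200 * time.Millisecond)
		return nil
	})
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println(killCompletes(), killTimesOut())
}
```

The buffer of one is a design choice of this sketch: without it, a goroutine whose caller has stopped listening would block on the send forever.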

}()
testContextError(t, ctx, exec,
	"(errno 1317) (sqlstate 70100): Query execution was interrupted",
	150*time.Millisecond)
Contributor

I think that this test is likely to be flaky in the CI, given these very small time windows. I'd recommend extending all of them at least a bit.

Member Author

I have run these tests multiple times and have not observed any flakiness. I will increase them if they come out as flaky.

Signed-off-by: Harshit Gangal <[email protected]>
@harshit-gangal harshit-gangal merged commit 5e2a873 into vitessio:main Apr 23, 2024
104 checks passed
@harshit-gangal harshit-gangal deleted the dbconn-kill branch April 23, 2024 08:34
@shlomi-noach
Contributor

LGTM

Labels
Component: Query Serving Type: Enhancement Logical improvement (somewhere between a bug and feature)
Development

Successfully merging this pull request may close these issues.

Potential connection churn caused by DBConn.Kill()
5 participants