Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TabletServer: Handle nil targets properly everywhere #14734

Merged
merged 10 commits into from
Dec 9, 2023

Conversation

mattlord
Copy link
Contributor

@mattlord mattlord commented Dec 8, 2023

Description

When executing queries locally using the local TabletServer, you use a tabletenv.LocalContext: https://github.com/vitessio/vitess/blob/main/go/vt/vttablet/tabletserver/tabletenv/local_context.go

And when using this the target tablet is nil (or at least it can be) — otherwise an error is returned. For example:

func (sm *stateManager) verifyTargetLocked(ctx context.Context, target *querypb.Target) error {
if target != nil {
switch {
case target.Keyspace != sm.target.Keyspace:
return vterrors.Errorf(vtrpcpb.Code_INVALID_ARGUMENT, "invalid keyspace %v does not match expected %v", target.Keyspace, sm.target.Keyspace)
case target.Shard != sm.target.Shard:
return vterrors.Errorf(vtrpcpb.Code_INVALID_ARGUMENT, "invalid shard %v does not match expected %v", target.Shard, sm.target.Shard)
case target.TabletType != sm.target.TabletType:
for _, otherType := range sm.alsoAllow {
if target.TabletType == otherType {
return nil
}
}
return vterrors.Errorf(vtrpcpb.Code_FAILED_PRECONDITION, "%s: %v, want: %v or %v", vterrors.WrongTablet, target.TabletType, sm.target.TabletType, sm.alsoAllow)
}
} else {
if !tabletenv.IsLocalContext(ctx) {
return vterrors.Errorf(vtrpcpb.Code_INVALID_ARGUMENT, "No target")
}
}
return nil
}

You can see all checks for it here: https://github.com/search?q=repo%3Avitessio%2Fvitess%20IsLocalContext&type=code

And you can see the uses of it here (all the locations impacted by the bug): https://github.com/search?q=repo%3Avitessio%2Fvitess%20tabletenv.LocalContext&type=code

This means that in the TabletServer code we can never assume that the target is not nil and we need to check for it. We added some new places where we use that variable in #13521 for stats, but we didn't add a nil check. So this PR adds that along with a unit test.

The new unit test fails on main:

❯ go test -timeout 30s -run ^TestTabletServerWithNilTarget$ vitess.io/vitess/go/vt/vttablet/tabletserver
E1208 11:31:06.168089   61524 server.go:438] Query not found: set @@session.sql_mode='ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION'
E1208 11:31:06.196732   61524 throttler.go:477] Throttler.retryReadAndApplyThrottlerConfig(): error reading throttler config. Will retry in 10s. Err=node doesn't exist: keyspaces/SrvKeyspace
E1208 11:31:06.200858   61524 server.go:438] Query not found: show full tables like '\_vt\_%'
E1208 11:31:06.200922   61524 tablegc.go:281] TableGC: error while reading tables: unknown error: fakesqldb:: query: 'show full tables like '\_vt\_%'' is not supported on fakesqldb (errno 1105) (sqlstate HY000) during query: show full tables like '\_vt\_%'
E1208 11:31:06.202038   61524 tabletserver.go:1560] Uncaught panic for Sql: "begin", BindVars: {}:
runtime error: invalid memory address or nil pointer dereference
/usr/local/go/src/runtime/panic.go:261 (0x104ba71bb)
	panicmem: panic(memoryError)
/usr/local/go/src/runtime/signal_unix.go

...

Related Issue(s)

Checklist

  • "Backport to:" labels have been added if this change should be back-ported
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on the CI
  • Documentation was added or is not required

Copy link
Contributor

vitess-bot bot commented Dec 8, 2023

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes), new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test, enhancement and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsIssue A linked issue is missing for this Pull Request NeedsWebsiteDocsUpdate What it says labels Dec 8, 2023
@github-actions github-actions bot added this to the v19.0.0 milestone Dec 8, 2023
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
@mattlord mattlord changed the title TabletServer: Handle nil targets properly TabletServer: Handle nil targets properly everywhere Dec 8, 2023
@mattlord mattlord removed the NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work label Dec 8, 2023
@mattlord mattlord marked this pull request as ready for review December 8, 2023 17:07
deepthi
deepthi previously approved these changes Dec 8, 2023
Copy link
Member

@deepthi deepthi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you amend the description with a link to all places where we call tabletenv.LocalContext()? All of those are potentially affected.

@deepthi deepthi dismissed their stale review December 8, 2023 18:14

May need more changes

@@ -625,7 +631,10 @@ func (tsv *TabletServer) Rollback(ctx context.Context, target *querypb.Target, t
target, nil, true, /* allowOnShutdown */
func(ctx context.Context, logStats *tabletenv.LogStats) error {
defer tsv.stats.QueryTimings.Record("ROLLBACK", time.Now())
defer tsv.stats.QueryTimingsByTabletType.Record(target.TabletType.String(), time.Now())
// With a tabletenv.LocalContext() the target can be nil.
if target != nil {
Copy link
Member

@deepthi deepthi Dec 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what LocalContext means is that this tablet is talking to itself versus another tablet.
In which case, if we care about these stats, we can use tsv.sm.Target() to get what the current target is, and get the tablet type from that.

@mattlord mattlord requested a review from deepthi December 8, 2023 22:46
Signed-off-by: Matt Lord <[email protected]>
@@ -846,7 +880,7 @@ func (tsv *TabletServer) execute(ctx context.Context, target *querypb.Target, sq
ctx: ctx,
logStats: logStats,
tsv: tsv,
tabletType: target.GetTabletType(),
Copy link
Contributor Author

@mattlord mattlord Dec 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This returned UNKNOWN if the method receiver (target) is nil.

if tsv.sm.Target() == nil {
return topodatapb.ShardReplicationError_UNKNOWN.String(), vterrors.Errorf(vtrpcpb.Code_FAILED_PRECONDITION, "TabletServer has no current target")
}
return tsv.sm.Target().String(), nil
Copy link
Member

@deepthi deepthi Dec 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly:
https://github.com/vitessio/vitess/blob/main/go/vt/vttablet/tabletserver/tabletserver.go#L168-L172
Should we add a nil check on tsv.sm.Target() in line 172 while we are at it?

Signed-off-by: Matt Lord <[email protected]>
@mattlord mattlord merged commit a77344b into vitessio:main Dec 9, 2023
116 checks passed
@mattlord mattlord deleted the tsv_nil_target branch December 9, 2023 13:42
vitess-bot pushed a commit that referenced this pull request Dec 9, 2023
mattlord pushed a commit that referenced this pull request Dec 9, 2023
…14734) (#14741)

Signed-off-by: Matt Lord <[email protected]>
Co-authored-by: vitess-bot[bot] <108069721+vitess-bot[bot]@users.noreply.github.com>
ejortegau pushed a commit to slackhq/vitess that referenced this pull request Dec 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

vttablet: immediate and repeated vttablet panic in message manager on startup
3 participants