VReplication: Validate min set of user permissions in traffic switch prechecks #16762

Merged: mattlord merged 22 commits into vitessio:main from vrepl_dry_run_perms on Sep 24, 2024

Conversation

mattlord
Contributor

@mattlord mattlord commented Sep 11, 2024

Description

This PR adds another check to the pre-checks that VReplication runs before switching workflow traffic. These pre-checks exist to head off downtime or client-visible errors, and therefore user/customer impact, from attempting the traffic switch. The new check applies when we want to set up reverse replication, which is the default in order to support switching traffic back and forth with the SwitchTraffic and ReverseTraffic sub-commands. In that case it ensures that the user has the MySQL-level privileges (select,insert,update,delete) on the _vt.vreplication table that are needed to manage the reverse workflow.

This is particularly helpful when working with "external" or "unmanaged" tablets, which you would often use when first migrating to Vitess. In this case you have to manually set up the necessary privileges for the --db_filtered_user and other MySQL users that Vitess uses in the mysqld instances behind these unmanaged tablets. This check ensures that we have the minimum privileges we need on those mysqld instances when we execute the SwitchTraffic sub-command.

The test case in the issue then fails as expected when attempting to switch traffic:

❯ vtctldclient MoveTables --workflow commerce2customer --target-keyspace customer switchtraffic --tablet-types primary
E0916 13:26:38.360193   34580 main.go:56] rpc error: code = Unknown desc = primary tablets are not able to fully manage the reverse vreplication workflow in the commerce keyspace: user vt_filtered does not have the required set of permissions (select,insert,update,delete) on the _vt.vreplication table on tablet zone1-0000000101

Manual test

alias vtctldclient='command vtctldclient --server=localhost:15999'

./101_initial_cluster.sh; mysql < ../common/insert_commerce_data.sql; ./201_customer_tablets.sh; ./202_move_tables.sh

# Test with default global privileges
vtctldclient MoveTables --workflow commerce2customer --target-keyspace customer switchtraffic --dry-run


command mysql -u root --socket ${VTDATAROOT}/vt_0000000$(vtctldclient GetTablets --keyspace commerce --tablet-type primary --shard "0" | awk '{print $1}' | cut -d- -f2 | bc)/mysql.sock vt_commerce -e "revoke select,insert,update,delete on *.* from vt_filtered@localhost"

# Test without global privileges
vtctldclient MoveTables --workflow commerce2customer --target-keyspace customer switchtraffic --dry-run


command mysql -u root --socket ${VTDATAROOT}/vt_0000000$(vtctldclient GetTablets --keyspace commerce --tablet-type primary --shard "0" | awk '{print $1}' | cut -d- -f2 | bc)/mysql.sock vt_commerce -e "grant select,insert,update,delete on _vt.* to vt_filtered@localhost"

# Test with db level privileges
vtctldclient MoveTables --workflow commerce2customer --target-keyspace customer switchtraffic --dry-run


command mysql -u root --socket ${VTDATAROOT}/vt_0000000$(vtctldclient GetTablets --keyspace commerce --tablet-type primary --shard "0" | awk '{print $1}' | cut -d- -f2 | bc)/mysql.sock vt_commerce -e "revoke select,insert,update,delete on _vt.* from vt_filtered@localhost"

# Test without global or db level privileges
vtctldclient MoveTables --workflow commerce2customer --target-keyspace customer switchtraffic --dry-run


command mysql -u root --socket ${VTDATAROOT}/vt_0000000$(vtctldclient GetTablets --keyspace commerce --tablet-type primary --shard "0" | awk '{print $1}' | cut -d- -f2 | bc)/mysql.sock vt_commerce -e "grant select,insert,update,delete on _vt.vreplication to vt_filtered@localhost"

# Test with table level privileges
vtctldclient MoveTables --workflow commerce2customer --target-keyspace customer switchtraffic --dry-run


command mysql -u root --socket ${VTDATAROOT}/vt_0000000$(vtctldclient GetTablets --keyspace commerce --tablet-type primary --shard "0" | awk '{print $1}' | cut -d- -f2 | bc)/mysql.sock vt_commerce -e "revoke select,insert,update,delete on _vt.vreplication from vt_filtered@localhost"

# Test without global, db, or table level privileges
vtctldclient MoveTables --workflow commerce2customer --target-keyspace customer switchtraffic --dry-run

Results:

# Test without global privileges
vtctldclient MoveTables --workflow commerce2customer --target-keyspace customer switchtraffic --dry-run
E0916 13:31:09.115459   40218 main.go:56] rpc error: code = Unknown desc = primary tablets are not able to fully manage the reverse vreplication workflow in the commerce keyspace: user vt_filtered does not have the required set of permissions (select,insert,update,delete) on the _vt.vreplication table on tablet zone1-0000000101


# Test with db level privileges
vtctldclient MoveTables --workflow commerce2customer --target-keyspace customer switchtraffic --dry-run
SwitchTraffic dry run results for workflow customer.commerce2customer at 16 Sep 24 17:34 UTC

Lock keyspace commerce
Mirroring 0.00 percent of traffic from keyspace commerce to keyspace customer for tablet types [REPLICA,RDONLY]
Switch reads for tables [corder,customer] to keyspace customer for tablet types [REPLICA,RDONLY]
Routing rules for tables [corder,customer] will be updated
Unlock keyspace commerce
Lock keyspace commerce
Lock keyspace customer
Mirroring 0.00 percent of traffic from keyspace commerce to keyspace customer for tablet types [PRIMARY]
Stop writes on keyspace commerce for tables [corder,customer]: [keyspace:commerce;shard:0;position:MySQL56/b82d90f0-7451-11ef-929d-065dbea5fc04:1-47]
Wait for vreplication on stopped streams to catchup for up to 30s
Create reverse vreplication workflow commerce2customer_reverse
Create journal entries on source databases
Enable writes on keyspace customer for tables [corder,customer]
Switch routing from keyspace commerce to keyspace customer
Routing rules for tables [corder,customer] will be updated
Switch writes completed, freeze and delete vreplication streams on: [tablet:200]
Start reverse vreplication streams on: [tablet:101]
Mark vreplication streams frozen on: [keyspace:customer;shard:0;tablet:200;workflow:commerce2customer;dbname:vt_customer]
Unlock keyspace customer
Unlock keyspace commerce


# Test without global or db level privileges
vtctldclient MoveTables --workflow commerce2customer --target-keyspace customer switchtraffic --dry-run
E0916 13:35:23.642355   45669 main.go:56] rpc error: code = Unknown desc = primary tablets are not able to fully manage the reverse vreplication workflow in the commerce keyspace: user vt_filtered does not have the required set of permissions (select,insert,update,delete) on the _vt.vreplication table on tablet zone1-0000000101


# Test with table level privileges
vtctldclient MoveTables --workflow commerce2customer --target-keyspace customer switchtraffic --dry-run
SwitchTraffic dry run results for workflow customer.commerce2customer at 16 Sep 24 17:35 UTC

Lock keyspace commerce
Mirroring 0.00 percent of traffic from keyspace commerce to keyspace customer for tablet types [REPLICA,RDONLY]
Switch reads for tables [corder,customer] to keyspace customer for tablet types [REPLICA,RDONLY]
Routing rules for tables [corder,customer] will be updated
Unlock keyspace commerce
Lock keyspace commerce
Lock keyspace customer
Mirroring 0.00 percent of traffic from keyspace commerce to keyspace customer for tablet types [PRIMARY]
Stop writes on keyspace commerce for tables [corder,customer]: [keyspace:commerce;shard:0;position:MySQL56/b82d90f0-7451-11ef-929d-065dbea5fc04:1-49]
Wait for vreplication on stopped streams to catchup for up to 30s
Create reverse vreplication workflow commerce2customer_reverse
Create journal entries on source databases
Enable writes on keyspace customer for tables [corder,customer]
Switch routing from keyspace commerce to keyspace customer
Routing rules for tables [corder,customer] will be updated
Switch writes completed, freeze and delete vreplication streams on: [tablet:200]
Start reverse vreplication streams on: [tablet:101]
Mark vreplication streams frozen on: [keyspace:customer;shard:0;tablet:200;workflow:commerce2customer;dbname:vt_customer]
Unlock keyspace customer
Unlock keyspace commerce


# Test without global, db, or table level privileges
vtctldclient MoveTables --workflow commerce2customer --target-keyspace customer switchtraffic --dry-run
E0916 13:36:27.420890   46103 main.go:56] rpc error: code = Unknown desc = primary tablets are not able to fully manage the reverse vreplication workflow in the commerce keyspace: user vt_filtered does not have the required set of permissions (select,insert,update,delete) on the _vt.vreplication table on tablet zone1-0000000101
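
The pass/fail pattern above (global, then database, then table level) can be sketched in Go as follows. This is not the code from this PR: the `grant` type, the scope strings, and `canManageVReplication` are hypothetical names used only for illustration, and the sketch assumes that all four privileges must be held together at one level. How the real check combines levels is not shown in this description.

```go
package main

import (
	"fmt"
	"strings"
)

// requiredPrivs mirrors the set named in the error message above.
var requiredPrivs = []string{"select", "insert", "update", "delete"}

// grant is a hypothetical, simplified view of one privilege row.
// scope is "*.*" (global), "_vt.*" (database) or "_vt.vreplication" (table).
type grant struct {
	scope string
	privs []string
}

// hasAll reports whether privs contains every required privilege,
// ignoring case and surrounding whitespace.
func hasAll(privs []string) bool {
	held := make(map[string]bool, len(privs))
	for _, p := range privs {
		held[strings.ToLower(strings.TrimSpace(p))] = true
	}
	for _, r := range requiredPrivs {
		if !held[r] {
			return false
		}
	}
	return true
}

// canManageVReplication reports whether any single grant at the global,
// database, or table level covers the full required set (an assumption).
func canManageVReplication(grants []grant) bool {
	for _, g := range grants {
		switch g.scope {
		case "*.*", "_vt.*", "_vt.vreplication":
			if hasAll(g.privs) {
				return true
			}
		}
	}
	return false
}

func main() {
	dbLevel := []grant{{scope: "_vt.*", privs: []string{"SELECT", "INSERT", "UPDATE", "DELETE"}}}
	partial := []grant{{scope: "_vt.vreplication", privs: []string{"select", "insert"}}}

	fmt.Println(canManageVReplication(dbLevel)) // database-level grant is enough
	fmt.Println(canManageVReplication(partial)) // incomplete set is rejected
	fmt.Println(canManageVReplication(nil))     // no grants at all is rejected
}
```

MySQL itself evaluates privileges additively across levels, so a check that unions the levels would also be defensible; the single-level rule here is simply the smallest rule consistent with the results listed above.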

Related Issue(s)

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

Contributor

vitess-bot bot commented Sep 11, 2024

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes). New features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test. Enhancements and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added the NeedsBackportReason (If backport labels have been applied to a PR, a justification is required), NeedsDescriptionUpdate (The description is not clear or comprehensive enough, and needs work), NeedsIssue (A linked issue is missing for this Pull Request), and NeedsWebsiteDocsUpdate (What it says) labels Sep 11, 2024
@github-actions github-actions bot added this to the v21.0.0 milestone Sep 11, 2024
@mattlord mattlord removed the NeedsBackportReason label Sep 11, 2024

codecov bot commented Sep 11, 2024

Codecov Report

Attention: Patch coverage is 10.97561% with 73 lines in your changes missing coverage. Please review.

Project coverage is 69.49%. Comparing base (95f2e3e) to head (648157c).
Report is 5 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| go/vt/vttablet/tabletmanager/rpc_vreplication.go | 0.00% | 26 Missing ⚠️ |
| go/vt/vtctl/workflow/server.go | 27.27% | 24 Missing ⚠️ |
| go/vt/vttablet/grpctmclient/client.go | 0.00% | 9 Missing ⚠️ |
| go/vt/vttablet/grpctmserver/server.go | 0.00% | 6 Missing ⚠️ |
| go/vt/vtcombo/tablet_map.go | 0.00% | 2 Missing ⚠️ |
| go/vt/vtctl/workflow/utils.go | 0.00% | 2 Missing ⚠️ |
| go/vt/vttablet/faketmclient/fake_client.go | 0.00% | 2 Missing ⚠️ |
| go/vt/vttablet/tmrpctest/test_tm_rpc.go | 0.00% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #16762      +/-   ##
==========================================
- Coverage   69.51%   69.49%   -0.02%     
==========================================
  Files        1569     1569              
  Lines      202517   202633     +116     
==========================================
+ Hits       140780   140827      +47     
- Misses      61737    61806      +69     


@mattlord mattlord changed the title from "VReplication: Validate full expected set of user permissions on traffic switch dry runs" to "VReplication: Validate min set of user permissions in traffic switch prechecks" Sep 11, 2024
@mattlord mattlord added the Type: Enhancement and Component: VReplication labels and removed the NeedsWebsiteDocsUpdate and NeedsDescriptionUpdate labels Sep 11, 2024
Signed-off-by: Matt Lord <[email protected]>
@mattlord mattlord removed the NeedsIssue label Sep 16, 2024
@mattlord mattlord marked this pull request as ready for review September 16, 2024 17:38
Member

@deepthi deepthi left a comment

We are adding a new vttablet RPC and calling it from vtctld. This means that you cannot run SwitchTraffic on a cluster that is in a partially upgraded state, or at least you need to upgrade vtctld before vttablet. What is our current stance on being able to run SwitchTraffic mid-upgrade?

@mattlord
Contributor Author

We are adding a new vttablet RPC and calling it from vtctld. This means that you cannot run SwitchTraffic on a cluster that is in a partially upgraded state, or at least you need to upgrade vtctld before vttablet. What is our current stance on being able to run SwitchTraffic mid-upgrade?

Good question! I'll have to test that and think on it...

@mattlord
Contributor Author

We are adding a new vttablet RPC and calling it from vtctld. This means that you cannot run SwitchTraffic on a cluster that is in a partially upgraded state, or at least you need to upgrade vtctld before vttablet. What is our current stance on being able to run SwitchTraffic mid-upgrade?

@deepthi what do you think? 950530e

Contributor

@shlomi-noach shlomi-noach left a comment

Some minor changes requested, feel free to merge if/when addressed.

go/vt/vttablet/tabletmanager/rpc_vreplication.go (outdated, resolved)
)
)
limit 1
`
Contributor

This looks correct. I wonder whether we should instead parse the result of SHOW GRANTS, since that guarantees correct aggregation of the above mysql.* tables.

Contributor Author

@mattlord mattlord Sep 24, 2024

I considered that as well, but I thought that it would be much more complicated since I didn't see any existing parsing support for it. And since that output is formulated from querying these tables anyway I went this direction.
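
To give a sense of what the SHOW GRANTS alternative would involve, here is a minimal Go sketch that parses one line of that output into a scope and a privilege list. It is only an illustration and not code from this PR: `grantRE` and `parseGrant` are hypothetical names, and the sketch assumes the simple `GRANT <privileges> ON <object> TO <account>` shape.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// grantRE captures the privilege list and the object of a simple
// "GRANT <privs> ON <object> TO <account>" line.
var grantRE = regexp.MustCompile(`(?i)^GRANT\s+(.+?)\s+ON\s+(\S+)\s+TO\s+`)

// parseGrant extracts the scope (with backticks removed) and the lower-cased
// privilege names from one SHOW GRANTS line. ok is false when the line does
// not have the simple shape assumed here, for example a role grant.
func parseGrant(line string) (scope string, privs []string, ok bool) {
	m := grantRE.FindStringSubmatch(strings.TrimSpace(line))
	if m == nil {
		return "", nil, false
	}
	scope = strings.ReplaceAll(m[2], "`", "")
	for _, p := range strings.Split(m[1], ",") {
		privs = append(privs, strings.ToLower(strings.TrimSpace(p)))
	}
	return scope, privs, true
}

func main() {
	lines := []string{
		"GRANT SELECT, INSERT, UPDATE, DELETE ON `_vt`.`vreplication` TO `vt_filtered`@`localhost`",
		"GRANT `app_role`@`%` TO `vt_filtered`@`localhost`", // role grant: not handled
	}
	for _, l := range lines {
		scope, privs, ok := parseGrant(l)
		fmt.Printf("ok=%v scope=%q privs=%v\n", ok, scope, privs)
	}
}
```

Even this small sketch leaves out cases a production parser would need to handle, such as `ALL PRIVILEGES`, column-level lists like `SELECT (col)`, privileges inherited through roles, and quoting of unusual identifiers.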

go/vt/vttablet/tabletmanager/rpc_vreplication.go (outdated, resolved)
if err != nil { // Should never happen
	return nil, vterrors.Errorf(vtrpcpb.Code_INTERNAL, "unexpected result for query %s: expected boolean-like value, got: %q",
		query, qr.Rows[0][0].ToString())
}
Contributor

Consider using row := qr.Named().Row().

go/vt/vtctl/workflow/server.go (outdated, resolved)
@mattlord mattlord merged commit 9e7a63a into vitessio:main Sep 24, 2024
99 checks passed
@mattlord mattlord deleted the vrepl_dry_run_perms branch September 24, 2024 15:36
Labels
Component: VReplication, Type: Enhancement (Logical improvement, somewhere between a bug and feature)
Development

Successfully merging this pull request may close these issues.

Bug Report: VReplication SwitchTraffic allowed when reverse workflow not fully usable
4 participants