Merge DS Defense patches into Slack's v14 release branch #90
…#67)
* Emit per workload labels for existing per table vttablet metrics (vitessio#12394)
* Emit per workload labels for existing per table vttablet metrics

This adds the possibility to configure vttablet (via CLI flag) to also have a workload label for existing per table metrics (query counts, query times, query errors, query rows affected, query rows returned, query error counts). Workload can be any string that makes sense for the client application: for example, an API endpoint name, controller, batch job name, or application name. This is useful for gaining observability into how the query load is distributed across different workloads.

This is achieved with two new CLI flags:
* `enable-per-workload-table-metrics`: whether to enable or disable per workload metric collection. Disabled by default to preserve the current behavior, making the new feature opt-in only.
* `workload-label`: a string to look for in query comments to identify the workload running the current query.

The workload is obtained by parsing query comments of the form `/* ... <workload_label>=<workload_name>; ... */`. For example, if vttablet is started with `--enable-per-workload-table-metrics --workload-label app_name` and a query is issued with a comment like `/* ... app_name=shop; ... */`, then metrics will look like
```
vttablet_query_counts{plan="Select",table="dual", workload="shop"} 15479
```
instead of
```
vttablet_query_counts{plan="Select",table="dual"} 15479
```
Query comment parsing only takes place if `--enable-per-workload-table-metrics` is used, so as not to incur the parsing cost if the user does not want per workload metrics.

Signed-off-by: Eduardo J. Ortega U <[email protected]>
* make linter happy Signed-off-by: Eduardo J. Ortega U <[email protected]>
* fix flags e2e test Signed-off-by: Eduardo J.
Ortega U <[email protected]> * Address PR comments: * Obtain workload information on the vtgate instead of the vttablet, avoiding double parsing. * Treat workload name as a query directive. * Send workload name from vtgate to vttablet as ExecuteOptions. Additionally, annotate tabletserver's execution span with the workload name to also enrich traces with workload name data, in addition to metrics. Signed-off-by: Eduardo J. Ortega U <[email protected]> * A few fixes: 1. Rebuild some files with `make proto`. 2. Protect against nil ExecuteOptions on the tabletserver. Signed-off-by: Eduardo J. Ortega U <[email protected]> * Fix flags e2e test Signed-off-by: Eduardo J. Ortega U <[email protected]> * Address PR comments Signed-off-by: Eduardo J. Ortega U <[email protected]> * Fixes Signed-off-by: Eduardo J. Ortega U <[email protected]> * Fix a comment Signed-off-by: Eduardo J. Ortega U <[email protected]> * Fix e2e flag test Signed-off-by: Eduardo J. Ortega U <[email protected]> * Update JS code for protobuf changes. Signed-off-by: Eduardo J. Ortega U <[email protected]> * Fix QueryEngine unit test Signed-off-by: Eduardo J. Ortega U <[email protected]> * Fix e2e flag test Signed-off-by: Eduardo J. Ortega U <[email protected]> * Fix spurious tab in comment Signed-off-by: Eduardo J. Ortega U <[email protected]> * Address PR comment Don't use dual format flag for new flags - stick with - separated ones. Signed-off-by: Eduardo J. Ortega U <[email protected]> --------- Signed-off-by: Eduardo J. 
Ortega U <[email protected]> * Fix cherry-pick * Add basic metrics to `vttablet` transaction throttler (vitessio#12418) * Add basic stats to vttablet tx throttler Signed-off-by: Tim Vaillancourt <[email protected]> * test new metrics Signed-off-by: Tim Vaillancourt <[email protected]> * reorder Signed-off-by: Tim Vaillancourt <[email protected]> * short names Signed-off-by: Tim Vaillancourt <[email protected]> * Add max rate Signed-off-by: Tim Vaillancourt <[email protected]> * Move NewGaugeFunc to under conditional Signed-off-by: Tim Vaillancourt <[email protected]> * Use env Signed-off-by: Tim Vaillancourt <[email protected]> * Remove env from TxThrottler struct Signed-off-by: Tim Vaillancourt <[email protected]> * Fix tests Signed-off-by: Tim Vaillancourt <[email protected]> * PR suggestion Signed-off-by: Tim Vaillancourt <[email protected]> * Fix unit test Signed-off-by: Tim Vaillancourt <[email protected]> * reorder test vars Signed-off-by: Tim Vaillancourt <[email protected]> --------- Signed-off-by: Tim Vaillancourt <[email protected]> * Fix go/vt/vttablet/tabletserver/query_engine_test.go --------- Signed-off-by: Eduardo J. Ortega U <[email protected]> Signed-off-by: Tim Vaillancourt <[email protected]> Co-authored-by: Eduardo J. Ortega U <[email protected]>
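The workload comment format described in the first commit above (`/* ... <workload_label>=<workload_name>; ... */`) can be illustrated with a small sketch. This is not the Vitess implementation: `workloadFromComment` is a hypothetical helper, it only inspects the first `/* ... */` comment in the query, and it assumes fields inside the comment are separated by `;`.

```go
package main

import (
	"fmt"
	"strings"
)

// workloadFromComment is a hypothetical helper (not a Vitess function).
// It looks inside the first /* ... */ comment of query for a field of the
// form "<label>=<name>;" and returns the name if one is found.
func workloadFromComment(query, label string) (string, bool) {
	start := strings.Index(query, "/*")
	if start < 0 {
		return "", false
	}
	rest := query[start+2:]
	end := strings.Index(rest, "*/")
	if end < 0 {
		return "", false
	}
	// Assumption: fields inside the comment are terminated by ';'.
	for _, field := range strings.Split(rest[:end], ";") {
		key, value, found := strings.Cut(strings.TrimSpace(field), "=")
		if !found {
			continue
		}
		// Tolerate free text before the label by using the last token of the key.
		tokens := strings.Fields(key)
		if len(tokens) > 0 && tokens[len(tokens)-1] == label {
			return strings.TrimSpace(value), true
		}
	}
	return "", false
}

func main() {
	name, ok := workloadFromComment("SELECT 1 FROM dual /* app_name=shop; */", "app_name")
	fmt.Println(name, ok)
}
```

Per the later commits in the same PR, the workload is obtained on the vtgate and passed to the vttablet as part of ExecuteOptions, so a parser of this kind would run once per query rather than on both components.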
* `slack-vitess-r14.0.5-dsdefense`: set `CODEOWNERS` * Use a sub-team
… 2 (#69) * Skip recalculating the rate in MaxReplicationLagModule when it can't be done This defends against lag records with nil stats which can lead to segfaults. See vitessio#12619 Signed-off-by: Eduardo J. Ortega U <[email protected]> * Address PR comments. Signed-off-by: Eduardo J. Ortega U <[email protected]> * Make linter happy Signed-off-by: Eduardo J. Ortega U <[email protected]> * Add support for criticality query directive, and have TxThrottler respect that Signed-off-by: Eduardo J. Ortega U <[email protected]> * Remove unused variable Signed-off-by: Eduardo J. Ortega U <[email protected]> * Fix CI pipeline Signed-off-by: Eduardo J. Ortega U <[email protected]> Signed-off-by: Tim Vaillancourt <[email protected]> * Address PR comments. Signed-off-by: Eduardo J. Ortega U <[email protected]> * Make linter happy & add extra test cases. Signed-off-by: Eduardo J. Ortega U <[email protected]> * Address PR comments. Signed-off-by: Eduardo J. Ortega U <[email protected]> * Fix circular import Signed-off-by: Eduardo J. Ortega U <[email protected]> * Make linter happy Signed-off-by: Eduardo J. Ortega U <[email protected]> * Address PR comments: * Invalid criticality in query directive fails the query. Signed-off-by: Eduardo J. Ortega U <[email protected]> * Fix executor.go Signed-off-by: Tim Vaillancourt <[email protected]> * Fix go/vt/vtgate/executor.go again Signed-off-by: Tim Vaillancourt <[email protected]> * Fix TestNewMaxReplicationLagModule_recalculateRate * Fix go/vt/vtgate/executor_test.go * Regen protos from linux Signed-off-by: Tim Vaillancourt <[email protected]> --------- Signed-off-by: Eduardo J. Ortega U <[email protected]> Signed-off-by: Tim Vaillancourt <[email protected]> Co-authored-by: Eduardo J. Ortega U <[email protected]>
…ES (vitessio#12949) (#81) This error code seems better suited to represent the fact that transactions are being throttled by the server due to some form of resource contention than the current code 1203 ER_TOO_MANY_USER_CONNECTIONS. Signed-off-by: Eduardo J. Ortega U <[email protected]> Co-authored-by: Eduardo J. Ortega U <[email protected]>
…itessio#12901 (#83) * Cleanup panics in `txthrottler`, reorder for readability (vitessio#12901) * Cleanup tx_throttler.go Signed-off-by: Tim Vaillancourt <[email protected]> * Cleanup tx_throttler.go #2 Signed-off-by: Tim Vaillancourt <[email protected]> * Fix throttlerFactoryFunc Signed-off-by: Tim Vaillancourt <[email protected]> * Undo if-cond consolidation Signed-off-by: Tim Vaillancourt <[email protected]> * Undo struct shuffling Signed-off-by: Tim Vaillancourt <[email protected]> * prove that disabled config returns nil error Signed-off-by: Tim Vaillancourt <[email protected]> * Improve test Signed-off-by: Tim Vaillancourt <[email protected]> --------- Signed-off-by: Tim Vaillancourt <[email protected]> * remove unused cell string --------- Signed-off-by: Tim Vaillancourt <[email protected]>
Signed-off-by: Eduardo J. Ortega U <[email protected]>
please merge this into a separate branch as per our discussion here: https://slack-pde.slack.com/archives/C04NYAU82SC/p1685965962525959
* Fix transaction throttler ignoring the initial rate This addresses the issue reported in vitessio#12549 Signed-off-by: Eduardo J. Ortega U <[email protected]> * Add missing override of max replication lag in `throttler.newThrottler()` Signed-off-by: Eduardo J. Ortega U <[email protected]> * Reorder functions to make diff easier to read Signed-off-by: Eduardo J. Ortega U <[email protected]> * Fix check for maxRate in `newThrottlerFromConfig()` Signed-off-by: Eduardo J. Ortega U <[email protected]> * Fix some CI pipeline issues Signed-off-by: Eduardo J. Ortega U <[email protected]> * Address PR comment. Signed-off-by: Eduardo J. Ortega U <[email protected]> * Fix typo Signed-off-by: Eduardo J. Ortega U <[email protected]> --------- Signed-off-by: Eduardo J. Ortega U <[email protected]> Signed-off-by: Eduardo J. Ortega U. <[email protected]>
…3040) * TxThrottler support for transactions outside BEGIN/COMMIT This change allows the TxThrottler to throttle requests sent outside of explicit transactions (i.e. explicit BEGIN/COMMIT blocks) when configured to do so via a new config flag. Otherwise, it preserves the current/default behavior of only throttling transactions inside BEGIN/COMMIT. In addition, when this flag is passed, and because the call to throttle is done in a context in which the execution plan is already known, this change uses the plan type to make sure that throttling is triggered only when the query being executed is INSERT/UPDATE/DELETE/LOAD, so that SELECTs and others no longer get throttled unnecessarily, as they do not contribute to increasing replication lag, which is what the TxThrottler aims at controlling. Signed-off-by: Eduardo J. Ortega U <[email protected]> * Fix e2e flag tests & TxThrottler unit test Signed-off-by: Eduardo J. Ortega U <[email protected]> * Throttle auto-commit statements in QueryExecutor instead of TxPool This changes where we call the transaction throttler: 1. Statements in `BEGIN/COMMIT` blocks keep being throttled in `TabletServer.begin()`. 2. Additionally, throttling is added in QueryExecutor.execAutocommit() and `QueryExecutor.execAsTransaction()`. We also change things so that throttling in this new case is not opt-in via configuration flag but instead is the new and only behavior. Finally, we remove some previously added changes to example scripts that had been added with the intention of testing and are not part of the PR. Signed-off-by: Eduardo J. Ortega U <[email protected]> * Adds test cases for QueryExecutor.Execute() with TxThrottle throttling To make unit testing simple here, we separated the interface and implementation of the TxThrottle, and simply used a mock implementation of the interface in the tests. Signed-off-by: Eduardo J. Ortega U <[email protected]> * Add note on new TxThrottler behavior in v17 changelog Signed-off-by: Eduardo J. 
Ortega U <[email protected]> * Fix PR number in changelog entry for TxThrottler behavior change. Signed-off-by: Eduardo J. Ortega U <[email protected]> * Make linter happy Signed-off-by: Eduardo J. Ortega U <[email protected]> * Address PR comments Signed-off-by: Eduardo J. Ortega U <[email protected]> --------- Signed-off-by: Eduardo J. Ortega U <[email protected]>
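The two ideas in the TxThrottler commits above (throttling only statements that can increase replication lag, and separating the throttler interface from its implementation so tests can use a mock) can be sketched as follows. All names here (`PlanType`, `TxThrottler`, `mockThrottler`, `execAutocommit`, `errThrottled`) are hypothetical stand-ins for illustration and do not mirror the actual Vitess types or signatures.

```go
package main

import (
	"errors"
	"fmt"
)

// PlanType is a hypothetical stand-in for the tablet's plan types.
type PlanType int

const (
	PlanSelect PlanType = iota
	PlanInsert
	PlanUpdate
	PlanDelete
	PlanLoad
)

// TxThrottler is a hypothetical interface. Keeping it separate from the
// implementation lets tests substitute a mock.
type TxThrottler interface {
	Throttle() bool
}

// mockThrottler returns a fixed decision and counts how often it was asked.
type mockThrottler struct {
	throttle bool
	calls    int
}

func (m *mockThrottler) Throttle() bool {
	m.calls++
	return m.throttle
}

var errThrottled = errors.New("transaction throttled")

// isWrite reports whether the plan can add to replication lag.
func isWrite(p PlanType) bool {
	switch p {
	case PlanInsert, PlanUpdate, PlanDelete, PlanLoad:
		return true
	}
	return false
}

// execAutocommit consults the throttler only for write plans, so SELECTs
// are never rejected by it.
func execAutocommit(t TxThrottler, p PlanType) error {
	if isWrite(p) && t.Throttle() {
		return errThrottled
	}
	return nil
}

func main() {
	m := &mockThrottler{throttle: true}
	fmt.Println(execAutocommit(m, PlanSelect))
	fmt.Println(execAutocommit(m, PlanInsert))
}
```

Checking the plan type before calling the throttler means reads never reach the throttler at all, which is also easy to assert in a test through the mock's call counter.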
This branch has been rebased from the release one; I will keep this one and build from it. Eventually, once you are done with the v14 upgrade, I will merge this into release.
Signed-off-by: Eduardo J. Ortega U. <[email protected]>
* txthrottler: further code cleanup * Fix bad merge resolution --------- Signed-off-by: Tim Vaillancourt <[email protected]>
…let type (vitessio#12174) (#95) * Add flag to select tx throttler tablet type (vitessio#12174) Signed-off-by: Tim Vaillancourt <[email protected]> * Remove mistaken git add Signed-off-by: Tim Vaillancourt <[email protected]> --------- Signed-off-by: Tim Vaillancourt <[email protected]>
Signed-off-by: Priya Bibra <[email protected]> Signed-off-by: 'Priya Bibra' <[email protected]>
This is a backport of upstreamed vitessio#13526 Signed-off-by: Eduardo J. Ortega U <[email protected]>
* Backport: Add dry-run mode to the TxThrottler. This is a backport of upstreamed vitessio#13604 Signed-off-by: Eduardo J. Ortega U <[email protected]>
* Cleanup usage of go.rice in favor of go:embed (vitessio#10956) * Cleanup usage of go.rice in favor of go:embed The usage of go.rice predates the availability of go:embed, but we should switch to using go:embed instead to ship specific assets like config files that we need. go.rice is also incompatible with Go 1.19 and while it might see a fix in the future, it seems better to go with the recommended Go approach that is available these days. Signed-off-by: Dirkjan Bussink <[email protected]> * Move vtctld to also use `go embed` instead of go.rice Signed-off-by: Dirkjan Bussink <[email protected]> * Remove last rice-box related comments Signed-off-by: Dirkjan Bussink <[email protected]> * Remove config moving This right now breaks building the actual tests since the tests might also end up loading the regular code which has a `go embed` and refers to the package with the config embeds. This doesn't mean that the config isn't properly included in the binaries. Also with using `go embed` we have a build time dependency on the files and we always know the latest is included, so we don't have the issue of potentially outdated files either. All in all, it seems simplest to remove this logic and trust that Go itself works as advertised. 
Signed-off-by: Dirkjan Bussink <[email protected]> * fix vtrootbin Signed-off-by: 'Stanislav Maksimov' <[email protected]> * update the build to 1.19.10 Signed-off-by: 'Stanislav Maksimov' <[email protected]> * update the workflows to 1.19.10 Signed-off-by: 'Stanislav Maksimov' <[email protected]> * update the docker image to 1.19.10 Signed-off-by: 'Stanislav Maksimov' <[email protected]> * update the bootstrap version to get docker to go 1.20.5 Signed-off-by: 'Stanislav Maksimov' <[email protected]> * use 1.18.7 for static checks Signed-off-by: 'Stanislav Maksimov' <[email protected]> * lower the build version requirement to allow static checks to pass Signed-off-by: 'Stanislav Maksimov' <[email protected]> * experiment with using old and new Go for a single workflow Signed-off-by: 'Stanislav Maksimov' <[email protected]> * use old and new Go for a upgrade/downgrade workflows Signed-off-by: 'Stanislav Maksimov' <[email protected]> * set -buildvcs=false Signed-off-by: 'Stanislav Maksimov' <[email protected]> * address build errors Signed-off-by: 'Stanislav Maksimov' <[email protected]> * Revert "fix parameter name (#93)" (#100) This reverts commit 76159fd. * apply patch 12178 to v14 (#102) Signed-off-by: Priya Bibra <[email protected]> Signed-off-by: 'Priya Bibra' <[email protected]> * more workflows to use self-hosted runner * even more workflows to use self-hosted runner * partially switch upgrade-downgrade to self-hosted runner * Ejortegau/larger runners (#115) * Empty commit to trigger CI Signed-off-by: Eduardo J. 
Ortega U <[email protected]> * `slack-vitess-r14.0.5-dsdefense`: use larger runner Signed-off-by: Tim Vaillancourt <[email protected]> * Use runner group instead Signed-off-by: Tim Vaillancourt <[email protected]> * Rename group Signed-off-by: Tim Vaillancourt <[email protected]> * Move more jobs to runner group Signed-off-by: Tim Vaillancourt <[email protected]> * use vitess-ubuntu20 runner group Signed-off-by: Tim Vaillancourt <[email protected]> * Revert change of runner type for e2e ERS PRS new features heavy test Signed-off-by: Eduardo J. Ortega U <[email protected]> --------- Signed-off-by: Eduardo J. Ortega U <[email protected]> Signed-off-by: Tim Vaillancourt <[email protected]> Co-authored-by: Tim Vaillancourt <[email protected]> * Revert change of runner type for shardedrecovery_stress_verticalsplit_heavy. (#116) We do this because the test is taking a lot longer & failing in some cases with the larger runner. Signed-off-by: Eduardo J. Ortega U <[email protected]> * `slack-vitess-r14.0.5`: use dedicated larger runner (#113) * `slack-vitess-r14.0.5`: use dedicated larger runner Signed-off-by: Tim Vaillancourt <[email protected]> * fix fileNameFromPosition Signed-off-by: Tim Vaillancourt <[email protected]> * empty commit Signed-off-by: Tim Vaillancourt <[email protected]> * empty commit Signed-off-by: Tim Vaillancourt <[email protected]> --------- Signed-off-by: Tim Vaillancourt <[email protected]> * `slack-vitess-r14.0.5`: allow conn overrides in consul topo (#111) * `slack-vitess-r14.0.5`: allow conn overrides in consul topo Signed-off-by: Tim Vaillancourt <[email protected]> * fix e2e test Signed-off-by: Tim Vaillancourt <[email protected]> --------- Signed-off-by: Tim Vaillancourt <[email protected]> --------- Signed-off-by: Dirkjan Bussink <[email protected]> Signed-off-by: 'Stanislav Maksimov' <[email protected]> Signed-off-by: Priya Bibra <[email protected]> Signed-off-by: 'Priya Bibra' <[email protected]> Signed-off-by: Eduardo J. 
Ortega U <[email protected]> Signed-off-by: Tim Vaillancourt <[email protected]> Co-authored-by: Dirkjan Bussink <[email protected]> Co-authored-by: Roderick Yao <[email protected]> Co-authored-by: pbibra <[email protected]> Co-authored-by: Eduardo J. Ortega U <[email protected]> Co-authored-by: Tim Vaillancourt <[email protected]>
…test + compressors (#119) (#124) * Delete all legacy sharding related code (vitessio#10278) * Delete all legacy sharding related code * Move used until functions from initialsharding to cluster * Remove vtctl commands * Kill vtworker and SetKeyspaceServedFrom cmd * WaitForDrain related stragglers * Legacy local straggler workflow * Get rid of SetKeyspaceShardingInfo & wait for drain stragglers * Remove vtworker stragglers * Update throttlerservice protobuf * Rename test 24, add hashicorp vault test to it (now mysql_server_vault) * Remove last mentions of legacy sharding in vtctl * remove binlog_use_v3_resharding_mode * Address review comments * Address review comments * Correct vtgate help output * go fmt * Remove v2 resharding fields (vitessio#10409) * cleanup: remove sharding_column_name and sharding_column_type * cleanup: remove sharding_column_name and sharding_column_type * cleanup: remove sharding_column_name and sharding_column_type * cleanup: remove sharding_column_name and sharding_column_type * generate vtadmin files * cleanup: remove sharding_column_name and sharding_column_type from vtadmin * Merge from main * Fix bad merge conflict resolution * Fix missing 'sharding' import * Fix bad conflict resolution in go/test/endtoend/cellalias/cell_alias_test.go * Backup/Restore: add support for external compressors and decompressors (vitessio#10558) * change to support an external decompressor * add external compressor support + builtin additional compressors * wrap external compressor/decompressor * go mod tidy + comments * add copyright notices * add support for builtin engine * Adding test case for backup compression * Fixing unit test and run mod tidy * Removing unwanted unit tests * Increase timeout of backup tests * fixing linter errors * Change test logic to accommodate running selective tests * removing lint warning * fixing test failure * Removing unnecessary test * Fixing code review feedback * Change builtinEngine to consider 'auto' decompressor * 
fixing Upgrade/Downgrade test * Fix type & add summary under release notes * Fixing typos in summary * Fixing flag name typos * Add MySQL 8 Support to Backup Tests (vitessio#10691) * Add support for MySQL 8.0 in backup tests * Add 8.0 workflow * whitespace * Use vtctldclient SetKeyspaceDurabilityPolicy to manage semi-sync This needed to be done after the shard was set up in order to satisfy the semantic assumptions related to semi-sync in the tests. * Remove extraneous changes * We need lz4 for TestXtrabackupStreamWithlz4Compression * Try using Percona Repo for MySQL 8 to align mysqld and xtrabackup versions * Specify stream type everywhere * Remove repeated server install * Moar... * Move vtctlbackup test to 8.0 * Rename vtbackup test and move to MySQL 8 * Split the xbstream tests so the workflow doesn't time out Otherwise it was going over the 10min limit and getting killed. * Use MySQL 8 compat method for setting passwords * Test increasing timeout at another level * Don't use the init passwords file with 8.0 mysqlctl doesn't start... This is likely due to the change in initialization behavior with MySQL 8.0 as it goes through two phases and you can't simply start up mysqld and pass it data; it has to initialize and restart first. * Fix incorrect password update statement for vt_repl user * Bump timeouts for 8.0 backup tests * Bump it more :( * Increase backup/restore timeout in backup_utils * Apply new 8.0 template everywhere * Fix bugs around how the compression flags were getting passed * Use 45m timeout for the workflow, 30m for the run. * These changes were no longer needed so limiting diff. 
* Explicitly skip new linter check * Fix test file merge issues * Go 1.18.7 to fix error * Fix lint * Fix vttablet.txt * Remove sharding_column_name and sharding_column_type from vtctld2 (vitessio#10459) * Remove sharding_column_name and sharding_column_type * Run make web_build to compile production files * Remove vars from test * Remove sharding column type select box test from web test --------- Signed-off-by: Matt Lord <[email protected]> Signed-off-by: Tim Vaillancourt <[email protected]> Signed-off-by: Arvind Murty <[email protected]> Signed-off-by: notfelineit <[email protected]> Co-authored-by: Matt Lord <[email protected]> Co-authored-by: Arvind Murty <[email protected]> Co-authored-by: Rameez Sajwani <[email protected]> Co-authored-by: Renan Rangel <[email protected]> Co-authored-by: Renan Rangel <[email protected]> Co-authored-by: Rohit Nayak <[email protected]> Co-authored-by: Frances Thai <[email protected]>
* Switch to `pflag` for all parsing This transparently swaps the cli parsing library used by `internal/flag` from the standard library `flag` package to `spf13/pflag`. It also introduces hook points for packages throughout the vitess codebase to register their flags for either all commands using `servenv` or a particular subset of commands. This allows these packages to continue to define their flag variables in a package-private way, but without polluting the global flagset. * Workaround exit code difference between stdlib `flag` and `pflag` tl;dr stdlib `flag` has [this][1] and `pflag` does not [1]: golang/go@dcf0929 * adjust test data for difference in spacing between pflag/stdflag * update lingering legacy flag tests * Update vtgate/tabletgateway.go to use new interface to isolate flags, update test data * update flags in java test code Signed-off-by: Andrew Mason <[email protected]> Co-authored-by: Andrew Mason <[email protected]>
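The "hook points" idea in the commit above, where packages register their flags only for the commands that need them instead of adding them to a global flagset, can be sketched roughly as below. To stay self-contained this sketch uses the standard library `flag.FlagSet` rather than `spf13/pflag`, and every name in it (`registerFor`, `buildFlagSet`, `registerDemoFlags`, `example-retries`) is hypothetical; it is not the Vitess `servenv` API.

```go
package main

import (
	"flag"
	"fmt"
)

// hooks maps a command name to the functions that register flags for it.
// Hypothetical registry, for illustration only.
var hooks = map[string][]func(*flag.FlagSet){}

// registerFor records a hook that will add flags when cmd builds its flagset.
func registerFor(cmd string, hook func(*flag.FlagSet)) {
	hooks[cmd] = append(hooks[cmd], hook)
}

// buildFlagSet creates a fresh flagset for cmd and runs only that command's hooks.
func buildFlagSet(cmd string) *flag.FlagSet {
	fs := flag.NewFlagSet(cmd, flag.ContinueOnError)
	for _, hook := range hooks[cmd] {
		hook(fs)
	}
	return fs
}

// exampleRetries stays package-private; only the hook exposes it as a flag.
var exampleRetries int

// registerDemoFlags is the hook a package would hand to registerFor.
func registerDemoFlags(fs *flag.FlagSet) {
	fs.IntVar(&exampleRetries, "example-retries", 3, "hypothetical retry count")
}

func main() {
	registerFor("demo", registerDemoFlags)

	fs := buildFlagSet("demo")
	if err := fs.Parse([]string{"--example-retries=5"}); err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(exampleRetries)

	// A command with no registered hooks does not see the flag at all.
	fmt.Println(buildFlagSet("other").Lookup("example-retries") == nil)
}
```

The design point is that each flag variable stays private to its package while the set of flags a binary accepts is decided by which hooks were registered for that command.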
This PR is being marked as stale because it has been open for 30 days with no activity. To rectify, you may do any of the following:
If no action is taken within 7 days, this PR will be closed.
This PR was closed because it has been stale for 7 days with no activity.
Description
This PR ports DS Defense related patches that have been backported from upstream to the v14 release branch. All of these can be dropped once we reach v17+ as they have been upstreamed.
Related Issue(s)
https://jira.tinyspeck.com/browse/DRE-8607
Checklist
Deployment Notes
N/A