Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

select backup engine in Backup() and ignore engines in RestoreFromBackup() #16428

Merged
merged 20 commits into from
Sep 26, 2024

Conversation

rvrangel
Copy link
Contributor

@rvrangel rvrangel commented Jul 18, 2024

Description

This PR will allow calls to the vttablet and vtctld Backup() API to specify which backup engine to be used. You might want to specify flags for both and have a default one defined in --backup_engine_implementation, but then you do not need to restart the vttablet to be able to switch from one engine to the other.

Calls to Backup() without the backup engine will just use the default backup engine the vttablet was started with.

It also implements a filter on RestoreFromBackup() as to allow certain backup engine to be ignored (e.g. you take backups with both the xtrabackupengine and the mysqlshell engine proposed in #16294) so as to have both physical and logical backups, but you only want to be able to restore from the physical backups unless you specify a particular backup.

Related Issue(s)

Fixes #16429

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

Deployment Notes

Copy link
Contributor

vitess-bot bot commented Jul 18, 2024

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes), new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test, enhancement and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added NeedsBackportReason If backport labels have been applied to a PR, a justification is required NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsIssue A linked issue is missing for this Pull Request NeedsWebsiteDocsUpdate What it says labels Jul 18, 2024
@github-actions github-actions bot added this to the v21.0.0 milestone Jul 18, 2024
@rvrangel rvrangel changed the title Select backup engine select backup engine in Backup() and ignore engines in RestoreFromBackup() Jul 18, 2024
@rvrangel rvrangel changed the title select backup engine in Backup() and ignore engines in RestoreFromBackup() select backup engine in Backup() and ignore engines in RestoreFromBackup() Jul 18, 2024
Copy link

codecov bot commented Aug 15, 2024

Codecov Report

Attention: Patch coverage is 35.71429% with 27 lines in your changes missing coverage. Please review.

Project coverage is 69.45%. Comparing base (3743f09) to head (a6dff4b).
Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
go/cmd/vtctldclient/command/backups.go 18.18% 9 Missing ⚠️
go/vt/vttablet/tabletmanager/restore.go 0.00% 6 Missing ⚠️
go/vt/vttablet/tabletmanager/rpc_backup.go 0.00% 5 Missing ⚠️
go/vt/mysqlctl/backupengine.go 50.00% 3 Missing ⚠️
go/vt/mysqlctl/builtinbackupengine.go 0.00% 1 Missing ⚠️
go/vt/mysqlctl/mysqlshellbackupengine.go 0.00% 1 Missing ⚠️
go/vt/mysqlctl/xtrabackupengine.go 0.00% 1 Missing ⚠️
go/vt/vttablet/tabletmanager/tm_init.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #16428      +/-   ##
==========================================
+ Coverage   69.44%   69.45%   +0.01%     
==========================================
  Files        1571     1571              
  Lines      203004   203021      +17     
==========================================
+ Hits       140974   141013      +39     
+ Misses      62030    62008      -22     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

go/cmd/vtctldclient/command/backups.go Outdated Show resolved Hide resolved
go/vt/vtctl/backup.go Outdated Show resolved Hide resolved
Copy link
Contributor

@shlomi-noach shlomi-noach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this, I've left a few comments throughout the code.
Some major points:

  • cluster_endtoend_vtctlbackup_sharded_clustertest_heavy.yml may not be modified into using xtrabackup because this means we're losing trust in Vitess' functionality on vanilla MySQL.
  • Also, this is (as the name suggests) on of the heaviest endtoend tests we have; consider creating a new and shorter test.
  • Like @deepthi , I prefer allow lists over deny lists. If there's a strong argument for deny lists, please let us know. But IMO if next week someone introduces a new backup method which does not match your requirements, you have to rush to edit your deployment/scripts/tools to ensure it's not being used. Whereas if next week someone introduces a new backup method which does comply with your requirements, then you can at leisure add it.
  • Let's use meaningful data in tests, and more explicit tests over such data.

@@ -299,6 +312,7 @@ func init() {
Root.AddCommand(RemoveBackup)

RestoreFromBackup.Flags().StringVarP(&restoreFromBackupOptions.BackupTimestamp, "backup-timestamp", "t", "", "Use the backup taken at, or closest before, this timestamp. Omit to use the latest backup. Timestamp format is \"YYYY-mm-DD.HHMMSS\".")
RestoreFromBackup.Flags().StringVar(&restoreFromBackupOptions.IgnoredBackupEngines, "ignored-backup-engines", "", "Ignore backups created with this list of backup engines, sepparated by a comma")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a StringSliceVar flag type that handles comma separated lists.

go/cmd/vtctldclient/command/backups.go Outdated Show resolved Hide resolved
lastBackup := getLastBackup(t)

// open the Manifest and retrieve the backup engine that was used
f, err := os.Open(path.Join(localCluster.CurrentVTDATAROOT,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an existing readManifestFile() function in backup_utils.go. Please reuse.

return manifest.BackupMethod
}

func getLastBackup(t *testing.T) string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider reusing (or refactoring) func (cluster LocalProcessCluster) ListBackups(shardKsName string) ([]string, error)

// lets take two backups, each using a different backup engine
err = localCluster.VtctldClientProcess.ExecuteCommand("Backup", "--allow-primary", "--backup-engine=builtin", primary.Alias)
require.NoError(t, err)
// firstBackup := getLastBackup(t)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the above line?

// check the new replica has the data
cluster.VerifyRowsInTablet(t, replica2, keyspaceName, 2)

// now lets break the last backup in the shard
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are different tests here. Please separate them via properly named t.Run(..., ) subtests.

// make sure we are replicating after the restore is done
err = replica2.VttabletProcess.WaitForTabletStatusesForTimeout([]string{"SERVING"}, timeout)
require.NoError(t, err)
cluster.VerifyRowsInTablet(t, replica2, keyspaceName, 2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cluster.VerifyRowsInTablet is a valid way to test the restore content, and strong enough as the test goes, but not clear enough for future us. It's better to explicitly require "there's a row with this specific string value that means something".

@@ -213,8 +219,14 @@ func registerBackupEngineFlags(fs *pflag.FlagSet) {
// a particular backup by calling GetRestoreEngine().
//
// This must only be called after flags have been parsed.
func GetBackupEngine() (BackupEngine, error) {
name := backupEngineImplementation
func GetBackupEngine(backupEngine *string) (BackupEngine, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's never pass a *string. This function should accept a string. If it's empty, use the default value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the reason it is a *string is because that's how grpc optional arguments are created (as it should avoid backward compatibility issues). just to confirm, you prefer me to get rid of it completely and make it mandatory or just avoid passing a *string to GetBackupEngine()? I will keep the grpc optional parameter for now, but let me know if you prefer the other way

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or just avoid passing a *string to GetBackupEngine()

This.

Comment on lines 223 to 229
var name string
if backupEngine == nil || *backupEngine == "" {
name = backupEngineImplementation
} else {
name = *backupEngine
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this more canonical (and also more idiomatic):

Suggested change
var name string
if backupEngine == nil || *backupEngine == "" {
name = backupEngineImplementation
} else {
name = *backupEngine
}
name := backupEngineImplementation
if backupEngine != "" {
name = backupEngine
}


statsRestoreBackupTime *stats.String
statsRestoreBackupPosition *stats.String
)

func registerRestoreFlags(fs *pflag.FlagSet) {
fs.BoolVar(&restoreFromBackup, "restore_from_backup", restoreFromBackup, "(init restore parameter) will check BackupStorage for a recent backup at startup and start there")
fs.StringVar(&restoreFromBackupIgnoreEngines, "restore_from_backup_ignore_engines", restoreFromBackupIgnoreEngines, "ignore backups taken with the backup engines provided in this comma-separated list")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use StringSliceVar

@deepthi deepthi removed NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsIssue A linked issue is missing for this Pull Request NeedsBackportReason If backport labels have been applied to a PR, a justification is required labels Aug 28, 2024
@rvrangel
Copy link
Contributor Author

@shlomi-noach @deepthi thanks again for the review! I believe I have addressed all your comments, but let me know if I missed anything.

Copy link
Member

@deepthi deepthi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now except for the flag name and help text. I can approve once those are fixed.

@@ -300,6 +300,7 @@ Flags:
--restore-to-timestamp string (init incremental restore parameter) if set, run a point in time recovery that restores up to the given timestamp, if possible. Given timestamp in RFC3339 format. Example: '2006-01-02T15:04:05Z07:00'
--restore_concurrency int (init restore parameter) how many concurrent files to restore at once (default 4)
--restore_from_backup (init restore parameter) will check BackupStorage for a recent backup at startup and start there
--restore_from_backup_allowed_engines strings if present will filter out any backups taken with engines not included in the list
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New flags should use dashes, not underscores. Sorry, we haven't got around to deprecating and replacing the old-style flags, so we have this pitfall where you follow what looks like the standard pattern, but it isn't.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For future reference, the vitess-bot checklist is helpful here. It tells you what is expected under New flags.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aha, thanks @deepthi! I will update it

@@ -300,6 +300,7 @@ Flags:
--restore-to-timestamp string (init incremental restore parameter) if set, run a point in time recovery that restores up to the given timestamp, if possible. Given timestamp in RFC3339 format. Example: '2006-01-02T15:04:05Z07:00'
--restore_concurrency int (init restore parameter) how many concurrent files to restore at once (default 4)
--restore_from_backup (init restore parameter) will check BackupStorage for a recent backup at startup and start there
--restore_from_backup_allowed_engines strings if present will filter out any backups taken with engines not included in the list
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
--restore_from_backup_allowed_engines strings if present will filter out any backups taken with engines not included in the list
--restore-from-backup-allowed-engines strings (init restore parameter) if set, only backups taken with the specified engines are eligible to be restored

@@ -299,6 +309,7 @@ func init() {
Root.AddCommand(RemoveBackup)

RestoreFromBackup.Flags().StringVarP(&restoreFromBackupOptions.BackupTimestamp, "backup-timestamp", "t", "", "Use the backup taken at, or closest before, this timestamp. Omit to use the latest backup. Timestamp format is \"YYYY-mm-DD.HHMMSS\".")
RestoreFromBackup.Flags().StringSliceVar(&restoreFromBackupOptions.AllowedBackupEngines, "allowed-backup-engines", restoreFromBackupOptions.AllowedBackupEngines, "if present will filter out any backups taken with engines not included in the list")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
RestoreFromBackup.Flags().StringSliceVar(&restoreFromBackupOptions.AllowedBackupEngines, "allowed-backup-engines", restoreFromBackupOptions.AllowedBackupEngines, "if present will filter out any backups taken with engines not included in the list")
RestoreFromBackup.Flags().StringSliceVar(&restoreFromBackupOptions.AllowedBackupEngines, "allowed-backup-engines", restoreFromBackupOptions.AllowedBackupEngines, "if set, only backups taken with the specified engines are eligible to be restored")

@deepthi deepthi added Type: Feature and removed NeedsWebsiteDocsUpdate What it says labels Sep 25, 2024
@deepthi
Copy link
Member

deepthi commented Sep 25, 2024

We'll update all the command reference docs before release.

err = localCluster.VtctldClientProcess.ExecuteCommand("Backup", "--allow-primary", "--backup-engine=builtin", primary.Alias)
require.NoError(t, err)
engineUsed := getBackupEngineOfLastBackup(t)
require.Equal(t, "builtin", engineUsed)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

err = localCluster.VtctldClientProcess.ExecuteCommand("Backup", "--allow-primary", "--backup-engine=xtrabackup", primary.Alias)
require.NoError(t, err)
engineUsed := getBackupEngineOfLastBackup(t)
require.Equal(t, "xtrabackup", engineUsed)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

if len(params.AllowedBackupEngines) > 0 && !slices.Contains(params.AllowedBackupEngines, bm.BackupMethod) {
params.Logger.Warningf("Ignoring backup %v because it is using %q backup engine", bh.Name(), bm.BackupMethod)
continue
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Signed-off-by: Shlomi Noach <[email protected]>
@shlomi-noach shlomi-noach merged commit 2e2b223 into vitessio:main Sep 26, 2024
101 checks passed
@rvrangel rvrangel mentioned this pull request Oct 25, 2024
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature Request: Add more flexibility backing up/restoring from different backup engines.
3 participants