Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incremental backup: do not error on empty backup #15022

Merged

Conversation

shlomi-noach
Copy link
Contributor

@shlomi-noach shlomi-noach commented Jan 24, 2024

Description

An incremental backup backs up the delta of binary logs from last known good backup up to the point where the incremental backup is requested. An incremental backup flushes the binary logs and sees what binary log content needs to be backed-up.

It is possible for an incremental backup to be empty. This can happen when the database is sitting idly doing absolutely nothing from the last good backup to the point of incremental backup.

Up till now, this resulted with an error: vterrors.Errorf(vtrpc.Code_FAILED_PRECONDITION, "no binary logs to backup (increment is empty)").

But this is an undesired behavior. It is possible for databases to sit idly. If someone were to set up a cronjob running an incremental backup at, say, every 1min or every 5min, then they'd have to deal with a lot of errors that really are not indicative of anything wrong per se in the database or in the backup process.

The decision to error on an empty backup was never formalized and was left as an arbitrary behavior. We now fomalize the behavior for an empty backup:

  • An empty backup is possible for incremental backups only. It is irrelevant to full backups.
  • An empty backup happens when there's no new GTIDs since last good backup, ie no content in the binary logs that needs to be copied.
  • An empty backup exits with return code 0, ie not an error.
  • An empty backup clearly indicates "backup is empty" in its standard output (wording might change)
  • An empty backup leaves no files. There is no MANIFEST or otherwise any files to such backup.
  • An empty backup is expected to leave no trace, if possible. There will be a backup path (or the respective comparable on non-file system targets such as s3) created at the beginning of the process, but it is then removed once the backup is found to be empty. Such cleanup could fail, leaving an empty path with no files.
  • The backup only has a name, used internally, during its runtime. But once found to be empty, that name is discarded.

To that effect, this PR:

  • Formalizes a BackupResult enum (backup is unusable, is empty, or is usable)
  • Distinguishes between the two scenarios: no-binlogs-need-any-backup and cannot-find-binlogs-required-for-backup. The former is now an indication of an empty backup and the process exits without error.
  • Updated the PITR backup framework to test empty backup results, does not "wait-for" new backup manifest after empty backup, etc.

Related Issue(s)

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

Deployment Notes

@shlomi-noach shlomi-noach added Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: Backup and Restore labels Jan 24, 2024
Copy link
Contributor

vitess-bot bot commented Jan 24, 2024

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes), new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test, enhancement and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added NeedsBackportReason If backport labels have been applied to a PR, a justification is required NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsIssue A linked issue is missing for this Pull Request NeedsWebsiteDocsUpdate What it says labels Jan 24, 2024
@shlomi-noach shlomi-noach requested a review from a team January 24, 2024 06:15
@github-actions github-actions bot added this to the v19.0.0 milestone Jan 24, 2024
Copy link

codecov bot commented Jan 24, 2024

Codecov Report

Attention: 61 lines in your changes are missing coverage. Please review.

Comparison is base (b977e51) 47.64% compared to head (6744449) 47.68%.
Report is 9 commits behind head on main.

Files Patch % Lines
go/vt/mysqlctl/builtinbackupengine.go 16.66% 40 Missing ⚠️
go/vt/mysqlctl/xtrabackupengine.go 0.00% 14 Missing ⚠️
go/vt/mysqlctl/backup.go 30.00% 7 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #15022      +/-   ##
==========================================
+ Coverage   47.64%   47.68%   +0.04%     
==========================================
  Files        1151     1155       +4     
  Lines      239843   240169     +326     
==========================================
+ Hits       114268   114522     +254     
- Misses     116975   117045      +70     
- Partials     8600     8602       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@shlomi-noach shlomi-noach removed NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsIssue A linked issue is missing for this Pull Request NeedsBackportReason If backport labels have been applied to a PR, a justification is required labels Jan 24, 2024
Copy link
Contributor

@mattlord mattlord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I only had minor comments and suggestions so will let you address those as you prefer.

logger.Errorf2(err, "backup is not usable, aborting it")
finishErr = bh.AbortBackup(ctx)
case BackupEmpty:
logger.Infof("backup is empty")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like no new data to backup, skipping it is more informative, IMO.

// identical to the Previous-GTIDs of the last binary log. But, we also know that we ourselves
// have flushed the binary logs so as to generate the new (now last) binary log.
// Which means, from the Pos of the backup till the time we issued FLUSH BINARY LOGS, there
// were no new GTID entries. The database was completely silent during that period.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly more explanative to note in this and similar places that the database performed no writes vs "did nothing" / "was silent" etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reworded.

// We have nothing to backup. The backup is empty.
return nil, "", "", nil
}
return nil, "", "", vterrors.Errorf(vtrpc.Code_FAILED_PRECONDITION, "no binary logs to backup. backupFromGTIDSet=%v, prevGTIDsUnion=%v", backupFromGTIDSet.String(), prevGTIDsUnion.String())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more accurate to say that there are no new GTIDs to backup as flush logs could have been done N times (along with mysqld restarts etc). Right? Or we could say:

return nil, "", "", vterrors.Errorf(vtrpc.Code_FAILED_PRECONDITION, "no binary logs containing new GTIDs to backup. backupFromGTIDSet=%v, prevGTIDsUnion=%v", backupFromGTIDSet.String(), prevGTIDsUnion.String())

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reworded as "cannot find binary logs that cover requested GTID range"

@@ -204,7 +204,7 @@ func (fe *FileEntry) open(cnf *Mycnf, readOnly bool) (*os.File, error) {

// ExecuteBackup runs a backup based on given params. This could be a full or incremental backup.
// The function returns a boolean that indicates if the backup is usable, and an overall error.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should update the comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reworded as "The function returns a BackupResult that indicates the usability of the backup"

@@ -233,24 +233,25 @@ func getIncrementalFromPosGTIDSet(incrementalFromPos string) (replication.Mysql5
// executeIncrementalBackup runs an incremental backup, based on given 'incremental_from_pos', which can be:
// - A valid position
// - "auto", indicating the incremental backup should begin with last successful backup end position.
func (be *BuiltinBackupEngine) executeIncrementalBackup(ctx context.Context, params BackupParams, bh backupstorage.BackupHandle) (bool, error) {
// The function returns a boolean that indicates if the backup is usable, and an overall error.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is also off now, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reworded

}
return true, nil
return BackupUsable, nil
}

// executeFullBackup returns a boolean that indicates if the backup is usable,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment off note again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reworded

@@ -168,7 +168,7 @@ func closeFile(wc io.WriteCloser, fileName string, logger logutil.Logger, finalE

// ExecuteBackup runs a backup based on given params. This could be a full or incremental backup.
// The function returns a boolean that indicates if the backup is usable, and an overall error.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another comment :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reworded

@@ -177,17 +177,17 @@ func (be *XtrabackupEngine) ExecuteBackup(ctx context.Context, params BackupPara

// executeFullBackup returns a boolean that indicates if the backup is usable,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You guess it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol

Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
@shlomi-noach
Copy link
Contributor Author

Docs PR: vitessio/website#1668

@shlomi-noach shlomi-noach removed the NeedsWebsiteDocsUpdate What it says label Jan 25, 2024
Signed-off-by: Shlomi Noach <[email protected]>
@shlomi-noach shlomi-noach requested review from a team and removed request for a team January 29, 2024 05:35
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Copy link
Collaborator

@vmg vmg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@shlomi-noach shlomi-noach merged commit 9e27038 into vitessio:main Jan 30, 2024
101 of 102 checks passed
@shlomi-noach shlomi-noach deleted the empty-incremental-backup-no-error branch January 30, 2024 09:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Backup and Restore Type: Enhancement Logical improvement (somewhere between a bug and feature)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants