Add the ability to automate and schedule backups #553
Conversation
VitessBackupSchedule: add the ability to automate backups

Signed-off-by: Florent Poinsard <[email protected]>
In commit bc74ab4, I have applied one of the most important suggestions discussed above. The two strategies are now configured as follows:

```yaml
# BackupKeyspace
strategies:
  - name: BackupKeyspace
    cluster: "example"
    keyspace: "customer"

# BackupCluster
strategies:
  - name: BackupCluster
    cluster: "example"
```

Meanwhile, the jobs generated for each strategy run the following commands:

```
# BackupKeyspace
Args:
  /bin/sh
  -c
  /vt/bin/vtctldclient --server=example-vtctld-625ee430:15999 BackupShard customer/-80 && /vt/bin/vtctldclient --server=example-vtctld-625ee430:15999 BackupShard customer/80-

# BackupCluster
Args:
  /bin/sh
  -c
  /vt/bin/vtctldclient --server=example-vtctld-625ee430:15999 BackupShard commerce/- && /vt/bin/vtctldclient --server=example-vtctld-625ee430:15999 BackupShard customer/-80 && /vt/bin/vtctldclient --server=example-vtctld-625ee430:15999 BackupShard customer/80-
```
```go
// Cluster defines on which cluster you want to take the backup.
// This field is mandatory regardless of the chosen strategy.
Cluster string `json:"cluster"`
```
I'm not sure I follow why this is necessary. My mental model is that a user defines `[]VitessBackupScheduleTemplate` on the `ClusterBackupSpec`, and so implicitly each `VitessBackupScheduleStrategy` will be associated with the cluster where the `ClusterBackupSpec` is defined.
That's a good point @maxenglander, it is pretty useless. I ended up removing that field from `VitessBackupScheduleStrategy` and adding it to `VitessBackupScheduleSpec`. The `VitessCluster` controller will fill in that new field when it creates a new `VitessBackupSchedule` object; that way `VitessBackupSchedule` is still able to select existing components given their cluster names, which avoids fetching the wrong data when multiple `VitessCluster`s are running in the same Kubernetes cluster.
See b30aa09 for the change.
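For readers following along, here is a minimal sketch of the shape of that change. The field sets are trimmed and the definitions are assumptions for illustration, not code copied from commit b30aa09:

```go
package planetscalev2

// Before: every strategy had to repeat which cluster it targeted.
type VitessBackupScheduleStrategy struct {
	// Name of the strategy, e.g. BackupShard.
	Name string `json:"name"`
	// Cluster string `json:"cluster"`  // removed in b30aa09
}

// After: the cluster is set once on the spec. The VitessCluster
// controller fills it in when it creates the VitessBackupSchedule,
// so strategies no longer carry it individually.
type VitessBackupScheduleSpec struct {
	Cluster    string                         `json:"cluster"`
	Strategies []VitessBackupScheduleStrategy `json:"strategies"`
}
```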
Signed-off-by: Florent Poinsard <[email protected]>
In e6946fb I have added affinity and annotations.
```go
	return err
}
if jobStartTime.Add(time.Minute * time.Duration(timeout)).Before(time.Now()) {
	if err := r.client.Delete(ctx, job, client.PropagationPolicy(metav1.DeletePropagationBackground)); err != nil {
```
Seems like a good thing to have a metric for.
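For illustration, a counter like the one suggested could be registered through controller-runtime's global metrics registry. A minimal sketch, where the metric name and label are assumptions rather than anything the operator actually exposes:

```go
package vitessbackupschedule

import (
	"github.com/prometheus/client_golang/prometheus"
	"sigs.k8s.io/controller-runtime/pkg/metrics"
)

// backupJobTimeoutCount counts scheduled backup jobs that were deleted
// because they ran past their timeout. Name and label are illustrative.
var backupJobTimeoutCount = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "vitess_backup_schedule_job_timeout_total",
		Help: "Number of scheduled backup jobs deleted after exceeding their timeout.",
	},
	[]string{"schedule"},
)

func init() {
	// controller-runtime scrapes this registry at the manager's /metrics endpoint.
	metrics.Registry.MustRegister(backupJobTimeoutCount)
}
```

The reconcile loop could then call `backupJobTimeoutCount.WithLabelValues(vbsc.Name).Inc()` right after the successful `Delete` call shown above.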
```go
	return job, nil
}

func (r *ReconcileVitessBackupsSchedule) createJobPod(ctx context.Context, vbsc *planetscalev2.VitessBackupSchedule, name string) (pod corev1.PodSpec, err error) {
```
Might be worth adding a note about that in the release notes. I expect it will be a common issue people run into.
```go
if shardIndex > 0 || ksIndex > 0 {
	cmd.WriteString(" && ")
}
createVtctldClientCommand(&cmd, vtctldclientServerArg, strategy.ExtraFlags, ks.name, shard)
```
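For context, here is a hypothetical sketch of what a helper with this signature might do; the body is an assumption for illustration (including the `map[string]string` type of the extra flags), not the PR's actual implementation:

```go
package vitessbackupschedule

import (
	"fmt"
	"strings"
)

// createVtctldClientCommand appends a single
// "vtctldclient --server=... BackupShard ks/shard" invocation to the
// shell command being assembled; callers join invocations with " && ".
func createVtctldClientCommand(cmd *strings.Builder, serverArg string, extraFlags map[string]string, keyspace, shard string) {
	cmd.WriteString("/vt/bin/vtctldclient " + serverArg + " BackupShard")
	// Forward any user-defined extra flags to the BackupShard command.
	for key, value := range extraFlags {
		cmd.WriteString(fmt.Sprintf(" --%s=%s", key, value))
	}
	cmd.WriteString(fmt.Sprintf(" %s/%s", keyspace, shard))
}
```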
Am I reading this right that it will be taking a backup of each keyspace and shard in sequence? That doesn't seem ideal to me: if each shard takes an hour to back up, and there are 32 shards, then the backups of the first shard and the last shard will be more than a day apart.
I think it would be better if there were at least the option for `BackupCluster` and `BackupKeyspace` to back up all keyspaces and shards in parallel.
It might be better to limit this PR to only support `BackupShard` for now, and add support for the other options after more consideration into how to implement `BackupKeyspace` and `BackupCluster`.
Let's do that: remove those two strategies as part of this PR, and I will work on a subsequent PR to add them back with a better approach. This PR is getting lengthy already.
Fixed via 70ba063
IMO `BackupAllShardsInKeyspace` and `BackupAllShardsInCluster` are better names. It may seem nit-picky, but I think it's important as it reflects what it actually is: independent backups of the shards, i.e. NOT a single consistent backup of the keyspace or cluster at any physical or logical point in time.
I ended up removing the Keyspace and Cluster strategies in this PR, as they will require a bigger refactoring. I am keeping that in mind for when we add them back, though.
Signed-off-by: Florent Poinsard <[email protected]>
One last thought; LGTM overall.
Signed-off-by: Florent Poinsard <[email protected]>
Nice work on this, @frouioui! ❤️ I only had a few nits/comments that you can address as you feel is best.
take into account when using this feature:

- If you are using the `xtrabackup` engine, your vttablet pods will need more memory; think about provisioning more memory for them (see the sketch below).
- If you are using the `builtin` engine, you will lose a replica during the backup; think about adding a new tablet.
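For instance, vttablet memory is provisioned through the tablet pool's resource requests in the `VitessCluster` spec; a minimal sketch with placeholder values:

```yaml
tabletPools:
  - cell: zone1
    type: replica
    replicas: 3
    vttablet:
      resources:
        requests:
          memory: 1Gi   # placeholder: raise this when using the xtrabackup engine
        limits:
          memory: 2Gi
```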
I think there's a minimum healthy tablet setting? If so, worth mentioning that here IMO.
There is not.
```go
ks := keyspace{
	name: item.Spec.Name,
}
for shardName := range item.Status.Shards {
	ks.shards = append(ks.shards, shardName)
}
if len(ks.shards) > 0 {
	result = append(result, ks)
}
```
Curious why we don't do this instead:

```go
for shardName := range item.Status.Shards {
	result = append(result, keyspace{
		name:   item.Spec.Name,
		shards: []string{shardName},
	})
}
```

The other allocations/copying seem unnecessary at first glance. When combined with a single-shot precise allocation it should be more efficient.
I am not sure I understand what you are suggesting. We still want to create one `keyspace` object per `item` in `ksList.Items`, and for all the shards in this `item` we want to append to `keyspace.shards`.
Signed-off-by: Florent Poinsard <[email protected]>
Description

This Pull Request adds a new CRD called `VitessBackupSchedule`. Its main goal is to automate and schedule backups of Vitess, taking backups of the Vitess cluster at regular intervals based on a given cron `schedule` and `Strategy`. This new CRD is managed by the `VitessCluster`: like most other components of the vitess-operator, the `VitessCluster` controller is responsible for the whole lifecycle (creation, update, deletion) of the `VitessBackupSchedule` object in the cluster. Inside the `VitessCluster` it is possible to define several `VitessBackupSchedule`s as a list, allowing for multiple concurrent backup schedules.

Among other things, the `VitessBackupSchedule` object is responsible for creating Kubernetes Jobs at the desired time, based on the user-defined `schedule`. It also keeps track of older jobs and deletes them if they are too old, according to the user-defined parameters (`successfulJobsHistoryLimit` & `failedJobsHistoryLimit`). The jobs created by the `VitessBackupSchedule` object use the `vtctld` Docker image and execute a shell command that is generated based on the user-defined `strategies`. The end user can define as many backup strategies per schedule as they want; each of them mirrors what `vtctldclient` is able to do. The `Backup` and `BackupShard` commands are available, and a map of extra flags enables the user to pass as many flags as they want to `vtctldclient`.

A new end-to-end test is added to our BuildKite pipeline as part of this Pull Request to test the proper behavior of this new CRD.
Related PRs

- `operator.yaml` and add schedule backup example: vitessio/vitess#15969

Demonstration
For this demonstration I have set up a Vitess cluster by following the steps in the getting started guide, up until the very last step where we must apply the `306_down_shard_0.yaml` file. My cluster is then composed of 2 keyspaces: `customer` with 2 shards, and `commerce`, unsharded. I then modify the `306...` yaml file to contain the new backup schedule, as seen in the snippet right below. We want to create two schedules, one for each keyspace. The keyspace `customer` will have two backup strategies: one for each shard.
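The snippet itself did not survive the copy; the following is a hypothetical reconstruction based only on fields mentioned in this conversation (`schedule`, `strategies`, `successfulJobsHistoryLimit`, `failedJobsHistoryLimit`). The exact field names and nesting are assumptions and may differ from the final API:

```yaml
backup:
  schedules:
    - name: "commerce"
      schedule: "*/5 * * * *"          # assumed cron expression
      successfulJobsHistoryLimit: 2
      failedJobsHistoryLimit: 3
      strategies:
        - name: BackupShard
          keyspace: "commerce"
          shard: "-"
    - name: "customer"
      schedule: "*/5 * * * *"
      successfulJobsHistoryLimit: 2
      failedJobsHistoryLimit: 3
      strategies:
        - name: BackupShard
          keyspace: "customer"
          shard: "-80"
        - name: BackupShard
          keyspace: "customer"
          shard: "80-"
```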
Once the cluster is stable and all tablets are serving and ready, I re-apply my yaml file with the backup configuration:
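Assuming the same file name as in the getting started guide:

```sh
kubectl apply -f 306_down_shard_0.yaml
```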
Immediately, I can check that the new `VitessBackupSchedule` objects have been created.
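For example, assuming the CRD's plural resource name follows the usual Kubernetes convention:

```sh
kubectl get vitessbackupschedules
```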
VitessBackupSchedule
are running. After about 2 minutes, we can see four pods, two for each schedule. The pods are marked asCompleted
as they finished their job.Now let's check our backup: