
Commit

Apply review suggestions
Signed-off-by: Florent Poinsard <[email protected]>
frouioui committed May 24, 2024
1 parent bd70ae6 commit e6e0e45
Showing 13 changed files with 113 additions and 102 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -80,4 +80,4 @@ tags
.idea/*
# End of https://www.gitignore.io/api/go,vim,emacs,visualstudiocode

vtdataroot
**/vtdataroot
23 changes: 13 additions & 10 deletions docs/api/index.html
@@ -2693,7 +2693,7 @@ <h3 id="planetscale.com/v2.VitessBackupScheduleTemplate">VitessBackupScheduleTem
</td>
<td>
<em>(Optional)</em>
<p>Suspend enables the suspension of the VitessBackupSchedule, pausing any further scheduled
<p>Suspend pauses the associated backup schedule, skipping any further scheduled
runs until Suspend is set to false again. This is useful if you want to pause backups without
having to remove the entire VitessBackupSchedule object from the cluster.</p>
</td>
@@ -2707,8 +2707,11 @@ <h3 id="planetscale.com/v2.VitessBackupScheduleTemplate">VitessBackupScheduleTem
</td>
<td>
<em>(Optional)</em>
<p>StartingDeadlineSeconds allows for the VitessBackupSchedule controller to start jobs that late
by the given amount of seconds.</p>
<p>StartingDeadlineSeconds allows the VitessBackupSchedule controller to start a job even if it is
late by up to the given number of seconds. For example, if the controller processes a scheduled run
one second after its scheduled time and StartingDeadlineSeconds is set to 0, the job is skipped
because it is too late; if StartingDeadlineSeconds is greater than one second, the job is processed
as usual.</p>
</td>
</tr>
<tr>
@@ -2738,12 +2741,12 @@ <h3 id="planetscale.com/v2.VitessBackupScheduleTemplate">VitessBackupScheduleTem
</td>
<td>
<em>(Optional)</em>
<p>AllowedMissedRun defines how many missed run of the schedule will be allowed before giving up on finding the last job.
If the operator&rsquo;s clocked is skewed and we end-up missing a certain number of jobs, finding the last
job might be very consuming, depending on the frequency of the schedule and the duration during which
the operator&rsquo;s clock was misbehaving, also depending on how laggy the clock is, we can end-up with thousands
or missed runs. For this reason, AllowedMissedRun, which is set to 100 by default, will allow us to give up finding
the last job and simply wait for the next job on the schedule.
<p>AllowedMissedRuns defines how many missed runs of the schedule are allowed before giving up on finding the last job.
If the operator&rsquo;s clock is skewed and we end up missing a certain number of runs, finding the last
job might be very time-consuming, depending on the frequency of the schedule and the duration during which
the operator&rsquo;s clock was misbehaving. Depending on how far off the clock is, we can end up with thousands
of missed runs. For this reason, AllowedMissedRuns, which is set to 100 by default, short-circuits the search
and simply waits for the next job on the schedule.
Unless you are experiencing issues with missed runs due to a misconfiguration of the clock, we recommend leaving
this field at its default value.</p>
</td>
@@ -2757,7 +2760,7 @@ <h3 id="planetscale.com/v2.VitessBackupScheduleTemplate">VitessBackupScheduleTem
</td>
<td>
<em>(Optional)</em>
<p>JobTimeoutMinute defines after how many minutes a job that has not yet finished should be stopped and removed.
<p>JobTimeoutMinutes defines after how many minutes a job that has not yet finished should be stopped and removed.
Default value is 10 minutes.</p>
</td>
</tr>
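
The StartingDeadlineSeconds behavior described above comes down to comparing how late a run is against the configured deadline. Below is a minimal, self-contained Go sketch of that rule; the helper name and the values are hypothetical and only illustrate the documented semantics, not the operator's actual code (the real decision happens in getNextSchedule later in this commit).

package main

import (
	"fmt"
	"time"
)

// shouldStartLateRun illustrates the documented rule: a run that is late by more
// than startingDeadlineSeconds is skipped; otherwise it is processed as usual.
func shouldStartLateRun(scheduledAt, now time.Time, startingDeadlineSeconds int64) bool {
	lateBy := now.Sub(scheduledAt)
	return lateBy <= time.Duration(startingDeadlineSeconds)*time.Second
}

func main() {
	now := time.Now()
	scheduledAt := now.Add(-2 * time.Second) // the run was due two seconds ago

	fmt.Println(shouldStartLateRun(scheduledAt, now, 0))  // false: skipped, too late
	fmt.Println(shouldStartLateRun(scheduledAt, now, 30)) // true: within the deadline
}
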
4 changes: 2 additions & 2 deletions pkg/apis/planetscale/v2/vitessbackupschedule_methods.go
@@ -46,8 +46,8 @@ const DefaultAllowedMissedRuns = 100
// GetMissedRunsLimit returns the maximum number of missed runs we can allow.
// Returns DefaultAllowedMissedRuns if the value was not specified by the user.
func (vbsc *VitessBackupSchedule) GetMissedRunsLimit() int {
if vbsc.Spec.AllowedMissedRun == nil {
if vbsc.Spec.AllowedMissedRuns == nil {
return DefaultAllowedMissedRuns
}
return *vbsc.Spec.AllowedMissedRun
return *vbsc.Spec.AllowedMissedRuns
}
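
The getter above is the usual nil-means-default convention for optional CRD fields. A small standalone sketch of the same pattern follows; it uses a local copy of the constant so it runs outside the operator, and the function name is made up for illustration.

package main

import "fmt"

// defaultAllowedMissedRuns mirrors DefaultAllowedMissedRuns (100) from the diff above.
const defaultAllowedMissedRuns = 100

// missedRunsLimit mirrors GetMissedRunsLimit: a nil pointer means the field was not
// set by the user, so the default applies; otherwise the user's value wins.
func missedRunsLimit(allowedMissedRuns *int) int {
	if allowedMissedRuns == nil {
		return defaultAllowedMissedRuns
	}
	return *allowedMissedRuns
}

func main() {
	fmt.Println(missedRunsLimit(nil)) // 100

	five := 5
	fmt.Println(missedRunsLimit(&five)) // 5
}
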
27 changes: 15 additions & 12 deletions pkg/apis/planetscale/v2/vitessbackupschedule_types.go
@@ -122,14 +122,17 @@ type VitessBackupScheduleTemplate struct {
// +kubebuilder:validation:Minimum=0
FailedJobsHistoryLimit *int32 `json:"failedJobsHistoryLimit,omitempty"`

// Suspend enables the suspension of the VitessBackupSchedule, pausing any further scheduled
// Suspend pauses the associated backup schedule, skipping any further scheduled
// runs until Suspend is set to false again. This is useful if you want to pause backups without
// having to remove the entire VitessBackupSchedule object from the cluster.
// +optional
Suspend *bool `json:"suspend,omitempty"`

// StartingDeadlineSeconds allows for the VitessBackupSchedule controller to start jobs that late
// by the given amount of seconds.
// StartingDeadlineSeconds allows the VitessBackupSchedule controller to start a job even if it is
// late by up to the given number of seconds. For example, if the controller processes a scheduled run
// one second after its scheduled time and StartingDeadlineSeconds is set to 0, the job is skipped
// because it is too late; if StartingDeadlineSeconds is greater than one second, the job is processed
// as usual.
// +optional
// +kubebuilder:validation:Minimum=0
StartingDeadlineSeconds *int64 `json:"startingDeadlineSeconds,omitempty"`
@@ -143,24 +146,24 @@ type VitessBackupScheduleTemplate struct {
// +kubebuilder:example="Forbid"
ConcurrencyPolicy ConcurrencyPolicy `json:"concurrencyPolicy,omitempty"`

// AllowedMissedRun defines how many missed run of the schedule will be allowed before giving up on finding the last job.
// If the operator's clocked is skewed and we end-up missing a certain number of jobs, finding the last
// job might be very consuming, depending on the frequency of the schedule and the duration during which
// the operator's clock was misbehaving, also depending on how laggy the clock is, we can end-up with thousands
// or missed runs. For this reason, AllowedMissedRun, which is set to 100 by default, will allow us to give up finding
// the last job and simply wait for the next job on the schedule.
// AllowedMissedRuns defines how many missed runs of the schedule are allowed before giving up on finding the last job.
// If the operator's clock is skewed and we end up missing a certain number of runs, finding the last
// job might be very time-consuming, depending on the frequency of the schedule and the duration during which
// the operator's clock was misbehaving. Depending on how far off the clock is, we can end up with thousands
// of missed runs. For this reason, AllowedMissedRuns, which is set to 100 by default, short-circuits the search
// and simply waits for the next job on the schedule.
// Unless you are experiencing issues with missed runs due to a misconfiguration of the clock, we recommend leaving
// this field at its default value.
// +optional
// +kubebuilder:validation:Minimum=0
AllowedMissedRun *int `json:"allowedMissedRun,omitempty"`
AllowedMissedRuns *int `json:"allowedMissedRun,omitempty"`

// JobTimeoutMinute defines after how many minutes a job that has not yet finished should be stopped and removed.
// JobTimeoutMinutes defines after how many minutes a job that has not yet finished should be stopped and removed.
// Default value is 10 minutes.
// +optional
// +kubebuilder:validation:Minimum=0
// +kubebuilder:default=10
JobTimeoutMinute int32 `json:"jobTimeoutMinute,omitempty"`
JobTimeoutMinutes int32 `json:"jobTimeoutMinute,omitempty"`
}

// VitessBackupScheduleStrategy defines how we are going to take a backup.
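
Taken together, the renamed fields above are optional knobs on the schedule template. The sketch below shows one way such a template might be populated from Go; the import path is assumed from the file layout in this diff and may not match the module's real path, and fields not shown in the hunks (for example the cron expression itself) are omitted.

package main

import (
	"fmt"

	// Assumed import path; adjust to the repository's actual Go module path.
	planetscalev2 "planetscale.dev/vitess-operator/pkg/apis/planetscale/v2"
)

func main() {
	suspend := false
	deadline := int64(300) // tolerate runs that are up to five minutes late
	missed := 100          // same as the default, spelled out for clarity

	tpl := planetscalev2.VitessBackupScheduleTemplate{
		Suspend:                 &suspend,
		StartingDeadlineSeconds: &deadline,
		AllowedMissedRuns:       &missed,
		JobTimeoutMinutes:       10,
	}
	fmt.Printf("%+v\n", tpl)
}
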
4 changes: 2 additions & 2 deletions pkg/apis/planetscale/v2/zz_generated.deepcopy.go

Some generated files are not rendered by default.

pkg/controller/vitessbackupschedule/vitessbackupschedule_controller.go
@@ -20,6 +20,7 @@ import (
"context"
"flag"
"fmt"
"strings"

"sort"
"time"
@@ -186,7 +187,7 @@ func (r *ReconcileVitessBackupsSchedule) Reconcile(ctx context.Context, req ctrl
r.cleanupJobsWithLimit(ctx, jobs.failed, vbsc.GetFailedJobsLimit())
r.cleanupJobsWithLimit(ctx, jobs.successful, vbsc.GetSuccessfulJobsLimit())

err = r.removeTimeoutJobs(ctx, jobs.successful, vbsc.Spec.JobTimeoutMinute)
err = r.removeTimeoutJobs(ctx, jobs.successful, vbsc.Spec.JobTimeoutMinutes)
if err != nil {
// We had an error while removing timed out jobs, we can requeue
return resultBuilder.Error(err)
@@ -262,36 +263,35 @@ func (r *ReconcileVitessBackupsSchedule) Reconcile(ctx context.Context, req ctrl
func getNextSchedule(vbsc planetscalev2.VitessBackupSchedule, now time.Time) (time.Time, time.Time, error) {
sched, err := cron.ParseStandard(vbsc.Spec.Schedule)
if err != nil {
return time.Time{}, time.Time{}, fmt.Errorf("unparaseable schedule %q: %v", vbsc.Spec.Schedule, err)
return time.Time{}, time.Time{}, fmt.Errorf("unable to parse schedule %q: %v", vbsc.Spec.Schedule, err)
}

// for optimization purposes, cheat a bit and start from our last observed run time
// we could reconstitute this here, but there's not much point, since we've
// just updated it.
var earliestTime time.Time
// Set the last scheduled time by either looking at the VitessBackupSchedule's Status or
// by looking at its creation time.
var latestRun time.Time
if vbsc.Status.LastScheduledTime != nil {
earliestTime = vbsc.Status.LastScheduledTime.Time
latestRun = vbsc.Status.LastScheduledTime.Time
} else {
earliestTime = vbsc.ObjectMeta.CreationTimestamp.Time
latestRun = vbsc.ObjectMeta.CreationTimestamp.Time
}

if vbsc.Spec.StartingDeadlineSeconds != nil {
// controller is not going to schedule anything below this point
schedulingDeadline := now.Add(-time.Second * time.Duration(*vbsc.Spec.StartingDeadlineSeconds))

if schedulingDeadline.After(earliestTime) {
earliestTime = schedulingDeadline
if schedulingDeadline.After(latestRun) {
latestRun = schedulingDeadline
}
}

// Next schedule is later, simply return the next scheduled time.
if earliestTime.After(now) {
if latestRun.After(now) {
return time.Time{}, sched.Next(now), nil
}

var lastMissed time.Time
missedRuns := 0
for t := sched.Next(earliestTime); !t.After(now); t = sched.Next(t) {
for t := sched.Next(latestRun); !t.After(now); t = sched.Next(t) {
lastMissed = t
missedRuns++
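
The loop above, truncated by the diff view, walks every scheduled time between the last recorded run and now, counting missed runs until it reaches now or hits the AllowedMissedRuns limit. A standalone sketch of the same counting follows; it assumes the robfig/cron/v3 package behind cron.ParseStandard, and the schedule and timestamps are made up for illustration.

package main

import (
	"fmt"
	"time"

	"github.com/robfig/cron/v3"
)

func main() {
	// "*/10 * * * *" fires every ten minutes.
	sched, err := cron.ParseStandard("*/10 * * * *")
	if err != nil {
		panic(err)
	}

	now := time.Date(2024, 5, 24, 12, 0, 0, 0, time.UTC)
	latestRun := now.Add(-1 * time.Hour) // pretend the last recorded run was an hour ago

	var lastMissed time.Time
	missedRuns := 0
	for t := sched.Next(latestRun); !t.After(now); t = sched.Next(t) {
		lastMissed = t
		missedRuns++
	}

	fmt.Println(missedRuns, lastMissed.Format(time.RFC3339)) // 6 2024-05-24T12:00:00Z
}
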

@@ -311,7 +311,7 @@ func (r *ReconcileVitessBackupsSchedule) updateVitessBackupScheduleStatus(ctx co
vbsc.Status.LastScheduledTime = nil
}

vbsc.Status.Active = nil
vbsc.Status.Active = make([]corev1.ObjectReference, 0, len(activeJobs))
for _, activeJob := range activeJobs {
jobRef, err := ref.GetReference(r.scheme, activeJob)
if err != nil {
@@ -361,12 +361,8 @@ func (r *ReconcileVitessBackupsSchedule) getJobsList(ctx context.Context, req ct
log.WithError(err).Errorf("unable to parse schedule time for existing job, found: %s", job.Annotations[scheduledTimeAnnotation])
continue
}
if scheduledTimeForJob != nil {
if mostRecentTime == nil {
mostRecentTime = scheduledTimeForJob
} else if mostRecentTime.Before(*scheduledTimeForJob) {
mostRecentTime = scheduledTimeForJob
}
if scheduledTimeForJob != nil && (mostRecentTime == nil || mostRecentTime.Before(*scheduledTimeForJob)) {
mostRecentTime = scheduledTimeForJob
}
}
return jobs, mostRecentTime, nil
@@ -407,7 +403,7 @@ func (r *ReconcileVitessBackupsSchedule) removeTimeoutJobs(ctx context.Context,
if err != nil {
return err
}
if jobStartTime.After(time.Now().Add(time.Minute * time.Duration(timeout))) {
if jobStartTime.Add(time.Minute * time.Duration(timeout)).After(time.Now()) {
if err := r.client.Delete(ctx, job, client.PropagationPolicy(metav1.DeletePropagationBackground)); (err) != nil {
log.WithError(err).Errorf("unable to delete timed out job: %s", job.Name)
} else {
@@ -495,28 +491,28 @@ func (r *ReconcileVitessBackupsSchedule) createJobPod(ctx context.Context, vbsc
// ensures that there will be at least one item in this list. The YAML cannot be applied with
// an empty list of strategies.
args := []string{"/bin/sh", "-c"}
var cmd string
var cmd strings.Builder

for i, strategy := range vbsc.Spec.Strategy {
if i > 0 {
cmd = fmt.Sprintf("%s && ", cmd)
cmd.WriteString(fmt.Sprintf("%s && ", cmd))

Check failure on line 498 in pkg/controller/vitessbackupschedule/vitessbackupschedule_controller.go (GitHub Actions / Unit Test): fmt.Sprintf format %s has arg cmd of wrong type strings.Builder
}
// At this point, strategy.Name is either BackupShard or BackupTablet, the validation
// is made at the CRD level on the YAML directly.

cmd = fmt.Sprintf("%s%s %s", cmd, vtctldclientPath, vtctldclientServerArg)
cmd.WriteString(fmt.Sprintf("%s%s %s", cmd, vtctldclientPath, vtctldclientServerArg))

Check failure on line 503 in pkg/controller/vitessbackupschedule/vitessbackupschedule_controller.go (GitHub Actions / Unit Test): fmt.Sprintf format %s has arg cmd of wrong type strings.Builder

// Add the vtctldclient command
switch strategy.Name {
case planetscalev2.BackupShard:
cmd = fmt.Sprintf("%s BackupShard", cmd)
cmd.WriteString(fmt.Sprintf("%s BackupShard", cmd))

Check failure on line 508 in pkg/controller/vitessbackupschedule/vitessbackupschedule_controller.go (GitHub Actions / Unit Test): fmt.Sprintf format %s has arg cmd of wrong type strings.Builder
case planetscalev2.BackupTablet:
cmd = fmt.Sprintf("%s Backup", cmd)
cmd.WriteString(fmt.Sprintf("%s Backup", cmd))

Check failure on line 510 in pkg/controller/vitessbackupschedule/vitessbackupschedule_controller.go (GitHub Actions / Unit Test): fmt.Sprintf format %s has arg cmd of wrong type strings.Builder
}

// Add any flags
for key, value := range strategy.ExtraFlags {
cmd = fmt.Sprintf("%s --%s=%s", cmd, key, value)
cmd.WriteString(fmt.Sprintf("%s --%s=%s", cmd, key, value))

Check failure on line 515 in pkg/controller/vitessbackupschedule/vitessbackupschedule_controller.go (GitHub Actions / Unit Test): fmt.Sprintf format %s has arg cmd of wrong type strings.Builder
}

// Add keyspace/shard or tablet alias
@@ -525,15 +521,15 @@
if strategy.KeyspaceShard == "" {
return pod, fmt.Errorf("the KeyspaceShard field is missing from VitessBackupScheduleStrategy %s", planetscalev2.BackupShard)
}
cmd = fmt.Sprintf("%s %s", cmd, strategy.KeyspaceShard)
cmd.WriteString(fmt.Sprintf("%s %s", cmd, strategy.KeyspaceShard))

Check failure on line 524 in pkg/controller/vitessbackupschedule/vitessbackupschedule_controller.go (GitHub Actions / Unit Test): fmt.Sprintf format %s has arg cmd of wrong type strings.Builder
case planetscalev2.BackupTablet:
if strategy.TabletAlias == "" {
return pod, fmt.Errorf("the TabletAlias field is missing from VitessBackupScheduleStrategy %s", planetscalev2.BackupTablet)
}
cmd = fmt.Sprintf("%s %s", cmd, strategy.TabletAlias)
cmd.WriteString(fmt.Sprintf("%s %s", cmd, strategy.TabletAlias))

Check failure on line 529 in pkg/controller/vitessbackupschedule/vitessbackupschedule_controller.go (GitHub Actions / Unit Test): fmt.Sprintf format %s has arg cmd of wrong type strings.Builder
}
}
args = append(args, cmd)
args = append(args, cmd.String())

pod = corev1.PodSpec{
Containers: []corev1.Container{{
@@ -572,7 +568,7 @@ func (r *ReconcileVitessBackupsSchedule) getVtctldServiceName(ctx context.Contex
}

if svcName == "" || svcPort == 0 {
return "", 0, fmt.Errorf("no vtctld service in %q found", vbsc.Namespace)
return "", 0, fmt.Errorf("no vtctld service found in %q namespace", vbsc.Namespace)
}
return svcName, svcPort, nil
}
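
The vet failures flagged in this file share one cause: the strings.Builder is passed as a %s argument to fmt.Sprintf while the result is written back into that same builder. Below is a minimal sketch of an append-only pattern that vet accepts; it illustrates the technique only and is not the committed code, and the command name, server address, and flag are placeholders.

package main

import (
	"fmt"
	"strings"
)

// buildCommand joins one sub-command per strategy with " && ", appending to the
// builder instead of formatting the builder into itself.
func buildCommand(strategies []string, extraFlags map[string]string) string {
	var cmd strings.Builder
	for i, name := range strategies {
		if i > 0 {
			cmd.WriteString(" && ")
		}
		// Append only the new fragment; fmt.Fprintf writes into the builder via io.Writer.
		fmt.Fprintf(&cmd, "vtctldclient --server=placeholder:15999 %s", name)
		for key, value := range extraFlags {
			fmt.Fprintf(&cmd, " --%s=%s", key, value)
		}
	}
	return cmd.String()
}

func main() {
	fmt.Println(buildCommand([]string{"BackupShard"}, map[string]string{"placeholder-flag": "true"}))
}
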
4 changes: 2 additions & 2 deletions pkg/controller/vitesscluster/reconcile_backup_schedule.go
@@ -34,8 +34,8 @@ func (r *ReconcileVitessCluster) reconcileBackupSchedule(ctx context.Context, vt

// Generate keys (object names) for all desired cells.
// Keep a map back from generated names to the specs.
var keys []client.ObjectKey
scheduleMap := make(map[client.ObjectKey]*planetscalev2.VitessBackupScheduleTemplate)
keys := make([]client.ObjectKey, 0, len(vt.Spec.Backup.Schedules))
scheduleMap := make(map[client.ObjectKey]*planetscalev2.VitessBackupScheduleTemplate, len(vt.Spec.Backup.Schedules))
if vt.Spec.Backup != nil {
for i := range vt.Spec.Backup.Schedules {
schedule := &vt.Spec.Backup.Schedules[i]
10 changes: 6 additions & 4 deletions test/endtoend/backup_restore_test.sh
@@ -1,5 +1,7 @@
#!/bin/bash

set -euo pipefail

source ./tools/test.env
source ./test/endtoend/utils.sh

@@ -30,14 +32,14 @@ function resurrectShard() {
sleep 5

echo "show databases;" | mysql | grep "commerce" > /dev/null 2>&1
if [ $? -ne 0 ]; then
if [[ $? -ne 0 ]]; then
echo "Could not find commerce database"
printMysqlErrorFiles
exit 1
fi

echo "show tables;" | mysql commerce | grep -E 'corder|customer|product' | wc -l | grep 3 > /dev/null 2>&1
if [ $? -ne 0 ]; then
if [[ $? -ne 0 ]]; then
echo "Could not find commerce's tables"
printMysqlErrorFiles
exit 1
@@ -76,7 +78,7 @@ EOF
}

function setupKindConfig() {
if [ "$BUILDKITE_BUILD_ID" != "0" ]; then
if [[ "$BUILDKITE_BUILD_ID" != "0" ]]; then
# The script is being run from buildkite, so we can't mount the current
# working directory into kind. The current directory inside the Docker container is the workdir,
# so if we try to mount that, we get an error. Instead we need to mount the
@@ -99,7 +101,7 @@ docker build -f build/Dockerfile.release -t vitess-operator-pr:latest .
echo "Setting up the kind config"
setupKindConfig
echo "Creating Kind cluster"
kind create cluster --wait 30s --name kind-${BUILDKITE_BUILD_ID} --config ./vtdataroot/config.yaml --image kindest/node:v1.28.0
kind create cluster --wait 30s --name kind-${BUILDKITE_BUILD_ID} --config ./vtdataroot/config.yaml --image ${KIND_VERSION}
echo "Loading docker image into Kind cluster"
kind load docker-image vitess-operator-pr:latest --name kind-${BUILDKITE_BUILD_ID}

13 changes: 7 additions & 6 deletions test/endtoend/backup_schedule_test.sh
@@ -1,5 +1,7 @@
#!/bin/bash

set -euo pipefail

source ./tools/test.env
source ./test/endtoend/utils.sh

@@ -15,9 +17,8 @@ function checkVitessBackupScheduleStatusWithTimeout() {
regex=$1

for i in {1..1200} ; do
out=$(kubectl get VitessBackupSchedule)
echo "$out" | grep -E "$regex" | wc -l | grep "1" > /dev/null 2>&1
if [ $? -eq 0 ]; then
kubectl get VitessBackupSchedule | grep -E "$regex" | wc -l | grep "1" > /dev/null 2>&1
if [[ $? -eq 0 ]]; then
echo "$regex found"
return
fi
@@ -38,7 +39,7 @@ function verifyListBackupsOutputWithSchedule() {
sleep 360

backupCount=$(kubectl get vtb --no-headers | wc -l)
if [ "${backupCount}" -lt 7 ]; then
if [[ "${backupCount}" -lt 7 ]]; then
echo "Did not find at least 7 backups"
return 0
fi
@@ -49,7 +50,7 @@ }
}

function setupKindConfig() {
if [ "$BUILDKITE_BUILD_ID" != "0" ]; then
if [[ "$BUILDKITE_BUILD_ID" != "0" ]]; then
# The script is being run from buildkite, so we can't mount the current
# working directory into kind. The current directory inside the Docker container is the workdir,
# so if we try to mount that, we get an error. Instead we need to mount the
@@ -72,7 +73,7 @@ docker build -f build/Dockerfile.release -t vitess-operator-pr:latest .
echo "Setting up the kind config"
setupKindConfig
echo "Creating Kind cluster"
kind create cluster --wait 30s --name kind-${BUILDKITE_BUILD_ID} --config ./vtdataroot/config.yaml --image kindest/node:v1.28.0
kind create cluster --wait 30s --name kind-${BUILDKITE_BUILD_ID} --config ./vtdataroot/config.yaml --image ${KIND_VERSION}
echo "Loading docker image into Kind cluster"
kind load docker-image vitess-operator-pr:latest --name kind-${BUILDKITE_BUILD_ID}
