Add option to add jitter to interval #173

Open: wants to merge 3 commits into base: master
1 change: 1 addition & 0 deletions README.md
@@ -183,6 +183,7 @@ Use `UTC`, `Local` or pick a timezone name from the [(IANA) tz database](https:/
| Option | Description | Default |
|---------------------------|----------------------------------------------------------------------|----------------------------|
| `--interval` | interval between pod terminations | 10m |
+| `--max-jitter` | max jitter to add to the interval | 0s |
| `--labels` | label selector to filter pods by | (matches everything) |
| `--annotations` | annotation selector to filter pods by | (matches everything) |
| `--namespaces` | namespace selector to filter pods by | (all namespaces) |
4 changes: 3 additions & 1 deletion chaoskube/chaoskube.go
@@ -123,8 +123,10 @@ func New(client kubernetes.Interface, labels, annotations, namespaces, namespace

// Run continuously picks and terminates a victim pod at a given interval
// described by channel next. It returns when the given context is canceled.
-func (c *Chaoskube) Run(ctx context.Context, next <-chan time.Time) {
+func (c *Chaoskube) Run(ctx context.Context, maxJitter time.Duration, next <-chan time.Time) {
Owner

It would be nice if we only had one "tick source": a Ticker with a channel and some jitter built in. Something like https://github.com/lthibault/jitterbug, maybe.
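For illustration, a single tick source with built-in jitter could look roughly like the sketch below. It uses only the standard library rather than jitterbug. `jitteredTicks` and `countTicks` are hypothetical names, and the uniform offset in [-maxJitter, +maxJitter] is an assumption, not what this PR implements.

```go
package main

import (
	"context"
	"fmt"
	"math/rand"
	"time"
)

// jitteredTicks is a hypothetical helper (not part of chaoskube or jitterbug).
// It sends the current time on the returned channel roughly every interval,
// shifted by a uniform random offset in [-maxJitter, +maxJitter] (an assumption).
// The channel is closed once ctx is canceled.
func jitteredTicks(ctx context.Context, interval, maxJitter time.Duration) <-chan time.Time {
	out := make(chan time.Time)
	go func() {
		defer close(out)
		for {
			delay := interval
			if maxJitter > 0 {
				// Int63n(2*max+1) yields [0, 2*max]; subtracting max centers it on zero.
				delay += time.Duration(rand.Int63n(int64(2*maxJitter)+1)) - maxJitter
			}
			if delay < 0 {
				delay = 0
			}
			timer := time.NewTimer(delay)
			select {
			case <-timer.C:
			case <-ctx.Done():
				timer.Stop()
				return
			}
			// Deliver the tick, but give up if the consumer has gone away.
			select {
			case out <- time.Now():
			case <-ctx.Done():
				return
			}
		}
	}()
	return out
}

// countTicks consumes ticks for the given total duration and returns how many arrived.
func countTicks(total, interval, maxJitter time.Duration) int {
	ctx, cancel := context.WithTimeout(context.Background(), total)
	defer cancel()
	n := 0
	for range jitteredTicks(ctx, interval, maxJitter) {
		n++
	}
	return n
}

func main() {
	n := countTicks(200*time.Millisecond, 20*time.Millisecond, 5*time.Millisecond)
	fmt.Println("ticks received:", n)
}
```

With something along these lines, `Run` could keep taking a single `next <-chan time.Time` and would not need to know about jitter at all.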

for {
+jitter := util.RandomJitter(maxJitter)
+time.Sleep(jitter)
Owner

If we go with two sources of delay (the next channel and the jitter duration, see the comment above), we shouldn't use time.Sleep here because it blocks the handling of Ctrl+C etc.

Instead we can do another select:

select {
case <-time.After(jitter):
case <-ctx.Done():
  return
}

after the select that's already below.
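A self-contained sketch of that pattern, assuming hypothetical helpers named `sleepCtx` and `sleepWithTimeout`. It uses `time.NewTimer` plus `Stop` instead of `time.After` so the timer is released promptly when the context wins; that is a choice made for this sketch, not part of the suggestion above.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// sleepCtx is a hypothetical helper: it waits for d or until ctx is canceled,
// whichever comes first. It reports true only if the full duration elapsed.
func sleepCtx(ctx context.Context, d time.Duration) bool {
	timer := time.NewTimer(d)
	defer timer.Stop()
	select {
	case <-timer.C:
		return true
	case <-ctx.Done():
		return false
	}
}

// sleepWithTimeout runs sleepCtx under a context that expires after timeout.
func sleepWithTimeout(d, timeout time.Duration) bool {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return sleepCtx(ctx, d)
}

func main() {
	// The context expires long before the one-hour sleep would finish.
	fmt.Println("long sleep completed:", sleepWithTimeout(time.Hour, 10*time.Millisecond))
	// The sleep finishes well before the context expires.
	fmt.Println("short sleep completed:", sleepWithTimeout(time.Millisecond, 5*time.Second))
}
```

In `Run`, a false return would be the signal to return from the loop, mirroring the `case <-ctx.Done(): return` branch above.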

Owner

I just noticed that this jitter will only increase the interval, never reduce it, right?

So an interval of 1m and a jitter of 10s will result in intervals in [1m, 1m10s] (as opposed to [55s, 1m5s])?
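To make the difference concrete, here is a minimal sketch of a jitter centered on zero, so that a 10s jitter around a 1m interval yields values in [55s, 1m5s]. `centeredJitter` is a hypothetical helper, not code from this PR.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// centeredJitter is a hypothetical helper: it returns a uniform random offset
// in [-window/2, +window/2]. Non-positive windows yield no jitter.
func centeredJitter(window time.Duration) time.Duration {
	if window <= 0 {
		return 0
	}
	half := window / 2
	// Int63n(2*half+1) yields [0, 2*half]; subtracting half centers it on zero.
	return time.Duration(rand.Int63n(int64(2*half)+1)) - half
}

func main() {
	interval := time.Minute
	for i := 0; i < 3; i++ {
		// Each printed value lies between 55s and 1m5s.
		fmt.Println(interval + centeredJitter(10*time.Second))
	}
}
```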

Contributor Author

That's correct. I'm going to look at jitterbug to see whether it would fix most of the issues found here.

Contributor Author

@linki I think jitterbug's normal distribution would fit well here, using a mean of 0 and a standard deviation provided by the user. Thoughts?
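A rough sketch of that idea, using `rand.NormFloat64` from the standard library rather than jitterbug's own API (which is not shown here). `normalInterval` is a hypothetical name, and clamping at zero is an assumption added to keep the delay non-negative.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// normalInterval is a hypothetical helper: it returns interval plus noise drawn
// from a normal distribution with mean 0 and the given standard deviation.
// The result is clamped at zero (an assumption of this sketch).
func normalInterval(interval, stdDev time.Duration) time.Duration {
	if stdDev <= 0 {
		return interval
	}
	// NormFloat64 samples a standard normal; scaling by stdDev sets the spread.
	offset := time.Duration(rand.NormFloat64() * float64(stdDev))
	if d := interval + offset; d > 0 {
		return d
	}
	return 0
}

func main() {
	for i := 0; i < 3; i++ {
		fmt.Println(normalInterval(time.Minute, 10*time.Second))
	}
}
```

Unlike a bounded uniform jitter, a normal distribution has no hard limits, so an occasional interval far from the mean is possible.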

Owner

@desponda Sgtm. Thanks for looking into it!

if err := c.TerminateVictims(); err != nil {
c.Logger.WithField("err", err).Error("failed to terminate victim")
metrics.ErrorsTotal.Inc()
2 changes: 1 addition & 1 deletion chaoskube/chaoskube_test.go
@@ -124,7 +124,7 @@ func (suite *Suite) TestRunContextCanceled() {
ctx, cancel := context.WithCancel(context.Background())
cancel()

-chaoskube.Run(ctx, nil)
+chaoskube.Run(ctx, time.Duration(0), nil)
}

// TestCandidates tests that the various pod filters are applied correctly.
21 changes: 16 additions & 5 deletions main.go
@@ -15,9 +15,9 @@ import (
"syscall"
"time"

-"gopkg.in/alecthomas/kingpin.v2"
"github.com/prometheus/client_golang/prometheus/promhttp"
log "github.com/sirupsen/logrus"
+"gopkg.in/alecthomas/kingpin.v2"

"k8s.io/apimachinery/pkg/labels"
"k8s.io/client-go/kubernetes"
@@ -51,6 +51,7 @@ var (
master string
kubeconfig string
interval time.Duration
+maxJitter time.Duration
dryRun bool
debug bool
metricsAddress string
@@ -79,6 +80,7 @@ func init() {
kingpin.Flag("master", "The address of the Kubernetes cluster to target").StringVar(&master)
kingpin.Flag("kubeconfig", "Path to a kubeconfig file").StringVar(&kubeconfig)
kingpin.Flag("interval", "Interval between Pod terminations").Default("10m").DurationVar(&interval)
+kingpin.Flag("max-jitter", "The max duration of jitter to add to the interval").Default("0s").DurationVar(&maxJitter)
kingpin.Flag("dry-run", "Don't actually kill any pod. Turned on by default. Turn off with `--no-dry-run`.").Default("true").BoolVar(&dryRun)
kingpin.Flag("debug", "Enable debug logging.").BoolVar(&debug)
kingpin.Flag("metrics-address", "Listening address for metrics handler").Default(":8080").StringVar(&metricsAddress)
@@ -121,6 +123,7 @@ func main() {
"master": master,
"kubeconfig": kubeconfig,
"interval": interval,
+"maxJitter": maxJitter,
"dryRun": dryRun,
"debug": debug,
"metricsAddress": metricsAddress,
@@ -130,9 +133,10 @@ func main() {
}).Debug("reading config")

log.WithFields(log.Fields{
-"version": version,
-"dryRun": dryRun,
-"interval": interval,
+"version": version,
+"dryRun": dryRun,
+"interval": interval,
+"maxJitter": maxJitter,
}).Info("starting up")

client, err := newClient()
@@ -158,6 +162,13 @@ func main() {
"maxKill": maxKill,
}).Info("setting pod filter")

+if interval <= maxJitter {
+log.WithFields(log.Fields{
+"interval": interval,
+"maxJitter": maxJitter,
+}).Fatal("maxJitter must be less than interval")
+}

parsedWeekdays := util.ParseWeekdays(excludedWeekdays)
parsedTimesOfDay, err := util.ParseTimePeriods(excludedTimesOfDay)
if err != nil {
@@ -235,7 +246,7 @@ func main() {
ticker := time.NewTicker(interval)
defer ticker.Stop()

-chaoskube.Run(ctx, ticker.C)
+chaoskube.Run(ctx, maxJitter, ticker.C)
}

func newClient() (*kubernetes.Clientset, error) {
10 changes: 10 additions & 0 deletions util/util.go
@@ -195,3 +195,13 @@ func RandomPodSubSlice(pods []v1.Pod, count int) []v1.Pod {
res := pods[0:count]
return res
}

+// RandomJitter returns a random jitter based on maxJitter
+func RandomJitter(maxJitter time.Duration) time.Duration {
+if maxJitter.Nanoseconds() == 0 {
+return time.Duration(0)
+}
+seed := rand.NewSource(time.Now().UnixNano())
+rng := rand.New(seed)
+return time.Duration(rng.Int63n(maxJitter.Nanoseconds()))
+}
6 changes: 6 additions & 0 deletions util/util_test.go
@@ -431,6 +431,12 @@ func (suite *Suite) TestRandomPodSublice() {
}
}

+func (suite *Suite) TestRandomJitter() {
+zeroJitter := time.Duration(0)
+randomJitter := RandomJitter(zeroJitter)
+suite.Equal(int64(0), randomJitter.Nanoseconds())
+}

func TestSuite(t *testing.T) {
suite.Run(t, new(Suite))
}
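The TestRandomJitter case in this diff only exercises the zero-duration path. A bounds check for non-zero values might look like the sketch below. `RandomJitter` is copied from util/util.go as shown in the diff so the example runs on its own; `jitterInRange` is a hypothetical helper, and a plain function stands in for the suite assertion.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// RandomJitter mirrors the function added in util/util.go in this diff.
func RandomJitter(maxJitter time.Duration) time.Duration {
	if maxJitter.Nanoseconds() == 0 {
		return time.Duration(0)
	}
	seed := rand.NewSource(time.Now().UnixNano())
	rng := rand.New(seed)
	return time.Duration(rng.Int63n(maxJitter.Nanoseconds()))
}

// jitterInRange is a hypothetical helper: it draws n samples and reports
// whether every one of them falls in the half-open range [0, max).
func jitterInRange(max time.Duration, n int) bool {
	for i := 0; i < n; i++ {
		d := RandomJitter(max)
		if d < 0 || d >= max {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println("all samples within [0, 10s):", jitterInRange(10*time.Second, 1000))
}
```

Note that `Int63n` panics for arguments that are not positive, so a negative `--max-jitter` would crash `RandomJitter` as written; guarding with `maxJitter <= 0` would be one way to avoid that.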