-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Retry with jitter
A well-known retry strategy is exponential backoff, allowing retries to be made initially quickly, but then at progressively longer intervals: for example, after 2, 4, 8, 15, then 30 seconds.
In very high-throughput scenarios, however, it can also be beneficial to add jitter to wait-and-retry strategies, to prevent retries bunching into further spikes of load.
Consider a given call path experiencing hundreds of calls per second where the underlying system suddenly fails. A fixed-progression exponential backoff can still generate peaks of load.
If the underlying system fails at time t, for example, and you have a fixed-progression backoff strategy generating retries after 2, 4, 8, 15 and 30 second intervals, you can expect to generate further specific jumps in load at t + 2, t + 6, t + 14 etc.
Jitter is a decorrelation strategy which adds randomness to retry intervals to spread out load and thus avoid spikes.
Jitter is not provided as an in-built configuration overload within Polly, as a number of different jitter strategies are possible, and jitter is easy to implement with existing overloads which allow you to dynamically calculate the duration to wait before retrying.
Jitter can be achieved using any of the .WaitAndRetry(...)
overloads which allow you to specify a Func<..., TimeSpan>
for calculating the duration of wait before retrying. (That link shows a simple overload, but more complex variants exist, and similar overloads exist for async retry policies.)
A simple jitter can be achieved by adding a randomly-varying extra delay to the wait before retrying:
Random jitterer = new Random();
Policy
.Handle<HttpResponseException>() // etc
.WaitAndRetry(5,
retryAttempt => TimeSpan.FromSeconds(Math.Pow(2, retryAttempt)) // exponential back-off: 2, 4, 8 etc
+ TimeSpan.FromMilliseconds(jitterer.Next(0, 1000)) // plus some jitter: up to 1 second
);
Note however that this jitter doesn't scale with increasing retries.
A comparison of Jitter approaches by the AWS architecture team highlights a number of more complex strategies. These typically combine an element of backoff with also scaling the amount of jitter in relation to the try number or previous delay.
The following code implements the 'Decorrelated Jitter' strategy discussed in that article, for Polly.
The core work is to implement a function which will return a jittered delay based on a maximum number of retries, a seed delay and a max delay:
public static IEnumerable<TimeSpan> DecorrelatedJitter(int maxRetries, TimeSpan seedDelay, TimeSpan maxDelay)
{
Random jitterer = new Random();
int retries = 0;
double seed = seedDelay.TotalMilliseconds;
double max = maxDelay.TotalMilliseconds;
double current = seed;
while (++retries <= maxRetries)
{
current = Math.Min(max, Math.Max(seed, current * 3 * jitterer.NextDouble())); // adopting the 'Decorrelated Jitter' formula from https://www.awsarchitectureblog.com/2015/03/backoff.html. Can be between seed and previous * 3. Mustn't exceed max.
yield return TimeSpan.FromMilliseconds(current);
}
}
A policy using this strategy can be defined as follows:
Policy retryWithDecorrelatedJitter = Policy
.Handle<WhateverException>()
.WaitAndRetry(DecorrelatedJitter(maxRetries, seedDelay, maxDelay));
The parameters are:
maxRetries
: the maximum number of retries that will be made.
seed
: should represent the minimum wait-before-any-retry that you want.
- This is used as a seed in calculations, but note that it does not fix the initial retry delay to this value (a fixed first retry interval would cause a spike).
max
: the maximum wait-before-any-retry that you want.
-
max >= 4 * seed
works well (and bigger ratios are fine). Ifseed
andmax
are any closer, values will tend to bunch at the min and max.
- The algorithm will produce delay values all of which fall between
seed
andmax
. - The formula will tend, over several iterations, to produce values with an increasing delay (that is, there is an in-built element of backoff). However, the formula can also (intentionally) produce a shorter delay for one retry than the previous. (The formula
current * 3 * jitterer.NextDouble()
has a 2-in-3 chance of producing a greater value than previously, and a 1-in-3 chance of lower.)
A full working example as a Console app, illustrating the typical delay values the strategy generates, can be found here.
- Home
- Polly RoadMap
- Contributing
- Transient fault handling and proactive resilience engineering
- Supported targets
- Retry
- Circuit Breaker
- Advanced Circuit Breaker
- Timeout
- Bulkhead
- Cache
- Rate-Limit
- Fallback
- PolicyWrap
- NoOp
- PolicyRegistry
- Polly and HttpClientFactory
- Asynchronous action execution
- Handling InnerExceptions and AggregateExceptions
- Statefulness of policies
- Keys and Context Data
- Non generic and generic policies
- Polly and interfaces
- Some policy patterns
- Debugging with Polly in Visual Studio
- Unit-testing with Polly
- Polly concept and architecture
- Polly v6 breaking changes
- Polly v7 breaking changes
- DISCUSSION PROPOSAL- Polly eventing and metrics architecture