reisenberger edited this page Mar 10, 2018 · 119 revisions

The roadmap indicates currently envisaged or candidate development directions for Polly.

Community feedback on features you would like, or priorities, is welcome.

Comments on overall direction, new features, and prioritisation requests are welcome under the roadmap placeholder issue.

If you're interested in following and contributing to the development direction of Polly, join our public Slack channel for real-time discussion of ideas.

High Priority

may be under active development, may be found in an upcoming release

Emit events, aggregate metrics to dashboards

In dev now for v6.x

New fuller proposal: PROPOSAL--Polly-eventing-and-metrics-architecture

Slack channel: Metrics slack channel

Github discussion: #326

With the number of resilience strategies now available in Polly v5, emitting Policy events/statistics, and timings to completion/failure of Policy executions, could be valuable.

The planned architecture envisages three layers:

  • raise raw events from policies
  • aggregate those events with Rx or another strategy, to form relevant metrics/stats
  • transform those metrics to the format required by any particular dashboard.

The layered architecture is intended to allow users to push metrics to any dashboard (eg AppInsights; Influx; Prometheus), including users writing their own plugins for custom dashboards if desired.
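The three layers above can be sketched in plain C#. This is only an illustration under stated assumptions, not the planned Polly design: the types `PolicyExecutionEvent`, `PolicyMetrics` and `MetricsPipeline`, and the output line format, are all hypothetical, and LINQ stands in for Rx as the aggregation strategy.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Layer 1 (hypothetical shape): a raw event raised for one policy execution.
public sealed class PolicyExecutionEvent
{
    public PolicyExecutionEvent(string policyKey, bool succeeded, TimeSpan duration)
    {
        PolicyKey = policyKey;
        Succeeded = succeeded;
        Duration = duration;
    }

    public string PolicyKey { get; }
    public bool Succeeded { get; }
    public TimeSpan Duration { get; }
}

// Layer 2 output (hypothetical shape): metrics aggregated per policy.
public sealed class PolicyMetrics
{
    public PolicyMetrics(string policyKey, int executions, int failures, double meanMilliseconds)
    {
        PolicyKey = policyKey;
        Executions = executions;
        Failures = failures;
        MeanMilliseconds = meanMilliseconds;
    }

    public string PolicyKey { get; }
    public int Executions { get; }
    public int Failures { get; }
    public double MeanMilliseconds { get; }
}

public static class MetricsPipeline
{
    // Layer 2: aggregate a batch of raw events into per-policy metrics.
    public static IReadOnlyList<PolicyMetrics> Aggregate(IEnumerable<PolicyExecutionEvent> events) =>
        events
            .GroupBy(e => e.PolicyKey)
            .Select(g => new PolicyMetrics(
                g.Key,
                g.Count(),
                g.Count(e => !e.Succeeded),
                g.Average(e => e.Duration.TotalMilliseconds)))
            .ToList();

    // Layer 3: transform a metric into a dashboard-specific format.
    // The line format here is invented for illustration.
    public static string ToDashboardLine(PolicyMetrics m) =>
        string.Format(
            CultureInfo.InvariantCulture,
            "policy={0} executions={1} failures={2} mean_ms={3:F1}",
            m.PolicyKey, m.Executions, m.Failures, m.MeanMilliseconds);
}
```

Keeping layer 3 as a separate function is what would let a plugin for a different dashboard replace only the final transformation.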

prototyping started, discussion transitioning to development

Medium priority

next after high-priority items

Ability to execute delegates taking strongly-typed input parameters

This proposes execute overloads giving the ability to execute delegates taking input parameters, without using closures, as discussed in #271:

Task<TResult> Policy.ExecuteAsync<T1, TResult>(Func<T1, Context, CancellationToken, Task<TResult>> action, T1 input1, Context context, CancellationToken cancellationToken, bool continueOnCapturedContext)

Polly v6.0 plans to deliver this.
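To illustrate why such overloads avoid closures, here is a minimal sketch that does not use Polly at all. `RetrySketch` is a hypothetical stand-in for a policy, and its signature is simplified (no `Context`, no `continueOnCapturedContext`); the point is only that the input travels as an argument, so the delegate needs no captured state.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a policy; not Polly's implementation.
public static class RetrySketch
{
    // 'input1' is passed through to 'action', so 'action' need not capture it.
    public static async Task<TResult> ExecuteAsync<T1, TResult>(
        Func<T1, CancellationToken, Task<TResult>> action,
        T1 input1,
        int retryCount,
        CancellationToken cancellationToken)
    {
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                return await action(input1, cancellationToken).ConfigureAwait(false);
            }
            catch (Exception) when (attempt < retryCount)
            {
                // Swallow and retry; once the retry budget is spent the
                // filter is false and the exception propagates to the caller.
            }
        }
    }
}

public static class Demo
{
    // A static method captures no variables, so no closure object is needed.
    private static Task<int> SquareAsync(int x, CancellationToken ct) =>
        Task.FromResult(x * x);

    public static Task<int> RunAsync() =>
        RetrySketch.ExecuteAsync<int, int>(SquareAsync, 7, 2, CancellationToken.None);
}
```

With a closure-based overload, the equivalent call would be written as a lambda capturing `7` (or a local holding it), which allocates a closure object on each call.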

Circuit-breaker customisation

Several user requests concern refining circuit-breaker behaviour:

  • refining the transition out of half-open state #239; #254
  • distributed circuit breaker #287

Polly v6.0 proposes adding a CustomCircuitBreaker(ICircuitController myCustomController) to open up circuit-breaker customisation.

Lower priority at present

awaiting/seeking development resource, or evaluating importance

Refresh syntax to reduce the number of overloads

The number of overloads available to configure policies and execute actions can cause confusion.

A possible mitigation could be splitting up the fluent syntax of step 2, policy configuration. Currently the mandatory parameters of policy configuration (eg how many retries to make) and optional delegate hooks (eg OnRetry) are configured in the same overload. Separating out the optional delegate hooks into postfix methods as below could reduce the burden.

// Syntax under consideration
Policy
  .Handle<HttpException>()
  .Or<WebException>()
  .Retry(3, i => TimeSpan.FromSeconds(Math.Pow(2,i)))
  .OnRetry((ex, timespan) => ...)
  .Execute(...);

Remove split between sync and async policies

A partial syntax proposal to address split sync/async policies now exists: comment welcome.

This would likely be implemented in combination with the syntax refresh above. This is a large piece of work requiring a major time investment.

Inactive

not currently in line for active development

Failover Policy

A common resilience strategy is to have failover among a series of possible endpoints or systems to try, for a particular operation.

The existing FallbackPolicy provides a single fallback for a failed execution. A FailoverPolicy would go further and automate failover among sources in a round-robin fashion: execute against system A; if that fails, execute against B; if that fails, execute against C, and so on (eventually wrapping back to A).

// Possible syntax:

// configuration
FailoverPolicy failoverPolicy = Policy
   .Handle<WhateverException>() // These exceptions would cause failover to the next provider
   .Failover(IEnumerable<TProvider> providers); // OR: .Failover(Func<TProvider> getNextProvider)
 
// usage
TResult result = failoverPolicy.Execute<TProvider, TResult>(Func<TProvider, TResult> func);
failoverPolicy.Execute<TProvider>(Action<TProvider> action);

Although one configuration overload specifies IEnumerable<TProvider> providers, this need not be a fixed collection. It is easy to code an IEnumerable iterating over a dynamic collection. Equally, this can be an infinite enumerable, looping back to the start when it hits the end, if desired.
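As a sketch of such an enumerable, the iterator below loops forever over a provider list and re-reads the list on every pass, so changes to a dynamic collection are picked up when the sequence wraps around. The names `ProviderSequences`, `RoundRobin` and `getCurrentProviders` are hypothetical and not part of Polly.

```csharp
using System;
using System.Collections.Generic;

public static class ProviderSequences
{
    // Yields providers in order, wrapping back to the start indefinitely.
    // 'getCurrentProviders' is called once per pass to take a fresh snapshot.
    public static IEnumerable<TProvider> RoundRobin<TProvider>(
        Func<IReadOnlyList<TProvider>> getCurrentProviders)
    {
        while (true)
        {
            IReadOnlyList<TProvider> snapshot = getCurrentProviders();
            if (snapshot.Count == 0)
            {
                yield break; // nothing left to fail over to
            }

            foreach (TProvider provider in snapshot)
            {
                yield return provider;
            }
        }
    }
}
```

Because the sequence is lazy, a consumer only advances it when a failover actually happens; callers that enumerate it eagerly must bound it, for example with `Take(n)`.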

FailoverPolicy would also support a method for manually failing over to the next endpoint, .FailOver(). Any other metric/trigger you care about could thus be used to trigger failover. For instance, .FailOver() could be called from the onBreak delegate of a circuit-breaker, so that eg an AdvancedCircuitBreaker threshold triggers failover, in a more automated version of the process described here.

Implementation consideration: The double-generic nature of the method .Execute<TProvider, TResult>(Func<TProvider, TResult> func) provides a challenge for Polly v5. Execute methods taking a strongly-typed input parameter do not exist in Polly v5; and even if they were added specifically on FailoverPolicy, that would not make them available throughout a PolicyWrap, for when FailoverPolicy might be used within a PolicyWrap.

The PollyExecutables proposed for Polly v6 solve all these problems. They open up the path for Execute methods taking strongly-typed input parameters, and the executable instance representing an execution with a strongly-typed input parameter can be passed all the way through a PolicyWrap.

Moved to lower priority. The concept fits well with resilience, but other options exist in many cases, such as network load-balancers, cloud traffic-management tools, in-built multi-node-targeting in APIs which connect to third-party systems.

The following existing issues describe possibilities in the meantime: #199 and #262

Rate-limit Policy

A policy to limit the number of calls placed through the policy per timespan. Useful when calling a third-party system which imposes a rate-limit on its API, provided that rate-limit is known. Perhaps taking a similar approach (with refinements) to something like Jack Leitch's RateGate.

Compare BulkheadPolicy. While BulkheadPolicy is a parallelism-throttle, RateLimitPolicy would be a serial throttle.

Note: A rate-limiting design of the RateGate kind, which 'holds back' already-executing hot Tasks or threads until it is their time/turn to proceed, is intrinsically vulnerable to memory bulges (all those waiting executions have to be held in memory) in high-volume scenarios where fresh requests consistently outstrip the permitted rate. Two possibilities to deal with this are co-operative demand control (aka back-pressure) and load-shedding.

In high-volume scenarios where you have control over the producers, co-operative demand control by back-pressure is recommended; Akka streams is a mature solution for this.

For those whose scenario is amenable to Rx, there may be the option of in-built operators such as throttle, buffer, sample and window.

A RateLimitPolicy in Polly could still be useful, particularly if we can provide options for configurable load-limits/load-shedding. Possibilities include:

  • a configurable upper-bound on the number of executions (across all threads) that are allowed to queue. (This would have nice symmetry with the queue on BulkheadPolicy.)
  • shedding actions which have been queuing longer than a configurable TimeSpan (which TimeoutPolicy already provides).

Discussion in our Slack channel around here also drew out the difference between being the rate-abiding or rate-imposing party. Being the rate-imposing party sounds as simple as rejecting excess calls immediately rather than allowing them to queue.
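A minimal sketch of the rate-imposing case, where excess calls are rejected immediately rather than queued. This is not a proposed Polly API: `FixedWindowRateLimiter` and its members are hypothetical, and a fixed-window counter is just one simple way to count calls per timespan.

```csharp
using System;

// Hypothetical type: allows at most 'maxCalls' per 'window', sheds the rest.
public sealed class FixedWindowRateLimiter
{
    private readonly int _maxCalls;
    private readonly TimeSpan _window;
    private readonly Func<DateTimeOffset> _clock;
    private readonly object _lock = new object();
    private DateTimeOffset _windowStart;
    private int _callsInWindow;

    // 'clock' is injectable so the behaviour can be tested without sleeping.
    public FixedWindowRateLimiter(int maxCalls, TimeSpan window, Func<DateTimeOffset> clock = null)
    {
        if (maxCalls <= 0) throw new ArgumentOutOfRangeException(nameof(maxCalls));
        if (window <= TimeSpan.Zero) throw new ArgumentOutOfRangeException(nameof(window));

        _maxCalls = maxCalls;
        _window = window;
        _clock = clock ?? (() => DateTimeOffset.UtcNow);
        _windowStart = _clock();
    }

    // Returns true if the call may proceed; false means "reject this call now".
    public bool TryAcquire()
    {
        lock (_lock)
        {
            DateTimeOffset now = _clock();
            if (now - _windowStart >= _window)
            {
                // A new window begins: reset the counter.
                _windowStart = now;
                _callsInWindow = 0;
            }

            if (_callsInWindow >= _maxCalls)
            {
                return false;
            }

            _callsInWindow++;
            return true;
        }
    }
}
```

Because nothing is ever queued, this variant avoids the memory bulge described above, at the cost of pushing the retry decision back to the caller.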

See also important alternative below re RetryAfter

Honouring RetryAfter as an alternative to Rate-Limit Policy

Many Azure APIs impose a rate-limit on usage dependent on the pricing tier, eg CosmosDB, and many of the cognitive services.

If you are seeking a Rate-Limit policy in connection with these, be sure to explore the alternative of Retry policies honouring the Retry-After header of a 429 response. CosmosDB and many cognitive services APIs return 429 responses indicating when a request may be retried. Polly already offers WaitAndRetry overloads which can calculate the duration to wait based on the returned result (ie the Retry-After header in this case).
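As a sketch of how the wait duration might be derived from the returned result, the helper below reads the Retry-After header from an `HttpResponseMessage` using the standard `System.Net.Http` types. `RetryAfterHelper`, `GetDelay` and the `fallback` parameter are hypothetical names; wiring the helper into the sleep-duration delegate of a WaitAndRetry overload is left out, since the exact overload depends on the Polly version in use.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

public static class RetryAfterHelper
{
    // Returns how long to wait before retrying, based on the Retry-After header.
    // 'fallback' is used when the header is absent; 'now' is passed in so the
    // absolute-date form of the header can be evaluated deterministically.
    public static TimeSpan GetDelay(HttpResponseMessage response, TimeSpan fallback, DateTimeOffset now)
    {
        RetryConditionHeaderValue retryAfter = response?.Headers.RetryAfter;
        if (retryAfter == null)
        {
            return fallback;
        }

        // Form 1: a relative delay in seconds, e.g. "Retry-After: 5".
        if (retryAfter.Delta.HasValue)
        {
            return retryAfter.Delta.Value;
        }

        // Form 2: an absolute HTTP date; never return a negative wait.
        if (retryAfter.Date.HasValue)
        {
            TimeSpan untilDate = retryAfter.Date.Value - now;
            return untilDate > TimeSpan.Zero ? untilDate : TimeSpan.Zero;
        }

        return fallback;
    }
}
```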

Consider introducing configuration-provider interfaces and/or POCOs

Consider introducing configuration-provider interfaces and/or POCOs such as discussed here and exemplified here.

The original intention was that these POCOs might describe the numeric configuration parameters of a policy, such as 'number of consecutive faults before breaking'. This in turn would allow external configuration sources to be mapped to these POCOs, allowing the creation of configuration-helper plugins for various configuration sources.

Assessment: This would be simple for policies whose key configuration details are numeric, such as circuit-breakers and bulkheads.

A challenge would be policies offering configurations which are far from purely numeric. WaitAndRetry(), for instance, has two key use cases which define waits between retries as dynamic functions: exponential backoff and jitter. These could be parameterized by the numeric values defining the exponential backoff and jitter. However, WaitAndRetry also offers a purely dynamic Func<int, TimeSpan> option and an IEnumerable<TimeSpan> option, neither of which is definable through numeric parameters alone; a retry configuration POCO mapped to a config source could only ever provide partial coverage of the available retry options.
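To make the partial coverage concrete, here is a sketch of a numeric POCO for the exponential-backoff case only. `ExponentialBackoffOptions` and its property names are hypothetical, jitter is omitted for brevity, and the retry attempt is assumed to be 1-based.

```csharp
using System;

// Hypothetical POCO: plain numeric settings that a config source could populate.
public sealed class ExponentialBackoffOptions
{
    public int RetryCount { get; set; } = 3;
    public double BaseDelaySeconds { get; set; } = 1.0;
    public double Factor { get; set; } = 2.0;
    public double MaxDelaySeconds { get; set; } = 30.0;

    // Converts the numeric settings into a Func<int, TimeSpan>.
    // Values are copied first so later changes to the POCO do not alter
    // a delegate that has already been handed out.
    public Func<int, TimeSpan> ToSleepDurationProvider()
    {
        double baseDelay = BaseDelaySeconds;
        double factor = Factor;
        double maxDelay = MaxDelaySeconds;

        return retryAttempt =>
            TimeSpan.FromSeconds(
                Math.Min(maxDelay, baseDelay * Math.Pow(factor, retryAttempt - 1)));
    }
}
```

The conversion only runs one way: an arbitrary Func<int, TimeSpan> supplied in code cannot be turned back into these four numbers, which is the coverage gap described above.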

Timeout policy also falls in this category as it has an entirely dynamic Func<TimeSpan> configuration option.

Current view: Low priority for the Polly team's time (as against deeper resilience challenges where that time can be invested). It is relatively easy for users to develop links to their own configuration sources for the subset of Polly options they choose to use. Moved to inactive.

Getting involved!

Comment on this roadmap here

Expressions of interest in developing any of the above functionality welcome! (for major features, contact the Polly team first to avoid duplication of effort). Some issues are also marked as 'up-for-grabs' at any time, in the Issues list. See also the notes for contributors!

Status

The roadmap is published for transparency and to solicit community input, but is by its nature indicative and subject to change: proposed features may be more difficult to implement than envisaged, or may be down-prioritised as we continue to seek the balance for the library between power and simplicity.

Features delivered from previous Roadmaps

February 2018: dynamic reconfiguration during running

November 2017: numerous small enhancements including durations to wait in wait and retry, based on error response.

October 2017: CachePolicy (thanks to @seanfarrow for much contribution to the thinking).

June 2017: PolicyRegistry (thanks to @ankitbko); interfaces.

May 2017: Share rich information between execution and control delegates (blog)

February 2017: NoOp Policy (thanks to @lakario)

December 2016: Polly v5.0.3 RTM:

  • Full .NET Standard 1.0 compatibility
  • Bulkhead policy
  • Timeout policy (including walking away from executions with no in-built timeout)
  • Fallback policy
  • PolicyWrap
  • PolicyKeys and ContextData
  • Rationalised .NET40 async support

October 2016: Polly v5.0-alpha, with four new policies: Bulkhead, Timeout, Fallback, and PolicyWrap

July 2016: .NET Core 1.0 RTM support; .NET Standard 1.0 support

June 2016: Policies to handle return values

June 2016: Polly.Net40Async

April 2016: Advanced Circuit Breaker

Mar 2016: Full ContextualPolicy support

Feb 2016: Manual control and public state for circuit breaker, for health/performance monitoring

Jan 2016: Full async support.
