diff --git a/main/404.html b/main/404.html deleted file mode 100644 index a134fc681..000000000 --- a/main/404.html +++ /dev/null @@ -1,3326 +0,0 @@ - - - -
-This document details how to set up a local reference architecture, and design and deploy an API. This will show the following API management features in a kube native environment using Kuadrant and other open source tools:
-The sections in this document are grouped by the persona that is typically associated with the steps in that section. The 3 personas are:
-docker: https://www.docker.com/products/docker-desktop/
-kind: https://kind.sigs.k8s.io/
-kubectl: https://kubernetes.io/docs/reference/kubectl/
-kustomize: https://kustomize.io/
-helm: https://helm.sh/docs/intro/install/
-operator-sdk: https://sdk.operatorframework.io/docs/installation/
-Export the following env vars:
-export KUADRANT_AWS_ACCESS_KEY_ID=<key_id>
-export KUADRANT_AWS_SECRET_ACCESS_KEY=<secret>
-export KUADRANT_AWS_REGION=<region>
-export KUADRANT_AWS_DNS_PUBLIC_ZONE_ID=<zone>
-export KUADRANT_ZONE_ROOT_DOMAIN=<domain>
-
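Before running the quickstart script, it can help to confirm that all five variables above are set. The following is a minimal sketch in POSIX shell; the function name `check_kuadrant_env` is hypothetical and not part of the quickstart.

```shell
# Hypothetical helper: report any required KUADRANT_* variable that is unset or empty.
# Returns 0 when all five are set, 1 otherwise.
check_kuadrant_env() {
  missing=0
  for name in KUADRANT_AWS_ACCESS_KEY_ID KUADRANT_AWS_SECRET_ACCESS_KEY \
              KUADRANT_AWS_REGION KUADRANT_AWS_DNS_PUBLIC_ZONE_ID \
              KUADRANT_ZONE_ROOT_DOMAIN; do
    # Look up the value of the variable whose name is held in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Run `check_kuadrant_env && echo "environment looks complete"` in the same shell where the variables were exported.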
Clone the api-quickstart repo and run the quickstart script:
- -This will take several minutes as 3 local kind clusters are started and configured in a hub and spoke architecture. -The following components will be installed on the clusters:
-View the ManagedZone, Gateway and TLSPolicy. The ManagedZone and TLSPolicy should have a Ready status of true. The Gateway should have a Programmed status of True.
-kubectl --context kind-api-control-plane get managedzone,tlspolicy,gateway -n multi-cluster-gateways
-
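Instead of re-running the `get` command until the statuses settle, `kubectl wait` can block on a condition. This is a sketch only: it assumes the Gateway is named `prod-web` (the name targeted by the DNSPolicy later in this walkthrough), and the function name and default timeout are illustrative.

```shell
# Block until the Gateway reports Programmed=True, or fail after the timeout.
# Usage: wait_for_gateway [timeout]   (defaults to 300s, an arbitrary choice)
# Assumption: the Gateway created by the quickstart is called prod-web.
wait_for_gateway() {
  kubectl --context kind-api-control-plane -n multi-cluster-gateways \
    wait --for=condition=Programmed gateway/prod-web --timeout="${1:-300s}"
}
```

Calling `wait_for_gateway 120s` returns non-zero if the condition is not met in time, which makes it convenient in scripts.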
Running the quick start script above will bring up Gatekeeper and the following constraints:
-To view the above constraints in kubernetes, run this command: -
-Info
-Since a gateway has been created automatically, along with a TLSPolicy
, the violation for a missing DNSPolicy
will be active until one is created.
To get a top level view of the constraints in violation, the Stitch: Platform Engineer Dashboard
can be used. It can be accessed at https://grafana.172.31.0.2.nip.io
Grafana has a default username and password of admin
.
-You can find the Stitch: Platform Engineer Dashboard
in the Default
folder.
Create a DNSPolicy that targets the Gateway with the following command:
-kubectl --context kind-api-control-plane apply -f - <<EOF
-apiVersion: kuadrant.io/v1alpha1
-kind: DNSPolicy
-metadata:
- name: prod-web
- namespace: multi-cluster-gateways
-spec:
- targetRef:
- name: prod-web
- group: gateway.networking.k8s.io
- kind: Gateway
- loadBalancing:
- geo:
- defaultGeo: EU
-EOF
-
Now that every policy required by the Gatekeeper constraints has been created, you should no longer see any constraints in violation. This can be confirmed back in the Stitch: Platform Engineer Dashboard
in Grafana at https://grafana.172.31.0.2.nip.io
Fork and/or clone the Petstore App at https://github.com/Kuadrant/api-petstore
-git clone git@github.com:kuadrant/api-petstore && cd api-petstore
-# Or if you forked the repository:
-# git clone git@github.com:<your_github_username>/api-petstore && cd api-petstore
-
Then deploy it to the first workload cluster:
- -This will deploy:
-petstore Namespace
-Secret, containing a static API key that we'll use later for auth
-Service and Deployment for our petstore app
-HTTPRoute for our petstore app
-When the DNS Policy has been created, and the previously created HTTPRoute
has been attached, a DNS record custom resource will also be created in the cluster resulting in records being created in your AWS Route53. Navigate to Route53 and you should see some new records in the zone.
Configure the app REGION
to be eu
:
The raw Open API spec can be found in the root of the repo:
-cat openapi.yaml
-# ---
-# openapi: 3.0.2
-# info:
-# title: Stitch API Petstore
-# version: 1.0.18
-
We've included a number of sample x-kuadrant
extensions in the OAS spec already:
x-kuadrant
extension to detail the Gateway API Gateway associated with our app: x-kuadrant:
- route:
- name: petstore
- namespace: petstore
- labels:
- deployment: petstore
- owner: cferreir
- hostnames:
-
- - petstore.$KUADRANT_ZONE_ROOT_DOMAIN
- parentRefs:
- - name: prod-web
- namespace: kuadrant-multi-cluster-gateways
- kind: Gateway
-
/user/login
, we have a Gateway API backendRef
set and a rate_limit
set. The rate limit policy for this endpoint restricts usage of this endpoint to 2 requests in a 10 second window:
- /store/inventory
, we also have a Gateway API backendRef
set and a rate_limit
set. The rate limit policy for the endpoint restricts usage of this endpoint to 10 requests in a 10 second window:
- securityScheme
setup for apiKey auth, powered by Authorino. We'll show this in more detail a little later:
- These extensions allow us to automatically generate Kuadrant Kubernetes resources, including AuthPolicies, RateLimitPolicies and Gateway API resources such as HTTPRoutes.
-kuadrantctl
is a cli that supports the generation of various Kubernetes resources via OAS specs.
-Let's run some commands to generate some of these resources.
-If you forked the api-petstore repo, you can check them in also.
-Let's apply these to our running workload to implement rate limiting and auth.
kuadrantctl
Download kuadrantctl
from the v0.2.0
release artifacts:
https://github.com/Kuadrant/kuadrantctl/releases/tag/v0.2.0
-Drop the kuadrantctl
binary somewhere into your $PATH (e.g. /usr/local/bin/
).
For this next part of the tutorial, we recommend installing yq
to pretty-print YAML resources.
kuadrantctl
We'll generate an AuthPolicy
to implement API key auth, per the securityScheme
in our OAS spec:
# Generate this resource and save:
-kuadrantctl generate kuadrant authpolicy --oas openapi.yaml | yq -P | tee resources/authpolicy.yaml
-
-# Apply this resource to our cluster:
-kubectl --context kind-api-workload-1 apply -f ./resources/authpolicy.yaml
-
Next we'll generate a RateLimitPolicy
, to protect our APIs with the limits we have setup in our OAS spec:
# Generate this resource and save:
-kuadrantctl generate kuadrant ratelimitpolicy --oas openapi.yaml | yq -P | tee resources/ratelimitpolicy.yaml
-
-# Apply this resource to our cluster:
-kubectl --context kind-api-workload-1 apply -f ./resources/ratelimitpolicy.yaml
-
Lastly, we'll generate a Gateway API HTTPRoute
to service our APIs:
# Generate this resource and save:
-kuadrantctl generate gatewayapi httproute --oas openapi.yaml | yq -P | tee resources/httproute.yaml
-
-# Apply this resource to our cluster, setting the hostname via the KUADRANT_ZONE_ROOT_DOMAIN env var:
-kustomize build ./resources/ | envsubst | kubectl --context kind-api-workload-1 apply -f-
-
Navigate to your app's Swagger UI:
- -Let's check that our RateLimitPolicy
for the /store/inventory
has been applied and works correctly. Recall, our OAS spec had the following limits applied:
/store/inventory
API, click Try it out
, and Execute
.
-You'll see a response similar to:
- -This API has a rate limit applied, so if you send more than 10 requests in a 10 second window, you will see a 429
HTTP Status code from responses, and a "Too Many Requests" message in the response body. Click Execute
quickly in succession to see your RateLimitPolicy
in action.
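The same check can be made from a terminal. The sketch below sends a burst of requests and tallies the status codes. The function name is hypothetical, and `-k` (skip certificate verification) is an assumption for a local setup; drop it if your gateway serves a trusted certificate.

```shell
# Send <count> requests to <url> and print a tally of the HTTP status codes returned.
# Usage: tally_status <url> <count>
tally_status() {
  url=$1
  count=$2
  i=0
  while [ "$i" -lt "$count" ]; do
    # -s: silent, -k: skip certificate verification (assumption for a local setup),
    # -o /dev/null: discard the body, -w: print only the status code.
    curl -s -k -o /dev/null -w '%{http_code}\n' "$url"
    i=$((i + 1))
  done | sort | uniq -c
}
```

For example, `tally_status "https://petstore.$KUADRANT_ZONE_ROOT_DOMAIN/api/v3/store/inventory" 12`, where the `/api/v3` prefix is an assumption; copy the exact request URL from the Swagger UI. With the limit described above, requests beyond the tenth inside a 10 second window should be counted under `429`.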
Let's check that our AuthPolicy
for the /store/admin
endpoint has been applied and works correctly. Recall, our OAS spec had the following securitySchemes applied:
Navigate to the /store/admin
API, click Try it out
, and Execute
.
-You'll get a 401 response.
You can set a value for the api_key
header by clicking Authorize
at the top of the page. Set a value of secret
.
-This api key value is stored in the petstore-api-key
Secret in the petstore
namespace.
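As an aside, the key can also be read from the cluster and tried from a terminal. This is a sketch under two assumptions: that the Secret stores the value under the data key `api_key`, and that `-k` is needed for the local certificate. Both function names are hypothetical.

```shell
# Read the API key from the Secret named in this walkthrough.
# Assumption: the value is stored under the data key "api_key".
get_api_key() {
  kubectl --context kind-api-workload-1 -n petstore get secret petstore-api-key \
    -o jsonpath='{.data.api_key}' | base64 -d
}

# Print only the HTTP status code of a request that sends the key in the api_key header.
# Usage: status_with_key <url> <key>
status_with_key() {
  curl -s -k -o /dev/null -w '%{http_code}\n' -H "api_key: $2" "$1"
}
```

For example, `status_with_key "<request URL for /store/admin from the Swagger UI>" "$(get_api_key)"` should print the status code the gateway returns for an authenticated request.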
-Try the /store/admin
endpoint again and you should get a 200 response with the following:
Run the Swagger UI editor to explore the OAS spec and make some tweaks:
-docker run -p 8080:8080 -v $(pwd):/tmp -e SWAGGER_FILE=/tmp/openapi.yaml swaggerapi/swagger-editor
-
You should be able to access the Swagger Editor at http://localhost:8080.
-Our /store/inventory
API needs some additional rate limiting. This is one of our slowest, most expensive services, so we'd like to rate limit it further.
In your openapi.yaml
, navigate to the /store/inventory
endpoint in the paths
block. Modify the rate_limit block to further restrict the number of requests this endpoint can serve to 2 requests per 10 seconds:
Save your updated spec - File
> Save as YAML
> and update your existing openapi.yaml
. You may need to copy the file from your Downloads folder to the location of the petstore repository.
Next we'll re-generate our RateLimitPolicy
with kuadrantctl
:
# Generate this resource and save:
-kuadrantctl generate kuadrant ratelimitpolicy --oas openapi.yaml | yq -P | tee resources/ratelimitpolicy.yaml
-
-# Apply this resource to our cluster:
-kubectl --context kind-api-workload-1 apply -f ./resources/ratelimitpolicy.yaml
-
At this stage you can optionally check in all the changes to the repo if you forked it.
-# Optionally add, commit & push the changes to your fork
-git add resources
-git commit -am "Generated AuthPolicy,RateLimitPolicy & HTTPRoute"
-git push # You may need to set an upstream as well
-
In your app's Swagger UI:
- -Navigate to the /store/inventory
API once more, click Try it out
, and Execute
.
You'll see the effects of our new RateLimitPolicy
applied. If you now send more than 2 requests in a 10 second window, you'll be rate-limited.
Note: It may take a few minutes for the updated RateLimitPolicy to be configured with the modified rate limit.
-Deploy the petstore to the 2nd cluster:
-kustomize build ./resources/ | envsubst | kubectl --context kind-api-workload-2 apply -f-
-kubectl --context kind-api-workload-2 apply -f ./resources/authpolicy.yaml
-kubectl --context kind-api-workload-2 apply -f ./resources/ratelimitpolicy.yaml
-
Configure the app REGION
to be us
:
Deploy the Gateway to the 2nd cluster:
-kubectl --context kind-api-control-plane patch placement http-gateway --namespace multi-cluster-gateways --type='json' -p='[{"op": "replace", "path": "/spec/numberOfClusters", "value":2}]'
-
Label the 1st cluster as being in the 'EU' region, -and the 2nd cluster as being in the 'US' region. -These labels are used by the DNSPolicy for configuring geo DNS.
-kubectl --context kind-api-control-plane label managedcluster kind-api-workload-1 kuadrant.io/lb-attribute-geo-code=EU --overwrite
-kubectl --context kind-api-control-plane label managedcluster kind-api-workload-2 kuadrant.io/lb-attribute-geo-code=US --overwrite
-
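To confirm that both labels were applied, the label can be shown as an extra column when listing the managed clusters. This is a small sketch; the function name is hypothetical.

```shell
# List managed clusters with the geo label value shown as an extra column.
show_geo_labels() {
  kubectl --context kind-api-control-plane get managedcluster \
    -L kuadrant.io/lb-attribute-geo-code
}
```

After the two `label` commands above, `show_geo_labels` should list `EU` for the 1st cluster and `US` for the 2nd.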
Info
-This section is optional. If you'd rather skip this part, you can skip forward to the "(App developer) API traffic monitoring" section.
-python3
and pip3
: these are required for this part of the walkthrough
-To demonstrate traffic management by geographical region, we'll use a tool called 'geosight'. This tool resolves hostnames from different regions, fetches a website from the resulting DNS record address and takes a screenshot. The petstore app has been configured to serve a flag image based on which region it is running in. In the 1st cluster, the EU flag is used. In the 2nd cluster, the US flag is used.
-To install 'geosight', run the following commands:
-git clone git@github.com:jasonmadigan/geosight.git && cd geosight
-pip3 install -r requirements.txt
-playwright install
-
Then run it using:
- -Access the webapp at http://127.0.0.1:5001/.
-In the input box, type the address from below and click the Fetch
button:
After a moment you should see dns results for different regions, and a corresponding screenshot.
-If you want to experiment with other regions, check out the Configuration section for geosight and the Kuadrant docs for geo loadbalancing.
-To view the App developer dashboard, the same Grafana will be used from the platform engineer steps above:
-https://grafana.172.31.0.2.nip.io
The most relevant dashboard for an app developer is Stitch: App Developer Dashboard
-You should see panels about APIs including:
All corresponding to our HTTPRoute coming from our OAS spec
-Now that the app developer has deployed their app, new metrics and data are available in the platform engineer dashboard seen in the previous step https://grafana.172.31.0.2.nip.io
:
You now have a local environment with a reference architecture to design and deploy an API in a kube native way, using Kuadrant and other open source tools.
-To destroy the previously created kind
clusters, run:
Info
-DNS records in AWS will remain after cleanup - you can remove these from your zone manually.
-Kuadrant provides connectivity, security and service protection capabilities in both a single and multi-cluster environment. It exposes these capabilities in the form of Kubernetes CRDs that implement the Gateway API concept of policy attachment. These policy APIs can target specific Gateway API resources such as Gateways
and HTTPRoutes
to extend their capabilities and configuration. They enable platform engineers to secure, protect and connect their infrastructure and allow application developers to self service and refine policies to their specific needs in order to protect exposed endpoints.
The control plane is a set of controllers and operators responsible for the installation and configuration of other components, such as the data plane enforcement components, and for configuring the Gateway so that the data plane components can interact with incoming requests. The control plane also owns the policy CRD APIs and reconciles them into more complex and specific configuration objects. The policy enforcement components consume these objects to learn which rules to apply to incoming requests, or which configuration to apply to external integrations such as DNS and ACME providers.
- -RateLimitPolicy
, AuthPolicy
, DNSPolicy
and TLSPolicy
and reconciles these into enforceable configuration for the data plane.Kuadrant
and reconciles this to configure and trigger installation of the required data plane components and other control plane components.The data plane components sit in the request flow and are responsible for enforcing configuration defined by policy and providing service protection capabilities based on configuration managed and created by the control plane.
-AuthPolicy
API.In a single cluster, you have the Kuadrant control plane and data plane sitting together. It is configured to integrate with Gateways on the same cluster and configure a DNS zone via a DNS provider secret (configured alongside a DNSPolicy). Storage of rate limit counters is possible but not required as they are not being shared.
- -In the default multi-cluster setup, each individual cluster has Kuadrant installed. Each of these clusters is unaware of the others; they are effectively operating as single clusters. The multi-cluster aspect is created by sharing access to the DNS zone, using a shared host across the clusters and leveraging shared counter storage. -The zone is operated on independently by the DNS operator on each cluster to form a single cohesive record set. More details on this can be found in the following RFC document: TODO add link. -The rate limit counters can also be shared and used by different clusters in order to provide global rate limiting. This is achieved by connecting each instance of Limitador to a shared data store that uses the Redis protocol. There is another option available for achieving multi-cluster connectivity (see integrations below) that requires the use of a "hub" cluster and integration with OCM (Open Cluster Management).
- -Shown above is a multi-cluster multi ingress gateway topology. This might be used to support a geographically distributed system for example. However, it is also possible to leverage overlay networking tools such as Skupper that integrate at the Kubernetes service level to have a single gateway cluster that then integrates with multiple backends (on different clusters or in custom infrastructure).
-The Kuadrant architecture is intended to work with some popular monitoring tools for tracing, metrics and log aggregation. -Those tools are:
-Depending on the number of clusters in your configuration, you may decide to have a monitoring system on the same cluster as workloads, -or in a separate cluster completely. -Below are 2 example architectures based on the single cluster and multi cluster layouts. -In the single cluster architecture, the collector components (Prometheus, Vector and Tempo) are in the same cluster as the log aggregation (Loki) and visualisation component (Grafana).
- -In the multi cluster architecture, the collectors that scrape metrics or logs (Prometheus & Vector) are deployed alongside the workloads in each cluster. -However, as traces are sent to a collector (Tempo) from each component, it can be centralised in a separate cluster. -Thanos is used in this architecture so that each Prometheus can federate metrics back to a central location. -The log collector (Vector) can forward logs to a central Loki instance. -Finally, the visualisation component (Grafana) is centralised as well, with data sources configured for each of the 3 components on the same cluster.
- -RateLimitPolicy
and AuthPolicy
While the default setup is to leverage a distributed configuration for DNS and rate limiting, there is also a component that offers experimental integration with Open Cluster Management.
-In this setup, the OCM integration controller is installed into the HUB alongside the DNS Operator and the cert-manager. This integration allows you to define gateways in the Hub and distribute them to "spoke" clusters. The addresses of these gateways are gathered from the spokes and aggregated back to the hub. The Kuadrant operator and DNS operator then act on this information as though it were a single cluster gateway with multiple addresses. The DNS zone in the configured DNS provider is managed centrally by one DNS operator instance.
- - - - - - - - - - - - - - -It is important to note that Kuadrant is not in itself a gateway provider. Kuadrant provides a set of valuable policy APIs that enhance Gateway API via its defined policy attachment extension point. The policy APIs are reconciled by a set of policy controllers and enforced via integration at different points to configure, enhance and secure the application connectivity provided via Gateway API and the underlying gateway provider. -These policy extensions are focused around areas such as DNS management supporting global load balancing and health checks, alongside service protection specific APIs such as rate limiting and auth. Kuadrant also integrates with Open Cluster Management as a multi-cluster control plane to enable defining and distributing Gateways across multiple clusters, providing load balancing and tls management for these distributed gateways. These integrations and features can be managed centrally in a declarative way from the Open Cluster Management Hub using Kubernetes resources.
-A control plane component is something responsible for accepting instruction via a CRD based API and ensuring that configuration is manifested into state that can be acted on.
-RateLimitPolicy
and AuthPolicy
and is currently the policy controller for these APIsDNSPolicy
and TLSPolicy
PlacementDecision
and ManifestWork
.A data plane component sits in the request flow and is responsible for enforcing policy and providing service protection capabilities based on configuration managed and created by the control plane.
-In order to provide its full suite of functionality, Kuadrant has several dependencies. Some of these are optional depending on the functionality needed.
-RateLimitPolicy
and AuthPolicy
Kuadrant has a multi-cluster gateway controller that is intended to run in an Open Cluster Management provided "Hub" cluster. This cluster is effectively a central management cluster where policy and gateways along with all that Open Cluster Management offers can be defined and distributed to the managed "spoke" clusters.
-In a single cluster context, the overall architecture remains the same as above, the key difference is that the Hub and Spoke cluster are now a single cluster rather than multiple clusters. This is how we are initially supporting single cluster.
-Kuadrant deploys a multi-cluster gateway controller into the Open Cluster Management hub (a control plane that manages a set of "spoke" clusters where workloads are executed). This controller offers its own APIs but also integrates with hub CRD based APIs (such as the placement API) along with the Gateway API CRD based APIs in order to provide multi-cluster Gateway capabilities to the hub and distribute actual gateway instances to the spokes. See the Open Cluster Management docs for further details on the hub spoke architecture.
-As part of installing Kuadrant, the Gateway API CRDs are also installed into the hub cluster and Kuadrant defines a standard Gateway API GatewayClass
resource that the multi-cluster gateway controller is the chosen controller for.
Once installed, an Open Cluster Management user can then (with the correct RBAC in place) define in the standard way a Gateway resource that inherits from the Kuadrant configured GatewayClass
in the hub. There is nothing unique about this Gateway definition, the difference is what it represents and how it is used. This Gateway is used to represent a "multi-cluster" distributed gateway. As such there are no pods running behind this Gateway instance in the hub cluster, instead it serves as a template that the Kuadrant multi-cluster gateway controller reconciles and distributes to targeted spoke clusters. It leverages the Open Cluster Management APIs to distribute these gateways (more info below) and aggregates the status information from each spoke cluster instance of this gateway back to this central definition, in doing this it can represent the status of the gateway across multiple clusters but also use that information to integrate with DNS providers etc.
In order for a multi-cluster gateway to be truly useful, it needs to be distributed or "placed" on a specific set of hub managed spoke clusters. Open Cluster Management is responsible for a set of placement and replication APIs. Kuadrant is aware of these APIs, and so when a given gateway is chosen to be placed on a set of managed clusters, Kuadrant multi-cluster gateway controller will ensure the right resources (ManifestWork
) are created in the correct namespaces in the hub. Open Cluster Management then is responsible for syncing these to the actual spoke cluster and reporting back the status of these resources to the Hub. A user would indicate which clusters they want a gateway placed on by using a Placement
and then labeling the gateway using the cluster.open-cluster-management.io/placement
label.
In order for the Gateway to be instantiated, we need to know what underlying gateway provider is being used on the spoke clusters. Admins can then set this provider in the hub via the GatewayClass params. In the hub, Kuadrant will then apply a transformation to the gateway to ensure when synced it references this spoke gateway provider (Istio for example).
-It is the Open Cluster Management workagent that is responsible for syncing down and applying the resources into the managed spoke cluster. It is also responsible for syncing status information back to the hub. It is the multi-cluster gateway controller that is responsible for aggregating this status.
-The status information reported back to the Hub is used by the multi-cluster gateway controller to know what LB hosts / IPAddresses to use for DNSRecords that it creates and manages.
- -More info on the Open Cluster Management hub and spoke architecture can be found here
-Currently the Kuadrant data plane only integrates with an Istio based gateway provider:
-IstioOperator
as an auth provider so that Authorino can be used as an external auth provider.EnvoyFilter
to register the rate limiting service as an upstream service.AuthPolicy
, it leverages Istio's AuthorizationPolicy
resource to configure when a request should trigger Authorino to be called for a given host, path and method etc. RateLimitPolicy
resources. This binary is executed in response to an HTTP request being accepted by the gateway via the underlying Envoy instance that provides the proxy layer for the Gateway. This plugin is configured with the correct upstream rate limit service name. When it sees a request, based on the provided configuration, it will trigger a call to the installed Limitador that is providing the rate limit capabilities, and either allow the request to continue or trigger a response to the client with a 429 (Too Many Requests) HTTP code.
-There are several different data flows when using Kuadrant.
-The initial creation of these APIs (gateways, policies etc) is done by the relevant persona in the control plane just as they would create any other k8s resource. We use the term cluster admin or gateway admin for the operations type persona configuring and placing gateways. -As shown above, in a multi-cluster configuration, API definitions are pulled from the Hub and "manifested" into the spokes. The status of those synced resources is reported back to the Hub. The same happens for a single cluster, the only difference being that the work agent and hub controllers are all installed on one cluster.
-In order to enforce the policy configuration, components in the control plane and data plane can reach out to configured 3rd parties such as cloud based DNS provider, TLS providers and Auth providers.
-Requests coming through the gateway instance can be sent to Limitador, based on configuration of the WASM plugin installed into the Envoy based gateway provider, or to Authorino, based on configuration provided by the Istio AuthorizationPolicy
.
-Each of these components have the capability to see the request and need to in order to make the required decision. Each of these components can also prevent the request from reaching its intended backend destination based on user configuration.
As all of the APIs are CRDs, auth around creating these resources is handled in the standard way, i.e. by the Kubernetes cluster and RBAC. There is no relationship by default between the auth features provided by Authorino to application developers and the auth requirements of the cluster API server.
-For Auth between Spoke and Hub see Open Cluster Management docs
-Kuadrant doesn't provide any specific observability components, but rather provides a reference setup using well known and established components along with some useful dashboards to help observe key things around the Gateways. This setup focuses on the multi-cluster context, where Open Cluster Management is installed and gateways are being defined and distributed from that hub.
- -This section is here to provide some insight into architectural changes that may be seen in the near future:
-What is in this doc represents the architecture at the point of our MVP release. Below are some areas that we have identified that are likely to change in the coming releases. As these happen, this doc will also evolve.
-Kuadrant is developing a set of loosely coupled functionalities built directly on top of Kubernetes. -Kuadrant aims to allow customers to just install, use and understand those functionalities they need.
-Currently, the installation tool of kuadrant, the kuadrantctl CLI, -installs all or nothing. Installing more than the customer needs adds unneeded complexity and operational effort. -For example, if a customer is looking for rate limiting and not interested in authentication functionality, -then the customer should be able to just install and run that part of Kuadrant.
-Reduce system complexity and operational effort to the minimum required. -Components in this context make reference to deployments and running instances.
-A user of a partial Kuadrant install should not be confronted with data in custom resources that -has no meaning or is not accessible in their partial Kuadrant install. The design of the kuadrant -API should take this goal into account.
-The kuadrant installation mechanism should offer modular installation to enable/disable loosely coupled pieces of kuadrant. -Modular installation options should be feature oriented rather than deployment component oriented. -Then, it is up to the installation tool to decide what components need to be deployed and how to -configure them.
-Each feature, or part of it, is eligible to be included or excluded when installing kuadrant.
-Some profiles can be defined to group sets of commonly required features. Naming the profiles -allows the customer to easily express the desired installation configuration. Furthermore, profiles -can be used not only to group a set of features but also to define deployment options.
-Name | Description |
---|---|
Minimal | Minimal installation required to run an API without any protection, analytics or API management. Default deployment option |
AuthZ | Authentication and authorization mechanisms activated |
RateLimit | Basic rate limit (only pre-auth rate limit) features |
Full | Full featured kuadrant installation |
A kuadrant operator, together with a design of a kuadrant CRD is desired. -Not only for kuadrant installation, but also for lifecycle management. -Additionally, the kuadrantctl CLI tool can also -be useful to either deploy kuadrant components and manifests or just deploy the kuadrant operator.
-The kuadrant control plane should be aware of the installed profile via env vars or command line params -in the control plane running components. With that information, the control plane can decide to -enable or disable CRD watching, label and annotation monitoring and ultimately reject any configuration -object that relies on disabled functionality. The least a customer can expect from kuadrant is to be -consistent and reject any functionality request that it cannot provide.
- - - - - - - - - - - - - -rlp-v2
Proposal of new API for the Kuadrant's RateLimitPolicy
(RLP) CRD, for improved UX.
The RateLimitPolicy
API (v1beta1), particularly its RateLimit
type used in ratelimitpolicy.spec.rateLimits
, designed in part to fit the underlying implementation based on the Envoy Rate limit filter, has proven to be complex, as well as somewhat limiting for the extension of the API to other platforms and/or for supporting use cases not contemplated in the original design.
Users of the RateLimitPolicy
will immediately recognize elements of Envoy's Rate limit API in the definitions of the RateLimit
type, with almost 1:1 correspondence between the Configuration
type and its counterpart in the Envoy configuration. Although compatibility between those continues to be desired, leaking such implementation details to the level of the API can be avoided to provide a better abstraction for activators ("matchers") and payload ("descriptors"), stated by users in a seamless way.
Furthermore, the Limit
type – used as well in the RLP's RateLimit
type – implies presently a logical relationship between its inner concepts – i.e. conditions and variables on one side, and limits themselves on the other – that otherwise could be shaped in a different manner, to provide clearer understanding of the meaning of these concepts by the user and avoid repetition. I.e., one limit definition contains multiple rate limits, and not the other way around.
skip_if_absent for the RequestHeaders action (kuadrant/wasm-shim#29)
spec.rateLimits[] replaced with spec.limits{<limit-name>: <limit-definition>}
spec.rateLimits.limits replaced with spec.limits.<limit-name>.rates
spec.rateLimits.limits.maxValue replaced with spec.limits.<limit-name>.rates.limit
spec.rateLimits.limits.seconds replaced with spec.limits.<limit-name>.rates.duration + spec.limits.<limit-name>.rates.unit
spec.rateLimits.limits.conditions replaced with spec.limits.<limit-name>.when, structured field based on well-known selectors, mainly for expressing conditions not related to the HTTP route (although not exclusively)
spec.rateLimits.limits.variables replaced with spec.limits.<limit-name>.counters, based on well-known selectors
spec.rateLimits.rules replaced with spec.limits.<limit-name>.routeSelectors, for selecting (or "sub-targeting") HTTPRouteRules that trigger the limit
spec.limits.<limit-name>.routeSelectors.hostnames[]
spec.rateLimits.configurations removed – descriptor actions configuration (previously spec.rateLimits.configurations.actions) generated from spec.limits.<limit-name>.when.selector ∪ spec.limits.<limit-name>.counters and unique identifier of the limit (associated with spec.limits.<limit-name>.routeSelectors)
spec.limits.<limit-name>.when conditions + a "hard" condition that binds the limit to its trigger HTTPRouteRules
For detailed differences between current and new RLP API, see Comparison to current RateLimitPolicy.
-Given the following network resources:
-apiVersion: gateway.networking.k8s.io/v1alpha2
-kind: Gateway
-metadata:
- name: istio-ingressgateway
- namespace: istio-system
-spec:
- gatewayClassName: istio
- listeners:
-
- - hostname:
- - "*.acme.com"
----
-apiVersion: gateway.networking.k8s.io/v1alpha2
-kind: HTTPRoute
-metadata:
- name: toystore
- namespace: toystore
-spec:
- parentRefs:
- - name: istio-ingressgateway
- namespace: istio-system
- hostnames:
- - "*.toystore.acme.com"
- rules:
- - matches:
- - path:
- type: PathPrefix
- value: "/toys"
- method: GET
- - path:
- type: PathPrefix
- value: "/toys"
- method: POST
- backendRefs:
- - name: toystore
- port: 80
- - matches:
- - path:
- type: PathPrefix
- value: "/assets/"
- backendRefs:
- - name: toystore
- port: 80
- filters:
- - type: ResponseHeaderModifier
- responseHeaderModifier:
- set:
- - name: Cache-Control
- value: "max-age=31536000, immutable"
-
The following are examples of RLPs targeting the route and the gateway. Each example is independent from the other.
-In this example, all traffic to *.toystore.acme.com
will be limited to 5rps, regardless of any other attribute of the HTTP request (method, path, headers, etc), without any extra "soft" conditions (conditions not related to the HTTP route), across all consumers of the API (unqualified rate limiting).
apiVersion: kuadrant.io/v2beta1
-kind: RateLimitPolicy
-metadata:
- name: toystore-infra-rl
- namespace: toystore
-spec:
- targetRef:
- group: gateway.networking.k8s.io
- kind: HTTPRoute
- name: toystore
- limits:
- base: # user-defined name of the limit definition - future use for handling hierarchical policy attachment
-
- - rates: # at least one rate limit required
- - limit: 5
- unit: second
-
gateway_actions:
-
-- rules:
- - paths: ["/toys*"]
- methods: ["GET"]
- hosts: ["*.toystore.acme.com"]
- - paths: ["/toys*"]
- methods: ["POST"]
- hosts: ["*.toystore.acme.com"]
- - paths: ["/assets/*"]
- hosts: ["*.toystore.acme.com"]
- configurations:
- - generic_key:
- descriptor_key: "toystore/toystore-infra-rl/base"
- descriptor_value: "1"
-
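The `gateway_actions` rules above mirror the HTTPRouteMatches of the targeted route. As a rough illustration only, the Python sketch below derives those rules from a simplified dict representation of the HTTPRoute. The function names and the dict layout are hypothetical, and rendering an `Exact` path without a wildcard is an assumption that the examples in this document do not confirm.

```python
def to_wasm_rule(match, hostnames):
    """Translate one HTTPRouteMatch (as a dict) into a gateway_actions-style rule."""
    path = match.get("path", {})
    value = path.get("value", "/")
    # Assumption: a PathPrefix is rendered with a trailing wildcard, an Exact path as-is.
    is_prefix = path.get("type", "PathPrefix") == "PathPrefix"
    rule = {"paths": [value + "*" if is_prefix else value]}
    if "method" in match:
        rule["methods"] = [match["method"]]
    rule["hosts"] = list(hostnames)
    return rule


def gateway_rules(route):
    """Flatten every match of every HTTPRouteRule into one list of rules."""
    return [
        to_wasm_rule(match, route["hostnames"])
        for rule in route["rules"]
        for match in rule["matches"]
    ]


# Simplified copy of the toystore HTTPRoute used throughout the examples.
toystore_route = {
    "hostnames": ["*.toystore.acme.com"],
    "rules": [
        {"matches": [
            {"path": {"type": "PathPrefix", "value": "/toys"}, "method": "GET"},
            {"path": {"type": "PathPrefix", "value": "/toys"}, "method": "POST"},
        ]},
        {"matches": [{"path": {"type": "PathPrefix", "value": "/assets/"}}]},
    ],
}

for rule in gateway_rules(toystore_route):
    print(rule)
```

For the toystore route this produces the same three rules listed under `gateway_actions` in this example.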
In this example, a distinct limit will be associated ("bound") to each individual HTTPRouteRule of the targeted HTTPRoute, by using the routeSelectors
field for selecting (or "sub-targeting") the HTTPRouteRule.
The following limit definitions will be bound to each HTTPRouteRule:
-/toys* → 50rpm, enforced per username (counter qualifier) and only in case the user is not an admin ("soft" condition).
/assets/* → 5rpm / 100rp12h
Each set of trigger matches in the RLP will be matched to all HTTPRouteRules whose HTTPRouteMatches is a superset of the set of trigger matches in the RLP. For every HTTPRouteRule matched, the HTTPRouteRule will be bound to the corresponding limit definition that specifies that trigger. In case no HTTPRouteRule is found containing at least one HTTPRouteMatch that is identical to some set of matching rules of a particular limit definition, the limit definition is considered invalid and reported as such in the status of the RLP.
-apiVersion: kuadrant.io/v2beta1
-kind: RateLimitPolicy
-metadata:
- name: toystore-per-endpoint
- namespace: toystore
-spec:
- targetRef:
- group: gateway.networking.k8s.io
- kind: HTTPRoute
- name: toystore
- limits:
- toys:
- rates:
-
- - limit: 50
- duration: 1
- unit: minute
- counters:
- - auth.identity.username
- routeSelectors:
- - matches: # matches the 1st HTTPRouteRule (i.e. GET or POST to /toys*)
- - path:
- type: PathPrefix
- value: "/toys"
- when:
- - selector: auth.identity.group
- operator: neq
- value: admin
-
- assets:
- rates:
-
- - limit: 5
- duration: 1
- unit: minute
- - limit: 100
- duration: 12
- unit: hour
- routeSelectors:
- - matches: # matches the 2nd HTTPRouteRule (i.e. /assets/*)
- - path:
- type: PathPrefix
- value: "/assets/"
-
gateway_actions:
-
-- rules:
- - paths: ["/toys*"]
- methods: ["GET"]
- hosts: ["*.toystore.acme.com"]
- - paths: ["/toys*"]
- methods: ["POST"]
- hosts: ["*.toystore.acme.com"]
- configurations:
- - generic_key:
- descriptor_key: "toystore/toystore-per-endpoint/toys"
- descriptor_value: "1"
- - metadata:
- descriptor_key: "auth.identity.group"
- metadata_key:
- key: "envoy.filters.http.ext_authz"
- path:
- - segment:
- key: "identity"
- - segment:
- key: "group"
- - metadata:
- descriptor_key: "auth.identity.username"
- metadata_key:
- key: "envoy.filters.http.ext_authz"
- path:
- - segment:
- key: "identity"
- - segment:
- key: "username"
-- rules:
- - paths: ["/assets/*"]
- hosts: ["*.toystore.acme.com"]
- configurations:
- - generic_key:
- descriptor_key: "toystore/toystore-per-endpoint/assets"
- descriptor_value: "1"
-
limits:
-
-- conditions:
- - toystore/toystore-per-endpoint/toys == "1"
- - auth.identity.group != "admin"
- variables:
- - auth.identity.username
- max_value: 50
- seconds: 60
- namespace: kuadrant
-- conditions:
- - toystore/toystore-per-endpoint/assets == "1"
- max_value: 5
- seconds: 60
- namespace: kuadrant
-- conditions:
- - toystore/toystore-per-endpoint/assets == "1"
- max_value: 100
- seconds: 43200 # 12 hours
- namespace: kuadrant
-
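The Limitador limits above carry each rate as `max_value` and `seconds`, while the RLP states it as `limit`, `duration` and `unit`. The conversion can be sketched as follows. This is a minimal illustration, not the controller's code: the function name is hypothetical, and treating a missing `duration` as 1 is an assumption based on the examples that omit it.

```python
# Seconds per unit, for the units that appear in the examples of this document.
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def rate_to_limitador(rate):
    """Convert one entry of spec.limits.<limit-name>.rates into max_value/seconds."""
    duration = rate.get("duration", 1)  # assumption: an omitted duration means 1
    return {
        "max_value": rate["limit"],
        "seconds": duration * UNIT_SECONDS[rate["unit"]],
    }


assets_rates = [
    {"limit": 5, "duration": 1, "unit": "minute"},
    {"limit": 100, "duration": 12, "unit": "hour"},
]
for rate in assets_rates:
    print(rate_to_limitador(rate))
```

Each rate of a limit definition becomes its own Limitador limit, which is why the `assets` definition yields two entries sharing the same condition.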
Consider a 150rps rate limit set on requests to GET /toys/special
. This specific application endpoint is covered by the first HTTPRouteRule in the HTTPRoute (as a subset of GET
or POST
to any path that starts with /toys
). However, to avoid binding limits to HTTPRouteRules that are more permissive than the actual intended scope of the limit, the RateLimitPolicy controller requires trigger matches to find identical matching rules explicitly defined amongst the sets of HTTPRouteMatches of the HTTPRouteRules potentially targeted.
As a consequence, by simply defining a trigger match for GET /toys/special
in the RLP, the GET|POST /toys*
HTTPRouteRule will NOT be bound to the limit definition. In order to ensure the limit definition is properly bound to a routing rule that strictly covers the GET /toys/special
application endpoint, first the user has to modify the spec of the HTTPRoute by adding an explicit HTTPRouteRule for this case:
apiVersion: gateway.networking.k8s.io/v1alpha2
-kind: HTTPRoute
-metadata:
- name: toystore
- namespace: toystore
-spec:
- parentRefs:
-
- - name: istio-ingressgateway
- namespace: istio-system
- hostnames:
- - "*.toystore.acme.com"
- rules:
- - matches:
- - path:
- type: PathPrefix
- value: "/toys"
- method: GET
- - path:
- type: PathPrefix
- value: "/toys"
- method: POST
- backendRefs:
- - name: toystore
- port: 80
- - matches:
- - path:
- type: PathPrefix
- value: "/assets/"
- backendRefs:
- - name: toystore
- port: 80
- filters:
- - type: ResponseHeaderModifier
- responseHeaderModifier:
- set:
- - name: Cache-Control
- value: "max-age=31536000, immutable"
- - matches: # new (more specific) HTTPRouteRule added
- - path:
- type: Exact
- value: "/toys/special"
- method: GET
- backendRefs:
- - name: toystore
- port: 80
-
After that, the RLP can target the new HTTPRouteRule strictly:
-apiVersion: kuadrant.io/v2beta1
-kind: RateLimitPolicy
-metadata:
- name: toystore-special-toys
- namespace: toystore
-spec:
- targetRef:
- group: gateway.networking.k8s.io
- kind: HTTPRoute
- name: toystore
- limits:
- specialToys:
- rates:
-
- - limit: 150
- unit: second
- routeSelectors:
- - matches: # matches the new HTTPRouteRule (i.e. GET /toys/special)
- - path:
- type: Exact
- value: "/toys/special"
- method: GET
-
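The binding behaviour described in Examples 2 and 3 can be illustrated with a small Python sketch. It reflects one possible reading of the rule, namely that a selector match binds to an HTTPRouteRule when every field it sets is found, with an identical value, in at least one HTTPRouteMatch of that rule. The function names and the dict representation are hypothetical, and the actual controller may apply stricter or different semantics.

```python
def match_included(selector_match, rule_match):
    """True when every field set in the selector match equals the same field of the rule match."""
    return all(rule_match.get(field) == value for field, value in selector_match.items())


def selected_rules(selector_matches, route_rules):
    """Indexes of the HTTPRouteRules bound by one routeSelectors entry."""
    return [
        index
        for index, rule in enumerate(route_rules)
        if any(match_included(s, r) for s in selector_matches for r in rule["matches"])
    ]


toys = {"type": "PathPrefix", "value": "/toys"}
rules = [
    {"matches": [{"path": toys, "method": "GET"}, {"path": toys, "method": "POST"}]},
    {"matches": [{"path": {"type": "PathPrefix", "value": "/assets/"}}]},
]
special = [{"path": {"type": "Exact", "value": "/toys/special"}, "method": "GET"}]

# Against the original route, GET /toys/special binds no rule (the limit would be invalid).
print(selected_rules(special, rules))
# Once the explicit rule is added to the route, the selector binds exactly that rule.
print(selected_rules(special, rules + [{"matches": special}]))
```

Under this reading, a selector for `PathPrefix /toys` with `method: GET` still binds the first rule of the original route, which also serves `POST`.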
This example is similar to Example 3. Consider the use case of setting a 150rpm rate limit on requests to GET /toys*
.
The targeted application endpoint is covered by the first HTTPRouteRule in the HTTPRoute (as a subset of GET
or POST
to any path that starts with /toys
). However, unlike in the previous example where, at first, no HTTPRouteRule included an explicit HTTPRouteMatch for GET /toys/special
, in this example the HTTPRouteMatch for the targeted application endpoint GET /toys*
does exist explicitly in one of the HTTPRouteRules, thus the RateLimitPolicy controller would have no problem binding the limit definition to the HTTPRouteRule. That would nonetheless cause an unexpected behavior of the limit triggered not strictly for GET /toys*
, but also for POST /toys*
.
To avoid extending the scope of the limit beyond what is desired, with no extra "soft" conditions, again the user must modify the spec of the HTTPRoute, so an exclusive HTTPRouteRule exists for the GET /toys*
application endpoint:
apiVersion: gateway.networking.k8s.io/v1alpha2
-kind: HTTPRoute
-metadata:
- name: toystore
- namespace: toystore
-spec:
- parentRefs:
-
- - name: istio-ingressgateway
- namespace: istio-system
- hostnames:
- - "*.toystore.acme.com"
- rules:
- - matches: # first HTTPRouteRule split into two – one for GET /toys*, other for POST /toys*
- - path:
- type: PathPrefix
- value: "/toys"
- method: GET
- backendRefs:
- - name: toystore
- port: 80
- - matches:
- - path:
- type: PathPrefix
- value: "/toys"
- method: POST
- backendRefs:
- - name: toystore
- port: 80
- - matches:
- - path:
- type: PathPrefix
- value: "/assets/"
- backendRefs:
- - name: toystore
- port: 80
- filters:
- - type: ResponseHeaderModifier
- responseHeaderModifier:
- set:
- - name: Cache-Control
- value: "max-age=31536000, immutable"
-
The RLP can then target the new HTTPRouteRule strictly:
-apiVersion: kuadrant.io/v2beta1
-kind: RateLimitPolicy
-metadata:
- name: toy-readers
- namespace: toystore
-spec:
- targetRef:
- group: gateway.networking.k8s.io
- kind: HTTPRoute
- name: toystore
- limits:
- toyReaders:
- rates:
-
- - limit: 150
- unit: second
- routeSelectors:
- - matches: # matches the new more specific HTTPRouteRule (i.e. GET /toys*)
- - path:
- type: PathPrefix
- value: "/toys"
- method: GET
-
In this example, both HTTPRouteRules, i.e. GET|POST /toys*
and /assets/*
, are targeted by the same limit of 50rpm per username.
Because the HTTPRoute has no other rule, this is technically equivalent to targeting the entire HTTPRoute and therefore similar to Example 1. However, if the HTTPRoute had other rules or got other rules added afterwards, this would ensure the limit applies only to the two original route rules.
-apiVersion: kuadrant.io/v2beta1
-kind: RateLimitPolicy
-metadata:
- name: toystore-per-user
- namespace: toystore
-spec:
- targetRef:
- group: gateway.networking.k8s.io
- kind: HTTPRoute
- name: toystore
- limits:
- toysOrAssetsPerUsername:
- rates:
-
- - limit: 50
- duration: 1
- unit: minute
- counters:
- - auth.identity.username
- routeSelectors:
- - matches:
- - path:
- type: PathPrefix
- value: "/toys"
- method: GET
- - path:
- type: PathPrefix
- value: "/toys"
- method: POST
- - matches:
- - path:
- type: PathPrefix
- value: "/assets/"
-
gateway_actions:
-
-- rules:
- - paths: ["/toys*"]
- methods: ["GET"]
- hosts: ["*.toystore.acme.com"]
- - paths: ["/toys*"]
- methods: ["POST"]
- hosts: ["*.toystore.acme.com"]
- - paths: ["/assets/*"]
- hosts: ["*.toystore.acme.com"]
- configurations:
- - generic_key:
- descriptor_key: "toystore/toystore-per-user/toysOrAssetsPerUsername"
- descriptor_value: "1"
- - metadata:
- descriptor_key: "auth.identity.username"
- metadata_key:
- key: "envoy.filters.http.ext_authz"
- path:
- - segment:
- key: "identity"
- - segment:
- key: "username"
-
In case multiple limit definitions target the same HTTPRouteRule, all those limit definitions will be bound to the HTTPRouteRule. No limit "shadowing" will be enforced by the RLP controller. Due to how things work as of today in Limitador nonetheless (i.e. the rule of the most restrictive limit wins), in some cases, across multiple limits triggered, one limit ends up "shadowing" others, depending on further qualification of the counters and the actual RL values.
-E.g., the following RLP intends to set 50rps per username on GET /toys*
, and 100rps on POST /toys*
or /assets/*
:
apiVersion: kuadrant.io/v2beta1
-kind: RateLimitPolicy
-metadata:
- name: toystore-per-endpoint
- namespace: toystore
-spec:
- targetRef:
- group: gateway.networking.k8s.io
- kind: HTTPRoute
- name: toystore
- limits:
- readToys:
- rates:
-
- - limit: 50
- unit: second
- counters:
- - auth.identity.username
- routeSelectors:
- - matches: # matches the 1st HTTPRouteRule (i.e. GET or POST to /toys*)
- - path:
- type: PathPrefix
- value: "/toys"
- method: GET
-
- postToysOrAssets:
- rates:
-
- - limit: 100
- unit: second
- routeSelectors:
- - matches: # matches the 1st HTTPRouteRule (i.e. GET or POST to /toys*)
- - path:
- type: PathPrefix
- value: "/toys"
- method: POST
- - matches: # matches the 2nd HTTPRouteRule (i.e. /assets/*)
- - path:
- type: PathPrefix
- value: "/assets/"
-
gateway_actions:
-
-- rules:
- - paths: ["/toys*"]
- methods: ["GET"]
- hosts: ["*.toystore.acme.com"]
- - paths: ["/toys*"]
- methods: ["POST"]
- hosts: ["*.toystore.acme.com"]
- configurations:
- - generic_key:
- descriptor_key: "toystore/toystore-per-endpoint/readToys"
- descriptor_value: "1"
- - metadata:
- descriptor_key: "auth.identity.username"
- metadata_key:
- key: "envoy.filters.http.ext_authz"
- path:
- - segment:
- key: "identity"
- - segment:
- key: "username"
-- rules:
- - paths: ["/toys*"]
- methods: ["GET"]
- hosts: ["*.toystore.acme.com"]
- - paths: ["/toys*"]
- methods: ["POST"]
- hosts: ["*.toystore.acme.com"]
- - paths: ["/assets/*"]
- hosts: ["*.toystore.acme.com"]
- configurations:
- - generic_key:
- descriptor_key: "toystore/toystore-per-endpoint/readToys"
- descriptor_value: "1"
- - generic_key:
- descriptor_key: "toystore/toystore-per-endpoint/postToysOrAssets"
- descriptor_value: "1"
-
limits:
-
-- conditions: # actually applies to GET|POST /toys*
- - toystore/toystore-per-endpoint/readToys == "1"
- variables:
- - auth.identity.username
- max_value: 50
- seconds: 1
- namespace: kuadrant
-- conditions: # actually applies to GET|POST /toys* and /assets/*
- - toystore/toystore-per-endpoint/postToysOrAssets == "1"
- max_value: 100
- seconds: 1
- namespace: kuadrant
-
This example was only written in this way to highlight that it is possible that multiple limit definitions select the same HTTPRouteRule. To avoid over-limiting between GET|POST /toys*
and thus ensure the originally intended limit definitions for each of these routes apply, the HTTPRouteRule should be split into two, as done in Example 4.
In the previous examples, the limit definitions and therefore the counters were set indistinctly for all hostnames – i.e. no matter if the request is sent to games.toystore.acme.com
or dolls.toystore.acme.com
, the same counters are expected to be affected. In this example on the other hand, a 1000rpd rate limit is set for requests to /assets/*
only when the hostname matches games.toystore.acme.com
.
First, the user needs to edit the HTTPRoute to make the targeted hostname games.toystore.acme.com
explicit:
apiVersion: gateway.networking.k8s.io/v1alpha2
-kind: HTTPRoute
-metadata:
- name: toystore
- namespace: toystore
-spec:
- parentRefs:
-
- - name: istio-ingressgateway
- namespace: istio-system
- hostnames:
- - "*.toystore.acme.com"
- - games.toystore.acme.com # new (more specific) hostname added
- rules:
- - matches:
- - path:
- type: PathPrefix
- value: "/toys"
- method: GET
- - path:
- type: PathPrefix
- value: "/toys"
- method: POST
- backendRefs:
- - name: toystore
- port: 80
- - matches:
- - path:
- type: PathPrefix
- value: "/assets/"
- backendRefs:
- - name: toystore
- port: 80
- filters:
- - type: ResponseHeaderModifier
- responseHeaderModifier:
- set:
- - name: Cache-Control
- value: "max-age=31536000, immutable"
-
After that, the RLP can target specifically the newly added hostname:
-apiVersion: kuadrant.io/v2beta1
-kind: RateLimitPolicy
-metadata:
- name: toystore-per-hostname
- namespace: toystore
-spec:
- targetRef:
- group: gateway.networking.k8s.io
- kind: HTTPRoute
- name: toystore
- limits:
- games:
- rates:
-
- - limit: 1000
- unit: day
- routeSelectors:
- - matches:
- - path:
- type: PathPrefix
- value: "/assets/"
- hostnames:
- - games.toystore.acme.com
-
--Note: Additional meaning and context may be given to this use case in the future, when discussing defaults and overrides.
-
Targeting a Gateway is a shortcut to targeting all individual HTTPRoutes referencing the gateway as parent. This differs from Example 1 nonetheless because, by targeting the gateway rather than an individual HTTPRoute, the RLP applies automatically to all HTTPRoutes pointing to the gateway, including routes created before and after the creation of the RLP. Moreover, all those routes will share the same limit counters specified in the RLP.
-apiVersion: kuadrant.io/v2beta1
-kind: RateLimitPolicy
-metadata:
- name: gw-rl
- namespace: istio-system
-spec:
- targetRef:
- group: gateway.networking.k8s.io
- kind: Gateway
- name: istio-ingressgateway
- limits:
- base:
-
- - rates:
- - limit: 5
- unit: second
-
gateway_actions:
-
-- rules:
- - paths: ["/toys*"]
- methods: ["GET"]
- hosts: ["*.toystore.acme.com"]
- - paths: ["/toys*"]
- methods: ["POST"]
- hosts: ["*.toystore.acme.com"]
- - paths: ["/assets/*"]
- hosts: ["*.toystore.acme.com"]
- configurations:
- - generic_key:
- descriptor_key: "istio-system/gw-rl/base"
- descriptor_value: "1"
-
| Current | New | Reason |
|---|---|---|
| 1:1 relation between Limit (the object) and the actual Rate limit (the value) (spec.rateLimits.limits) | Rate limit becomes a detail of Limit where each limit may define one or more rates (1:N) (spec.limits.<limit-name>.rates) | |
| Parsed spec.rateLimits.limits.conditions field, directly exposing the Limitador's API | Structured spec.limits.<limit-name>.when condition field composed of 3 well-defined properties: selector, operator and value | |
| spec.rateLimits.configurations as a list of "variables assignments" and direct exposure of Envoy's RL descriptor actions API | Descriptor actions composed from selectors used in the limit definitions (spec.limits.<limit-name>.when.selector and spec.limits.<limit-name>.counters) plus a fixed identifier of the route rules (spec.limits.<limit-name>.routeSelectors) | |
| Key-value descriptors | Structured descriptors referring to a contextual well-known data structure | |
| Limitador conditions independent from the route rules | Artificial Limitador condition injected to bind routes and corresponding limits | |
| translate(spec.rateLimits.rules) ⊂ httproute.spec.rules | spec.limits.<limit-name>.routeSelectors.matches ⊆ httproute.spec.rules.matches | |
| spec.rateLimits.limits.seconds | spec.limits.<limit-name>.rates.duration and spec.limits.<limit-name>.rates.unit | |
| spec.rateLimits.limits.variables | spec.limits.<limit-name>.counters | |
| spec.rateLimits.limits.maxValue | spec.limits.<limit-name>.rates.limit | |
By completely dropping the configurations
field from the RLP, composing the RL descriptor actions is now done based essentially on the selectors listed in the when
conditions and the counters
, plus an artificial condition used to bind the HTTPRouteRules to the corresponding limits to trigger in Limitador.
The descriptor actions composed from the selectors in the "soft" when
conditions and counter qualifiers originate from the direct references these selectors make to paths within a well-known data structure that stores information about the context (HTTP request and ext-authz filter). These selectors in "soft" when
conditions and counter qualifiers are thereby called well-known selectors.
Other descriptor actions might be composed by the RLP controller to define additional RL conditions to bind HTTPRouteRules and corresponding limits.
-Each selector used in a when
condition or counter qualifier is a direct reference to a path within a well-known data structure that stores information about the context
(L4 and L7 data of the original request handled by the proxy), as well as auth
data (dynamic metadata occasionally exported by the external authorization filter and injected by the proxy into the rate-limit filter).
The well-known data structure for building RL descriptor actions resembles Authorino's "Authorization JSON", whose context
component consists of Envoy's AttributeContext
type of the external authorization API (marshalled as JSON). Compared to the more generic RateLimitRequest
struct, the AttributeContext
provides a more structured and arguably more intuitive relation between the data sources for the RL descriptor actions and their corresponding key names through which the values are referred within the RLP, in a context of predominantly serving HTTP applications.
To keep compatibility with the Envoy Rate Limit API, the well-known data structure can optionally be extended with the RateLimitRequest
, thus resulting in the following final structure.
context: # Envoy's Ext-Authz `CheckRequest.AttributeContext` type
- source:
- address: …
- service: …
- …
- destination:
- address: …
- service: …
- …
- request:
- http:
- host: …
- path: …
- method: …
- headers: {…}
-
-auth: # Dynamic metadata exported by the external authorization service
-
-ratelimit: # Envoy's Rate Limit `RateLimitRequest` type
- domain: … # generated by the Kuadrant controller
- descriptors: {…} # descriptors configured by the user directly in the proxy (not generated by the Kuadrant controller, if allowed)
- hitsAddend: … # only in case we want to allow users to refer to this value in a policy
-
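To make the idea of a selector as a path into this structure concrete, the sketch below resolves dotted selectors against a nested dictionary. It is illustrative only: the `resolve` helper and the sample values are hypothetical, and in the proposal the lookup is performed by the data plane through descriptor actions rather than by code like this.

```python
def resolve(data, selector):
    """Follow a dotted selector through nested dicts; return None if any key is missing."""
    value = data
    for key in selector.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


# Hypothetical snapshot of the well-known data structure for a single request.
well_known = {
    "context": {
        "request": {
            "http": {"host": "games.toystore.acme.com", "path": "/toys/special", "method": "GET"}
        }
    },
    "auth": {"identity": {"username": "alice", "group": "users"}},
}

print(resolve(well_known, "auth.identity.username"))
print(resolve(well_known, "context.request.http.path"))
```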
From the perspective of a user who writes a RLP, the selectors used in the when
and counters
fields are paths to the well-known data structure (see Well-known selectors). While designing a policy, the user intuitively pictures the well-known data structure and states each limit definition having in mind the possible values assumed by each of those paths in the data plane. For example,
The user story:
---Each distinct user (
-auth.identity.username
) can send no more than 1rps to the same HTTP path (context.request.http.path
).
...materializes as the following RLP:
-apiVersion: kuadrant.io/v2beta1
-kind: RateLimitPolicy
-metadata:
- name: toystore
-spec:
- targetRef:
- group: gateway.networking.k8s.io
- kind: HTTPRoute
- name: toystore
- limits:
- dolls:
- rates:
-
- - limit: 1
- unit: second
- counters:
- - auth.identity.username
- - context.request.http.path
-
The following selectors are to be interpreted by the RLP controller:
-auth.identity.username
context.request.http.path
The RLP controller uses a map to translate each selector into its corresponding descriptor action. (Roughly described:)
-context.source.address → source_cluster(...) # TBC
-context.source.service → source_cluster(...) # TBC
-context.destination... → destination_cluster(...)
-context.destination... → destination_cluster(...)
-context.request.http.<X> → request_headers(header_name: ":<X>")
-context.request... → ...
-auth.<X> → metadata(key: "envoy.filters.http.ext_authz", path: <X>)
-ratelimit.domain → <hostname>
-
...to yield effectively:
-rate_limits:
-
-- actions:
- - metadata:
- descriptor_key: "auth.identity.username"
- metadata_key:
- key: "envoy.filters.http.ext_authz"
- path:
- - segment:
- key: "identity"
- - segment:
- key: "username"
- - request_headers:
- descriptor_key: "context.request.http.path"
- header_name: ":path"
-
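The translation from selectors to descriptor actions can be sketched in Python for the two mappings exercised above. This is an illustration under stated assumptions, not the controller's implementation: the function name is hypothetical, and only the `auth.<X>` and `context.request.http.<X>` mappings are covered because the others are marked as to be confirmed in the map.

```python
EXT_AUTHZ_FILTER = "envoy.filters.http.ext_authz"
HTTP_PREFIX = "context.request.http."


def selector_to_action(selector):
    """Translate a well-known selector into an Envoy-style descriptor action (as a dict)."""
    if selector.startswith("auth."):
        segments = selector[len("auth."):].split(".")
        return {
            "metadata": {
                "descriptor_key": selector,
                "metadata_key": {
                    "key": EXT_AUTHZ_FILTER,
                    "path": [{"segment": {"key": segment}} for segment in segments],
                },
            }
        }
    if selector.startswith(HTTP_PREFIX):
        # e.g. context.request.http.path is read from the ":path" pseudo-header
        return {
            "request_headers": {
                "descriptor_key": selector,
                "header_name": ":" + selector[len(HTTP_PREFIX):],
            }
        }
    raise ValueError(f"no mapping sketched for selector {selector!r}")


counters = ["auth.identity.username", "context.request.http.path"]
actions = [selector_to_action(selector) for selector in counters]
print(actions)
```

Applied to the two counters of the `dolls` limit, the resulting list has the same shape as the `actions` shown above.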
routeSelectors
For each limit definition that explicitly or implicitly defines a routeSelectors
field, the RLP controller will generate an artificial Limitador condition that ensures that the limit applies only when the filtered rules are honoured when serving the request. This can be implemented with a 2-step procedure:
Generate a unique identifier of the limit, i.e. <policy-namespace>/<policy-name>/<limit-name>
Associate a generic_key type descriptor action with each HTTPRouteRule targeted by the limit – i.e. { descriptor_key: <unique identifier of the limit>, descriptor_value: "1" }.
For example, given the following RLP:
-apiVersion: kuadrant.io/v2beta1
-kind: RateLimitPolicy
-metadata:
- name: toystore-non-admin-users
- namespace: toystore
-spec:
- targetRef:
- group: gateway.networking.k8s.io
- kind: HTTPRoute
- name: toystore
- limits:
- toys:
- routeSelectors:
-
- - matches:
- - path:
- type: PathPrefix
- value: "/toys"
- method: GET
- - path:
- type: PathPrefix
- value: "/toys"
- method: POST
- rates:
- - limit: 50
- duration: 1
- unit: minute
- when:
- - selector: auth.identity.group
- operator: neq
- value: admin
-
- assets:
- routeSelectors:
-
- - matches:
- - path:
- type: PathPrefix
- value: "/assets/"
- rates:
- - limit: 5
- duration: 1
- unit: minute
- when:
- - selector: auth.identity.group
- operator: neq
- value: admin
-
Apart from the following descriptor action associated with both routes:
-- metadata:
- descriptor_key: "auth.identity.group"
- metadata_key:
- key: "envoy.filters.http.ext_authz"
- path:
- - segment:
- key: "identity"
- - segment:
- key: "group"
-
...and its corresponding Limitador condition:
- -The following additional artificial descriptor actions will be generated:
-# associated with route rule GET|POST /toys*
-
-- generic_key:
- descriptor_key: "toystore/toystore-non-admin-users/toys"
- descriptor_value: "1"
-
-# associated with route rule /assets/*
-
-- generic_key:
- descriptor_key: "toystore/toystore-non-admin-users/assets"
- descriptor_value: "1"
-
...and their corresponding Limitador conditions.
-In the end, the following Limitador configuration is yielded:
-- conditions:
- - toystore/toystore-non-admin-users/toys == "1"
- - auth.identity.group != "admin"
- max_value: 50
- seconds: 60
- namespace: kuadrant
-
-
-- conditions:
- - toystore/toystore-non-admin-users/assets == "1"
- - auth.identity.group != "admin"
- max_value: 5
- seconds: 60
- namespace: kuadrant
-
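The way the conditions of each Limitador limit above are assembled can be sketched as follows: one artificial condition built from the unique identifier of the limit, followed by the "soft" `when` conditions. The function name and the operator table are hypothetical, and only the `eq` and `neq` operators that appear in this document are mapped.

```python
# Assumption: only the operators used in the examples of this document are covered.
OPERATORS = {"eq": "==", "neq": "!="}


def limitador_conditions(namespace, policy, limit_name, when=()):
    """Build the Limitador conditions for one limit definition of a RLP."""
    # "Hard" condition binding the limit to the HTTPRouteRules selected for it.
    conditions = [f'{namespace}/{policy}/{limit_name} == "1"']
    # "Soft" conditions stated by the user in the `when` field.
    for condition in when:
        operator = OPERATORS[condition["operator"]]
        conditions.append(f'{condition["selector"]} {operator} "{condition["value"]}"')
    return conditions


non_admin = [{"selector": "auth.identity.group", "operator": "neq", "value": "admin"}]
print(limitador_conditions("toystore", "toystore-non-admin-users", "toys", non_admin))
print(limitador_conditions("toystore", "toystore-non-admin-users", "assets", non_admin))
```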
This proposal tries to keep compatibility with the Envoy API for rate limit and does not introduce any new requirement that otherwise would require the use of wasm shim to be implemented.
-In the case of implementation of this proposal in the wasm shim, all types of matchers supported by the HTTPRouteMatch type of Gateway API must be also supported in the rate_limit_policies.gateway_actions.rules
field of the wasm plugin configuration. These include matchers based on path (prefix, exact), headers, query string parameters and method.
HTTPRoute editing occasionally required
-Need to duplicate rules that don't explicitly include a matcher wanted for the policy, so that matcher can be added as a special case for each of those rules.
Risk of over-targeting
-Some HTTPRouteRules might need to be split into more specific ones so a limit definition is not bound beyond what is intended (e.g. target method: GET
when the route matches method: POST|GET
).
Prone to consistency issues
-Typos and updates to the HTTPRoute can easily cause a mismatch and invalidate a RLP.
Two types of conditions – routeSelectors
and when
conditions
-Although with different meanings (evaluated in the gateway vs. evaluated in Limitador) and meant for expressing different types of rules (HTTPRouteRule selectors vs. "soft" conditions based on attributes not related to the HTTP request), users might still perceive these as two ways of expressing conditions and find it difficult to understand at first that "soft" conditions do not accept expressions related to attributes of the HTTP request.
Requiring users to specify full HTTPRouteRule matches in the RLP (as opposed to any subset of HTTPRouteMatches of targeted HTTPRouteRules – current proposal) carries some of the same drawbacks of this proposal, such as HTTPRoute editing occasionally required and prone to consistency issues. If, on one hand, it eliminates the risk of over-targeting, on the other hand, it does so at the cost of requiring excessively verbose policies written by the users, to the point of sometimes expecting users to specify trigger matching rules that go significantly beyond what is originally and strictly intended.
-E.g.:
-On a HTTPRoute that contains the following HTTPRouteRules (simplified representation):
- -Where the user wants to define a RLP that targets { method: POST }
. First, the user needs to edit the HTTPRoute and duplicate the HTTPRouteRules:
{ header: x-canary=true, method: POST } → backend-canary
-{ header: x-canary=true } → backend-canary
-{ method: POST } → backend-rest
-{ * } → backend-rest
-
Then, the user needs to include the following trigger in the RLP so only full HTTPRouteRules are specified:
- -The first matching rule of the trigger (i.e. { header: x-canary=true, method: POST }
) is beyond the original user intent of targeting simply { method: POST }
.
This issue can be even more concerning in the case of targeting gateways with multiple child HTTPRoutes. All the HTTPRoutes would have to be fixed, and the HTTPRouteRules that cover all the cases in all HTTPRoutes would have to be listed in the policy targeting the gateway.
-The proposed binding between limit definitions and the HTTPRouteRules that trigger the limits was designed so that multiple limit definitions can be bound to the same HTTPRouteRule that triggers those limits in Limitador. That means that no limit definition will "shadow" another at the level of the RLP controller, i.e. the RLP controller will honour the intended binding according to the selectors specified in the policy.
-Due to how things work as of today in Limitador nonetheless, i.e., the rule of the most restrictive limit wins, and because all limit definitions are triggered by a given shared HTTPRouteRule, it might be the case that, across multiple limits triggered, one limit ends up "shadowing" other limits. However, that is by implementation of Limitador and therefore beyond the scope of the API.
-An alternative to the approach of allowing all limit definitions to be bound to the same selected HTTPRouteRule would be enforcing that, amongst multiple limit definitions targeting the same HTTPRouteRule, only the first of those limit definitions is bound to the HTTPRouteRule. This alternative approach effectively would cause the first limit to "shadow" any other on that particular HTTPRouteRule, as by implementation of the RLP controller (i.e., at API level).
-While the first approach causes an artificial Limitador condition of the form <policy-ns>/<policy-name>/<limit-name> == "1"
, the alternative approach ("limit shadowing") could be implemented by generating a descriptor of the following form instead: ratelimit.binding == "<policy-ns>/<policy-name>/<limit-name>"
.
The downside of allowing multiple bindings to the same HTTPRouteRule is that all limits apply in Limitador, thus frequently making status reporting harder. The most restrictive rate limit strategy implemented by Limitador might not be obvious to users who set multiple limit definitions and will require additional information reported back to the user about the actual status of the limit definitions stated in a RLP. On the other hand, it enables use cases of different limit definitions that vary on the counter qualifiers, additional "soft" conditions, or actual rate limit values to be triggered by the same HTTPRouteRule.
-when
conditions based on attributes of the HTTP request
As a first step, users will not be able to write "soft" when
conditions to selectively apply rate limit definitions based on attributes of the HTTP request that otherwise could be specified using the routeSelectors
field of the RLP instead.
On one hand, using when conditions for route filtering would make it easy to define limits when the HTTPRoute cannot be modified to include the special rule. On the other hand, users would miss information in the status. An HTTPRouteRule for GET|POST /toys*, for example, that is targeted with an additional "soft" when condition that specifies that the method must be equal to GET and the path exactly equal to /toys/special (see Example 3) would be reported as rate limited, with extra details that this is in fact only for GET /toys/special. For small deployments, this might be considered acceptable; however, it would easily explode to an unmanageable number of cases even for deployments with only a few limit definitions and HTTPRouteRules.
Moreover, by not specifying a stricter HTTPRouteRule for GET /toys/special, the RLP controller would bind the limit definition to other rules, which would cause the rate limit filter to invoke the rate limit service (Limitador) for cases other than strictly GET /toys/special. Even if the rate limits would still be ensured to apply in Limitador only for GET /toys/special (due to the presence of a hypothetical "soft" when condition), an extra no-op hop to the rate limit service would happen. This is avoided with the currently imposed limitation.
Example of "soft" when conditions for rate limit based on attributes of the HTTP request (NOT SUPPORTED):
apiVersion: kuadrant.io/v2beta1
-kind: RateLimitPolicy
-metadata:
- name: toystore-special-toys
- namespace: toystore
-spec:
- targetRef:
- group: gateway.networking.k8s.io
- kind: HTTPRoute
- name: toystore
- limits:
- specialToys:
- rates:
-
- - limit: 150
- unit: second
- routeSelectors:
- - matches: # matches the original HTTPRouteRule GET|POST /toys*
- - path:
- type: PathPrefix
- value: "/toys"
- method: GET
- when:
- - selector: context.request.http.method # cannot omit this selector or POST /toys/special would also be rate limited
- operator: eq
- value: GET
- - selector: context.request.http.path
- operator: eq
- value: /toys/special
-
gateway_actions:
-
-- rules:
- - paths: ["/toys*"]
- methods: ["GET"]
- hosts: ["*.toystore.acme.com"]
- - paths: ["/toys*"]
- methods: ["POST"]
- hosts: ["*.toystore.acme.com"]
- configurations:
- - generic_key:
- descriptor_key: "toystore/toystore-special-toys/specialToys"
- descriptor_value: "1"
- - request_headers:
- descriptor_key: "context.request.http.method"
- header_name: ":method"
- - request_headers:
- descriptor_key: "context.request.http.path"
- header_name: ":path"
-
The main drivers behind the proposed design for the selectors (conditions and counter qualifiers), based on (i) structured condition expressions composed of fields selector, operator, and value, and (ii) when conditions and counters separated in two distinct fields (variation "C" below), are:
AuthConfig API, which also specifies when conditions expressed in selector, operator, and value fields;
Nonetheless, here are a few alternative variations to consider:
A (single field, structured condition expressions):
-selectors:
-  - selector: context.request.http.method
-    operator: eq
-    value: GET
-  - selector: auth.identity.username
-
B (single field, parsed condition expressions):
-selectors:
-  - context.request.http.method == "GET"
-  - auth.identity.username
-
C ⭐️ (distinct fields, structured condition expressions):
-when:
-  - selector: context.request.http.method
-    operator: eq
-    value: GET
-counters:
-  - auth.identity.username
-
D (distinct fields, parsed condition expressions):
-when:
-  - context.request.http.method == "GET"
-counters:
-  - auth.identity.username
-
⭐️ Variation adopted for the examples and (so far) final design proposal.
-Most implementations currently orbiting around Gateway API (e.g. Istio, Envoy Gateway, etc) for added RL functionality seem to have been leaning more to the direct route extension pattern instead of Policy Attachment. That might be an option particularly suitable for gateway implementations (gateway providers) and for those aiming to avoid dealing with defaults and overrides.
-in operator?
operators do we need to support (e.g. eq, neq, exists, nexists, matches)?
Limitador, Kuadrant CRDs, MCTC)?
routeSelectors and the semantics around it to the AuthPolicy API (aka "KAP v2").
well-known-attributes
Define a well-known structure for users to declare request data selectors in their RateLimitPolicies and AuthPolicies. This structure is referred to as the Kuadrant Well-known Attributes.
-The well-known attributes let users write policy rules (conditions and, in general, dynamic values that refer to attributes in the data plane) in a concise and seamless way.
-Decoupled from the policy CRDs, the well-known attributes:
-One who writes a Kuadrant policy and wants to build policy constructs such as conditions, qualifiers, variables, etc., based on dynamic values of the data plane, must refer to the attributes that carry those values, using the declarative language of Kuadrant's Well-known Attributes.
-A dynamic data plane value is typically a value of an attribute of the request or an Envoy Dynamic Metadata entry. It can be a value of the outer request being handled by the API gateway or proxy that is managed by Kuadrant ("context request") or an attribute of the direct request to the Kuadrant component that delivers the functionality in the data plane (rate-limiting or external auth).
-A Well-known Selector is a construct of a policy API whose value contains a direct reference to a well-known attribute. The language of the well-known attributes and therefore what one would declare within a well-known selector resembles a JSON path for navigating a possibly complex JSON object.
-Example 1. Well-known selector used in a condition
-apiGroup: examples.kuadrant.io
-kind: PaintPolicy
-spec:
- rules:
-
- - when:
- - selector: auth.identity.group
- operator: eq
- value: admin
- color: red
-
In the example, auth.identity.group is a well-known selector of an attribute group, known to be injected by the external authorization service (auth) to describe the group the user (identity) belongs to. In the data plane, whenever this value is equal to admin, the abstract PaintPolicy policy states that the traffic must be painted red.
Example 2. Well-known selector used in a variable
-apiGroup: examples.kuadrant.io
-kind: PaintPolicy
-spec:
- rules:
-
- - color: red
- alpha:
- dynamic: request.headers.x-color-alpha
-
In the example, request.headers.x-color-alpha is a selector of a well-known attribute request.headers that gives access to the headers of the context HTTP request. The selector retrieves the value of the x-color-alpha request header to dynamically fill the alpha property of the abstract PaintPolicy policy at each request.
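Since a well-known selector resembles a JSON path, resolving one can be pictured as walking a nested structure key by key. The sketch below is only an illustration of that idea: the `resolve_selector` function and the sample `attributes` data are hypothetical, and the simple split on `.` assumes no key contains a dot.

```python
from __future__ import annotations

from typing import Any


def resolve_selector(selector: str, attributes: dict[str, Any]) -> Any:
    """Walk a dot-separated selector through nested dicts.

    Returns None when any segment of the path is missing.
    """
    current: Any = attributes
    for key in selector.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Hypothetical data plane attributes for a single request.
attributes = {
    "auth": {"identity": {"group": "admin"}},
    "request": {"headers": {"x-color-alpha": "0.5"}},
}

group = resolve_selector("auth.identity.group", attributes)
alpha = resolve_selector("request.headers.x-color-alpha", attributes)
```

Here `group` would feed the condition of Example 1 and `alpha` the dynamic value of Example 2.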
The Well-known Attributes are a compilation inspired by some of the Envoy attributes and Authorino's Authorization JSON and its related JSON paths.
-From the Envoy attributes, only attributes that are available before establishing connection with the upstream server qualify as a Kuadrant well-known attribute. This excludes attributes such as the response attributes and the upstream attributes.
-As for the attributes inherited from Authorino, these are either based on Envoy's AttributeContext type of the external auth request API or derived from internal types defined by Authorino to fulfill the Auth Pipeline.
These two subsets of attributes are unified into a single set of well-known attributes. For each attribute that exists in both subsets, the name of the attribute as specified in the Envoy attributes subset prevails. An example is request.id (to refer to the ID of the request), which supersedes context.request.http.id (as the same attribute is referred to in an Authorino AuthConfig).
The next sections specify the well-known attributes organized in the following groups:
-The following attributes are related to the context HTTP request that is handled by the API gateway or proxy managed by Kuadrant.
| Attribute | Type | Description | Auth | RL |
|---|---|---|---|---|
| request.id | String | Request ID corresponding to | ✓ | ✓ |
| request.time | Timestamp | Time of the first byte received | ✓ | ✓ |
| request.protocol | String | Request protocol (“HTTP/1.0”, “HTTP/1.1”, “HTTP/2”, or “HTTP/3”) | ✓ | ✓ |
| request.scheme | String | The scheme portion of the URL e.g. “http” | ✓ | ✓ |
| request.host | String | The host portion of the URL | ✓ | ✓ |
| request.method | String | Request method e.g. “GET” | ✓ | ✓ |
| request.path | String | The path portion of the URL | ✓ | ✓ |
| request.url_path | String | The path portion of the URL without the query string | | ✓ |
| request.query | String | The query portion of the URL in the format of “name1=value1&name2=value2” | ✓ | ✓ |
| request.headers | Map<String, String> | All request headers indexed by the lower-cased header name | ✓ | ✓ |
| request.referer | String | Referer request header | | ✓ |
| request.useragent | String | User agent request header | | ✓ |
| request.size | Number | The HTTP request size in bytes. If unknown, it must be -1 | ✓ | |
| request.body | String | The HTTP request body. (Disabled by default. Requires additional proxy configuration to enable it.) | ✓ | |
| request.raw_body | Array<Number> | The HTTP request body in bytes. This is sometimes used instead of | ✓ | |
| request.context_extensions | Map<String, String> | This is analogous to | ✓ | |
The following attributes are available once the downstream connection with the API gateway or proxy managed by Kuadrant is established. They apply to HTTP requests (L7) as well, but also to proxied connections limited at L3/L4.
| Attribute | Type | Description | Auth | RL |
|---|---|---|---|---|
| source.address | String | Downstream connection remote address | ✓ | ✓ |
| source.port | Number | Downstream connection remote port | ✓ | ✓ |
| source.service | String | The canonical service name of the peer | ✓ | |
| source.labels | Map<String, String> | The labels associated with the peer. These could be pod labels for Kubernetes or tags for VMs. The source of the labels could be an X.509 certificate or other configuration. | ✓ | |
| source.principal | String | The authenticated identity of this peer. If an X.509 certificate is used to assert the identity in the proxy, this field is sourced from “URI Subject Alternative Names”, “DNS Subject Alternate Names” or “Subject” in that order. The format is issuer specific – e.g. SPIFFE format is | ✓ | |
| source.certificate | String | The X.509 certificate used to authenticate the identity of this peer. When present, the certificate contents are encoded in URL and PEM format. | ✓ | |
| destination.address | String | Downstream connection local address | ✓ | ✓ |
| destination.port | Number | Downstream connection local port | ✓ | ✓ |
| destination.service | String | The canonical service name of the peer | ✓ | |
| destination.labels | Map<String, String> | The labels associated with the peer. These could be pod labels for Kubernetes or tags for VMs. The source of the labels could be an X.509 certificate or other configuration. | ✓ | |
| destination.principal | String | The authenticated identity of this peer. If an X.509 certificate is used to assert the identity in the proxy, this field is sourced from “URI Subject Alternative Names”, “DNS Subject Alternate Names” or “Subject” in that order. The format is issuer specific – e.g. SPIFFE format is | ✓ | |
| destination.certificate | String | The X.509 certificate used to authenticate the identity of this peer. When present, the certificate contents are encoded in URL and PEM format. | ✓ | |
| connection.id | Number | Downstream connection ID | | ✓ |
| connection.mtls | Boolean | Indicates whether TLS is applied to the downstream connection and the peer certificate is presented | | ✓ |
| connection.requested_server_name | String | Requested server name in the downstream TLS connection | | ✓ |
| connection.tls_session.sni | String | SNI used for TLS session | ✓ | |
| connection.tls_version | String | TLS version of the downstream TLS connection | | ✓ |
| connection.subject_local_certificate | String | The subject field of the local certificate in the downstream TLS connection | | ✓ |
| connection.subject_peer_certificate | String | The subject field of the peer certificate in the downstream TLS connection | | ✓ |
| connection.dns_san_local_certificate | String | The first DNS entry in the SAN field of the local certificate in the downstream TLS connection | | ✓ |
| connection.dns_san_peer_certificate | String | The first DNS entry in the SAN field of the peer certificate in the downstream TLS connection | | ✓ |
| connection.uri_san_local_certificate | String | The first URI entry in the SAN field of the local certificate in the downstream TLS connection | | ✓ |
| connection.uri_san_peer_certificate | String | The first URI entry in the SAN field of the peer certificate in the downstream TLS connection | | ✓ |
| connection.sha256_peer_certificate_digest | String | SHA256 digest of the peer certificate in the downstream TLS connection if present | | ✓ |
The following attributes are related to the Envoy proxy filter chain. They include metadata exported by the proxy throughout the filters and information about the states of the filters themselves.
| Attribute | Type | Description | Auth | RL |
|---|---|---|---|---|
| metadata | | Dynamic request metadata | ✓ | ✓ |
| filter_state | Map<String, String> | Mapping from a filter state name to its serialized string value | | ✓ |
The following attributes are exclusive to the external auth service (Authorino).
| Attribute | Type | Description | Auth | RL |
|---|---|---|---|---|
| auth.identity | Any | Single resolved identity object, post-identity verification | ✓ | |
| auth.metadata | Map<String, Any> | External metadata fetched | ✓ | |
| auth.authorization | Map<String, Any> | Authorization results resolved by each authorization rule, access granted only | ✓ | |
| auth.response | Map<String, Any> | Response objects exported by the auth service post-access granted | ✓ | |
| auth.callbacks | Map<String, Any> | Response objects returned by the callback requests issued by the auth service | ✓ | |
The auth service also supports modifying selected values by chaining modifiers in the path.
-The following attributes are exclusive to the rate-limiting service (Limitador).
| Attribute | Type | Description | Auth | RL |
|---|---|---|---|---|
| ratelimit.domain | String | The rate limit domain. This enables the configuration to be namespaced per application (multi-tenancy). | | ✓ |
| ratelimit.hits_addend | Number | Specifies the number of hits a request adds to the matched limit. Fixed value: `1`. Reserved for future usage. | | ✓ |
The decoupling of the well-known attributes and the language of well-known attributes and selectors from the individual policy CRDs is what makes it somewhat flexible and common across the components (rate-limiting and auth). However, it's less structured and it introduces another syntax for users to get familiar with.
-This additional language competes with the language of the route selectors (RFC 0001), based on Gateway API's HTTPRouteMatch type.
Being "soft-coded" in the policy specs (as opposed to a hard-coded sub-structure inside of each policy type) does not mean it's completely decoupled from implementation in the control plane and/or intermediary data plane components. Although many attributes can be supported almost as a pass-through, from being used in a selector in a policy, to a corresponding value requested by the wasm-shim to its host, that is not always the case. Some translation may be required for components not integrated via wasm-shim (e.g. Authorino), as well as for components integrated via wasm-shim (e.g. Limitador) in special cases of composite or abstraction well-known attributes (i.e. attributes not available as-is via ABI, e.g. auth.identity in a RLP). Either way, some validation of the values introduced by users in the selectors may be needed at some point in the control plane, thus arguably requiring a level of awareness and coupling between the well-known selectors specification and the control plane (policy controllers) or intermediary data plane (wasm-shim) components.
As an alternative to JSON path-like selectors based on a well-known structure that induces the proposed language of well-known attributes, these same attributes could be defined as sub-types of each policy CRD. The Golang packages defining the common attributes across CRDs could be shared by the policy type definitions to reduce repetition. However, that approach would possibly involve a staggering number of new type definitions to cover all the cases for all the groups of attributes to be supported. These are constructs that not only need to be understood by the policy controllers, but also known by the user who writes a policy.
-Additionally, all attributes, including new attributes occasionally introduced by Envoy and made available to the wasm-shim via ABI, would always require translation from the user-level abstraction used in a policy to the actual form used in the wasm-shim configuration and Authorino AuthConfigs.
-Not implementing this proposal and keeping the current state of things means little consistency between these common constructs for rules and conditions in how they are represented in each type of policy. This lack of consistency has a direct impact on the overhead faced by users to learn how to interact with Kuadrant and write different kinds of policies, as well as on the maintainers' tasks of coding for policy validation and reconciliation of data plane configurations.
-Authorino's dynamic JSON paths, related to Authorino's Authorization JSON and used in when conditions and inside of multiple other constructs of the AuthConfig, are an example of a feature with a very similar approach to the one proposed here.
Arguably, Authorino's perceived flexibility would not have been possible without the Authorization JSON selectors. Users can write quite sophisticated policy rules (conditions, variable references, etc) by leveraging those dynamic selectors. Because they are backed by JSON-based machinery in the code, Authorino's selectors vary very little, and in some cases not at all, from Open Policy Agent's Rego policy language, which is often used side by side in the same AuthConfigs.
-Authorino's Authorization JSON selectors are, on the one hand, more restricted to the structure of the CheckRequest payload (context.* attributes). At the same time, they are very open in the part associated with the internal attributes built along the Auth Pipeline (i.e. auth.* attributes). That makes Authorino's Authorization JSON selectors more limited, compared to the Envoy attributes made available to the wasm-shim via ABI, but also harder to validate. In some cases, such as deep references inside objects fetched from external sources of metadata, resolved OPA objects, JWT claims, etc, it is impossible to validate for correct references.
Another lesson learned from Authorino's Authorization JSON selectors is that they depend substantially on the so-called "modifiers". Many use cases involving parsing and breaking down attributes that are originally available in a more complex form would not be possible without the modifiers. Examples of such cases are: extracting portions of the path and/or query string parameters (e.g. collection and resource identifiers), applying translations on HTTP verbs into corresponding operations, base64-decoding values from the context HTTP request, amongst several others.
-How to deal with the differences regarding the availability and data types of the attributes across clients/hosts?
-Can we make more attributes that are currently available to only one of the components common to both?
-Will we need some kind of global support for modifiers (functions) in the well-known selectors or those can continue to be an Authorino-only feature?
-Does Authorino, which is more strict regarding the data structure that induces the selectors, need to implement this specification or could/should it keep its current selectors and a translation be performed by the AuthPolicy controller?
-auth.* attributes supported in the rate limit service
request.authenticated
request.operation.(read|write)
request.param.my-param
connection.secure
Other Envoy attributes
| Attribute | Type | Description | Auth | RL |
|---|---|---|---|---|
| wasm.plugin_name | String | Plugin name | | ✓ |
| wasm.plugin_root_id | String | Plugin root ID | | ✓ |
| wasm.plugin_vm_id | String | Plugin VM ID | | ✓ |
| wasm.node | | Local node description | | ✓ |
| wasm.cluster_name | String | Upstream cluster name | | ✓ |
| wasm.cluster_metadata | | Upstream cluster metadata | | ✓ |
| wasm.listener_direction | Number | Enumeration value of the listener traffic direction | | ✓ |
| wasm.listener_metadata | | Listener metadata | | ✓ |
| wasm.route_name | String | Route name | | ✓ |
| wasm.route_metadata | | Route metadata | | ✓ |
| wasm.upstream_host_metadata | | Upstream host metadata | | ✓ |
| Attribute | Type | Description | Auth | RL |
|---|---|---|---|---|
| xds.cluster_name | String | Upstream cluster name | | ✓ |
| xds.cluster_metadata | | Upstream cluster metadata | | ✓ |
| xds.route_name | String | Route name | | ✓ |
| xds.route_metadata | | Route metadata | | ✓ |
| xds.upstream_host_metadata | | Upstream host metadata | | ✓ |
| xds.filter_chain_name | String | Listener filter chain name | | ✓ |
DNSRecord is our API for expressing DNS endpoints via a kube CRD based API. It is managed by the multi-cluster gateway controller based on the desired state expressed in higher level APIs such as the Gateway or a DNSPolicy. In order to provide our feature set, we need to carefully consider how we structure our records and the types of records we need. This document proposes a particular structure based on the requirements and feature set we have.
-We want to be able to support Gateway definitions that use the following listener definitions:
*.example.com and fully qualified listener host www.example.com definitions, with the notable exception of fully wildcarded hosts (i.e. *), as we cannot provide any DNS or TLS for something with no defined hostname.
https://lucid.app/lucidchart/2f95c9c9-8ddf-4609-af37-48145c02ef7f/edit?viewport_loc=-188%2C-61%2C2400%2C1183%2C0_0&invitationId=inv_d5f35eb7-16a9-40ec-b568-38556de9b568
-For each listener defined in a gateway, we will create a set of records with the following rules.
-non-apex domain:
-We will have a generated lb (load balancer) dns name that we will use as a CNAME for the listener hostname. This DNS name is not intended for use within a HTTPRoute but is instead just a DNS construct. This will allow us to set up additional CNAME records for that DNS name in the future that are returned based on a geo location. These DNS records will also be CNAMEs pointing to specific gateway dns names, which will allow us to set up a weighted response. So the first layer of CNAMEs handles balancing based on geo, and the second layer handles balancing based on weighting.
- shop.example.com
- | |
- (IE) (AUS)
- CNAME lb.shop.. lb.shop..
- | | | |
- (w 100) (w 200) (w 100) (w100)
- CNAME g1.lb.. g2.lb.. g3.lb.. g4.lb..
- A 192.. A 81.. CNAME aws.lb A 82..
-
When there is no geo strategy defined within the DNSPolicy, we will put everything into a default geo (i.e. a catch-all record), default.lb-{guid}.{listenerHost}, but set the routing policy to GEO, allowing us to add more geo based records in the future if the gateway admin decides to move to a geo strategy as their needs grow.
To ensure this lb dns name is unique and does not clash, we will use a short guid as part of the subdomain: lb-{guid}.{listenerHost}. This guid will be based on the gateway name and gateway namespace in the control plane.
For a geo strategy we will add a geo record with a prefix to the lb subdomain based on the geo code: {geo-code}.lb-{guid}.{listenerHost}. When there is no geo we will use default as the prefix.
-Finally, for each gateway instance on a target cluster we will add a {spokeClusterName}.lb-{guid}.{listenerHost} record. To allow for a mix of hostname and IP address types, we will always use a CNAME. So we will create a dns name for IPAddress with the following structure: {guid}.lb-{guid}.{listenerHost}, where the first guid will be based on the cluster name where the gateway is placed.
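The naming rules above can be sketched as a small helper. This is a minimal illustration under stated assumptions: the `short_guid` and `lb_names` functions are hypothetical, and the use of a truncated SHA-256 digest is only one possible way to derive a short, stable guid; the text does not specify the actual algorithm.

```python
from __future__ import annotations

import hashlib


def short_guid(value: str, length: int = 6) -> str:
    """Hypothetical short guid: a truncated SHA-256 hex digest (assumption)."""
    return hashlib.sha256(value.encode()).hexdigest()[:length]


def lb_names(listener_host: str, gateway_name: str, gateway_namespace: str,
             cluster_name: str, geo_code: str | None = None) -> dict[str, str]:
    """Build the layered DNS names for one listener and one gateway instance."""
    lb_guid = short_guid(f"{gateway_name}-{gateway_namespace}")
    gw_guid = short_guid(cluster_name)
    lb = f"lb-{lb_guid}.{listener_host}"
    return {
        "listener": listener_host,             # CNAME -> lb
        "lb": lb,                              # geo-routed CNAME -> geo
        "geo": f"{geo_code or 'default'}.{lb}",  # weighted CNAME -> gateway
        "gateway": f"{gw_guid}.{lb}",          # A record (or CNAME to a hostname)
    }


names = lb_names("www.example.com", "mygateway", "multi-cluster-gateways", "cluster1")
```

Because the guid is derived only from the gateway name and namespace, every cluster computes the same lb name for a given listener, while the per-cluster guid keeps the gateway-level names distinct.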
An apex domain is the domain at the apex or root of a zone. These are handled differently by DNS as they often have NS and SOA records. Generally it is not possible to set up a CNAME for apex domain (although some providers allow it).
-If a listener is added to a gateway that is an apex domain, we can only add A records for that domain to keep ourselves compliant with as many providers as possible.
-If a listener is the apex domain, we will set up A records for that domain (favouring gateways with an IP address or resolving the IP behind a host), but there will be no special balancing/weighting done. Instead, we will expect that the owner of that domain will set up an HTTPRoute with a 301 permanent redirect sending users from the apex domain, e.g. example.com, to something like www.example.com, where the www subdomain based listener would use the rules of the non-apex domains and be where advanced geo and weighted strategies are applied.
-This is the type of DNS Record structure that would back our default DNSPolicy.
-gateway listener host name : www.example.com
-DNSRecords:
-So in the above example, working up from the bottom, we have a mix of hostname and IP based addresses for the gateway instance. We have 2 evenly weighted records that balance between the two available gateways. Next, we have the geo based record that is set to a default catch-all, as no geo has been specified. Finally, we have the actual listener hostname that points at our DNS based load balancer name.
-DNSRecord Yaml
-apiVersion: kuadrant.io/v1alpha1
-kind: DNSRecord
-metadata:
- name: {gateway-name}-{listenerName}
- namespace: multi-cluster-gateways
-spec:
- dnsName: www.example.com
- managedZone:
- name: mgc-dev-mz
- endpoints:
-
- - dnsName: www.example.com
- recordTTL: 300
- recordType: CNAME
- targets:
- - lb-1ab1.www.example.com
- - dnsName: lb-1ab1.www.example.com
- recordTTL: 300
- recordType: CNAME
- setIdentifier: mygateway-multicluster-gateways
- providerSpecific:
- - name: "geolocation-country-code"
- value: "*"
- targets:
- - default.lb-1ab1.www.example.com
- - dnsName: default.lb-1ab1.www.example.com
- recordTTL: 300
- recordType: CNAME
- setIdentifier: cluster1
- providerSpecific:
- - name: "weight"
- value: "100"
- targets:
- - 1bc1.lb-1ab1.www.example.com
-    - dnsName: default.lb-1ab1.www.example.com
- recordTTL: 300
- recordType: CNAME
- setIdentifier: cluster2
- providerSpecific:
- - name: "weight"
- value: "100"
- targets:
- - aws.lb.com
- - dnsName: 1bc1.lb-1ab1.www.example.com
- recordTTL: 60
- recordType: A
- targets:
- - 192.22.2.1
-
Once the end user selects to use a geo strategy via the DNSPolicy, we then need to restructure our DNS to add in our geo specific records. Here the default record
-lb short code is {gw name + gw namespace}
-gw short code is {cluster name}
-gateway listener host : shop.example.com
-DNSRecords:
-In the above example we move from a default catch-all to a geo specific setup, based on a DNSPolicy that specifies IE as the default geo location. We leave the default subdomain in place to allow for clients that may still be using it, and set up geo specific subdomains that allow us to route traffic based on its origin. In this example we are load balancing across 2 geos and 4 clusters.
In the examples we have used fully qualified domain names, however sometimes it may be required to use a wildcard subdomain. example:
-To support these we need to change the name of the DNSRecord away from the name of the listener as the k8s resource does not allow * in the name.
-To do this we will set the dns record resource name to be a combination of {gateway-name}-{listenerName}. To keep a record of the host this is for, we will set a top level property named dnsName. You can see an example in the DNSRecord above.
This setup gives us a powerful set of features and flexibility
-With this CNAME based approach we are increasing the number of DNS lookups required to get to an IP, which will increase the cost and add a small amount of latency. To counteract this, we will set a reasonably high TTL (at least 5 mins) for our CNAMEs and (2 mins) for A records.
-Provide a policy for configuring how DNS should be handled for a given gateway. Provide a mechanism for enabling DNS based load balancing.
-Gateway admins need a way to define the DNS policy for a multi-cluster gateway in order to control how much and which traffic reaches these gateways.
-Ideally we would allow them to express a strategy that they want to use without needing to get into the details of each provider and needing to create and maintain a dns record structure and individual records for all the different gateways that may be within their infrastructure.
-Allow definition of a DNSPolicy that configures load balancing to decide how traffic should be distributed across multiple gateway instances from the central control plane.
-Provide a control plane DNSPolicy API that uses the idea of direct policy attachment from Gateway API and allows a load balancing strategy to be applied to the DNS record structure for any managed listeners being served by the data plane instances of this gateway.
-The DNSPolicy also covers health checks that inform the DNS response, but that is not covered in this document.
-Below is a draft API for what we anticipate the DNSPolicy to look like:
-apiVersion: kuadrant.io/v1alpha1
-kind: DNSPolicy
-spec:
- targetRef: # defaults to gateway gvk and current namespace
- name: gateway-name
- health:
- ...
- loadBalancing:
- weighted:
- defaultWeight: 10
- custom: #optional
-
- - value: AWS #optional with both GEO and weighted. With GEO the custom weight is applied to gateways within a Geographic region
- weight: 10
- - value: GCP
- weight: 20
- GEO: #optional
- defaultGeo: IE # required with GEO. Chooses a default DNS response when no particular response is defined for a request from an unknown GEO.
-
GEO and Weighted load balancing are well understood strategies, and this API effectively allows a complex requirement to be expressed relatively simply and executed by the gateway controller in the chosen DNS provider. Our default policy will execute a "Round Robin" weighted strategy, which reflects the current default behaviour.
-With the above API we can provide weighted, GEO, and weighted within a GEO. A weighted strategy with a minimum of a default weight is always required and is the simplest type of policy. The multi-cluster gateway controller will set up a default policy when a gateway is discovered (shown below). This policy can be replaced or modified by the user. A weighted strategy can be complemented with a GEO strategy, i.e. they can be used together in order to provide GEO and weighted (within a GEO) load balancing. By defining a GEO section, you are indicating that you want to use a GEO based strategy (how this works is covered below).
-apiVersion: kuadrant.io/v1alpha1
-kind: DNSPolicy
-name: default-policy
-spec:
- targetRef: # defaults to gateway gvk and current namespace
- name: gateway-name
- loadBalancing:
- weighted: # required
- defaultWeight: 10 #required, all records created get this weight
- health:
- ...
-
In order to provide GEO based DNS and allow customisation of the weighting, we need some additional information to be provided by the gateway / cluster admin about where this gateway has been placed. For example if they want to use GEO based DNS as a strategy, we need to know what GEO identifier(s) to use for each record we create and a default GEO to use as a catch-all. Also, if the desired load balancing approach is to provide custom weighting and no longer simply use Round Robin, we will need a way to identify which records to apply that custom weighting to based on the clusters the gateway is placed on.
-To solve this we will allow two new attributes to be added to the ManagedCluster
resource as labels:
These two labels allow setting values in the DNSPolicy that will be reflected into DNS records for gateways placed on that cluster, depending on the strategies used. See the first DNSPolicy definition above for how these values are used, or take a look at the examples at the bottom.
-example : -
apiVersion: cluster.open-cluster-management.io/v1
-kind: ManagedCluster
-metadata:
- labels:
- kuadrant.io/lb-attribute-geo-code: "IE"
- kuadrant.io/lb-attribute-custom-weight: "GCP"
-spec:
-
The attributes provide the key and value we need in order to understand how to define records for a given LB address based on the DNSPolicy targeting the gateway.
-The kuadrant.io/lb-attribute-geo-code
attribute value is provider specific. Using an invalid code will result in an error status condition in the DNSRecord resource.
This is an advanced topic and so is broken out into its own proposal doc DNS Record Structure
-Custom weighting will use the associated custom-weight
attribute set on the ManagedCluster
to decide which records should get a specific weight. The value of this attribute is up to the end user.
example:
-apiVersion: cluster.open-cluster-management.io/v1
-kind: ManagedCluster
-metadata:
- labels:
- kuadrant.io/lb-attribute-custom-weight: "GCP"
-
The above is then used in the DNSPolicy to set custom weights for the records associated with the target gateway.
- -So any gateway targeted by a DNSPolicy with the above definition that is placed on a ManagedCluster
with the kuadrant.io/lb-attribute-custom-weight
set with a value of GCP will get an A record with a weight of 20
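The weight selection described above can be sketched as follows. This is a minimal illustration in Python rather than the controller's implementation: the function name `resolve_weight`, the shape of the `custom_weights` entries, and the rule that a custom weight matches on the label value alone are all assumptions made for the example.

```python
from typing import Mapping, Sequence

# Label key taken from the ManagedCluster example above.
CUSTOM_WEIGHT_LABEL = "kuadrant.io/lb-attribute-custom-weight"


def resolve_weight(
    cluster_labels: Mapping[str, str],
    default_weight: int,
    custom_weights: Sequence[Mapping[str, object]],
) -> int:
    """Return the DNS record weight for a gateway placed on one cluster.

    Hypothetical helper: each entry in custom_weights is assumed to look
    like {"value": "GCP", "weight": 20}.
    """
    label_value = cluster_labels.get(CUSTOM_WEIGHT_LABEL)
    for entry in custom_weights:
        if label_value is not None and entry["value"] == label_value:
            return int(entry["weight"])
    # Clusters without a matching label value fall back to the default weight.
    return default_weight
```

With a default weight of 10 and a custom entry of `{"value": "GCP", "weight": 20}`, a cluster labelled `GCP` resolves to 20 and any other cluster resolves to 10.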
DNSPolicy should have a ready condition that reflects that the DNSRecords have been created and configured as expected. If the policy is invalid, the status message should reflect this and indicate to the user that the old DNS has been preserved.
-We will also want to add a status condition to the gateway status indicating it is affected by this policy. Gateway API recommends the following status condition
-- type: gateway.networking.k8s.io/PolicyAffected
- status: True
- message: "DNSPolicy has been applied"
- reason: PolicyApplied
- ...
-
https://github.com/kubernetes-sigs/gateway-api/pull/2128/files#diff-afe84021d0647e83f420f99f5d18b392abe5ec82d68f03156c7534de9f19a30aR888
-apiVersion: kuadrant.io/v1alpha1
-kind: DNSPolicy
-name: RoundRobinPolicy
-spec:
- targetRef: # defaults to gateway gvk and current namespace
- name: gateway-name
- loadBalancing:
- weighted:
- defaultWeight: 10
-
apiVersion: kuadrant.io/v1alpha1
-kind: DNSPolicy
-name: GEODNS
-spec:
- targetRef: # defaults to gateway gvk and current namespace
- name: gateway-name
- loadBalancing:
- weighted:
- defaultWeight: 10
- GEO:
- defaultGeo: IE
-
apiVersion: kuadrant.io/v1alpha1
-kind: DNSPolicy
-name: SendMoreToAzure
-spec:
- targetRef: # defaults to gateway gvk and current namespace
- name: gateway-name
- loadBalancing:
- weighted:
- defaultWeight: 10
- custom:
-
- - attribute: cloud
- value: Azure #any record associated with a gateway on a cluster without this value gets the default
- weight: 30
-
apiVersion: kuadrant.io/v1alpha1
-kind: DNSPolicy
-name: GEODNSAndSendMoreToAzure
-spec:
- targetRef: # defaults to gateway gvk and current namespace
- name: gateway-name
- loadBalancing:
- weighted:
- defaultWeight: 10
- custom:
-
- - attribute: cloud
- value: Azure
- weight: 30
- GEO:
- defaultGeo: IE
-
You cannot have a different load balancing strategy for each listener within a gateway. So in the following gateway definition
-spec:
- gatewayClassName: kuadrant-multi-cluster-gateway-instance-per-cluster
- listeners:
-
- - allowedRoutes:
- namespaces:
- from: All
- hostname: myapp.hcpapps.net
- name: api
- port: 443
- protocol: HTTPS
- - allowedRoutes:
- namespaces:
- from: All
- hostname: other.hcpapps.net
- name: api
- port: 443
- protocol: HTTPS
-
The DNS policy targeting this gateway will apply to both myapp.hcpapps.net and other.hcpapps.net
-However, there is still significant value even with this limitation. This limitation is something we will likely revisit in the future
-An alternative is to configure all of this yourself manually in a DNS provider. This can be a highly complex DNS configuration that is easy to get wrong.
- - - - - - - - - - - - - -policy_status_states
This RFC proposes a new design for any Kuadrant Policy (RateLimitPolicy
, AuthPolicy
, etc..) status definition and transitions.
Currently, the RateLimitPolicy
and AuthPolicy
statuses don't clearly and truthfully communicate the actual state of
-reconciliation and healthiness with its operator managed services, i.e., the Rate Limit service ("Limitador") and
-the Auth service ("Authorino"), referred to as "Kuadrant services".
As a consequence, misleading information is shared causing unexpected errors and flawed assumptions.
-The following are some issues reported in relation to the aforementioned problems:
-This design for setting the status of the Kuadrant policy CRs is divided into 2 stages, where each stage could be applied/developed in order and would reflect valuable and accurate information with different degrees of acuity.
-The Policy CRD statuses in the following diagrams are simplified as states, which in the Reference-level explanation will be translated to the actual Status Conditions.
-State of the policy CR defined by: application, validation, and reconciliation of it
-The main signalization at Stage 1 is about whether a policy CR has been Accepted
or not.
States rationale:
-Accepted: This state is reached after the Validation and Reconciliation events have passed successfully.
-Invalid: This state will be set when the Validation process encounters an error.
-TargetNotFound: This state will be set when the Reconciliation process encounters an error.
-Conflicted: This state will be set when the Reconciliation process encounters an error.
Notes:
-Final state of the policy CR defined by: health check with the Kuadrant services (post-reconciliation)
-The Enforced
type is introduced to capture the difference between a policy being reconciled and it being enforced at the service.
States rationale:
-Enforced: After a successful response of the Service Probe, this state communicates that the policy is finally enforced.
-PartiallyEnforced: This state will be set when the Reconciliation event encounters an overlap with other policies.
-Overridden: This state will be set when the Reconciliation event invalidates the policy because another one takes precedence.
In general, the new states and conditions align with GEP-713.
-Besides the proposed Accepted
PolicyType, the Enforced
PolicyType would be added to reflect the final state of
-the policy, which means that the policy is showing the synced actual state of the Kuadrant services.
The missing Failed
PolicyType would be implicitly represented by the TargetNotFound
and Invalid
PolicyTypeReason.
All conditions are top-level.
| Type | Status | Reason | Message |
|---|---|---|---|
| Accepted | True | "Accepted" | "KuadrantPolicy has been accepted" |
| | False | "Conflicted" | "KuadrantPolicy is conflicted by [policy-ns/policy-name], ..." |
| | False | "Invalid" | "KuadrantPolicy is invalid" |
| | False | "TargetNotFound" | "KuadrantPolicy target [resource-name] was not found" |
| Enforced | True | "Enforced" | "KuadrantPolicy has been successfully enforced" |
| | False | "Unknown" | "KuadrantPolicy has encountered some issues" |
| | False | "Overridden" | "KuadrantPolicy is overridden by [policy-ns/policy-name], ..." |
Messages corresponding to falsy statuses are required and should reflect the error that was encountered.
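To make the mapping from reasons to conditions concrete, here is a small Python sketch. It is illustrative only: the helper name `build_conditions`, the single `detail` placeholder, and the choice to report `Enforced` only once the policy has been accepted are assumptions for this example, not behaviour stated by the RFC.

```python
from typing import Dict, List, Optional

# Reason -> (status, message template), following the table above.
ACCEPTED_REASONS = {
    "Accepted": ("True", "KuadrantPolicy has been accepted"),
    "Conflicted": ("False", "KuadrantPolicy is conflicted by {detail}"),
    "Invalid": ("False", "KuadrantPolicy is invalid"),
    "TargetNotFound": ("False", "KuadrantPolicy target {detail} was not found"),
}
ENFORCED_REASONS = {
    "Enforced": ("True", "KuadrantPolicy has been successfully enforced"),
    "Unknown": ("False", "KuadrantPolicy has encountered some issues"),
    "Overridden": ("False", "KuadrantPolicy is overridden by {detail}"),
}


def build_conditions(
    accepted_reason: str,
    enforced_reason: Optional[str] = None,
    detail: str = "",
) -> List[Dict[str, str]]:
    """Build the top-level status conditions for a policy (hypothetical helper)."""
    status, template = ACCEPTED_REASONS[accepted_reason]
    conditions = [{
        "type": "Accepted",
        "status": status,
        "reason": accepted_reason,
        "message": template.format(detail=detail),
    }]
    # Assumption: Enforced is only reported for policies that were accepted.
    if status == "True" and enforced_reason is not None:
        status, template = ENFORCED_REASONS[enforced_reason]
        conditions.append({
            "type": "Enforced",
            "status": status,
            "reason": enforced_reason,
            "message": template.format(detail=detail),
        })
    return conditions
```

For example, `build_conditions("Accepted", "Overridden", "ns/other")` yields an `Accepted` condition with status `True` followed by an `Enforced` condition with status `False` and reason `Overridden`.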
-It's possible to have the Failed state as a top level condition too. In this case, it might be useful to consider a third "Unknown" status.
-The Status stanza of the policy CRs must implement Gateway API's PolicyAncestorStatus struct. This will provide broader consistency and improved discoverability of effective policies.
-Full implementation of Stage 2 states assumes reporting mechanisms in place, provided by the Kuadrant services, that
-allow tracing the state of the configurations applied on the services, back to the original policies, to infer the
-final state of the policy CRs (i.e. whether truly Enforced
or not.)
Without such, Stage 2 could be only partially achieved, by relying only on Reconciliation events.
-Another option was considered (previously referred to as "Option 1"). While valid, this alternative would not align with GEP-713, nor would it be as flexible as the final design proposed.
-The implementation of this proposal could be part of kuadrant/gateway-api-machinery.
- - - - - - - - - - - - - -single-cluster-dnspolicy
Proposal for changes to the DNSPolicy
API to allow it to provide a simple routing strategy as an option in a single cluster context. This will remove, but not negate, the complex DNS structure we use in a multi-cluster environment and in doing so allow use of popular DNS integrators such as external-dns.
The DNSPolicy
API (v1alpha1) was implemented as part of our multi cluster gateway offering using OCM and as such the design and implementation were influenced heavily by how we want multi cluster DNS to work.
The DNSPolicy can be used to target a Gateway in a single cluster context and will create DNS records for each listener host in an appropriately configured external DNS provider.
-In this context the advanced loadbalancing
configuration is unnecessary, and the resulting DNSRecord can be created mapping individual listener hosts to a single DNS A or CNAME record by using the simple
routing strategy in the DNSPolicy.
Example 1. DNSPolicy using simple
routing strategy
apiVersion: kuadrant.io/v1alpha2
-kind: DNSPolicy
-metadata:
- name: prod-web
- namespace: my-gateways
-spec:
- providerRef:
- name: my-route53-credentials
- targetRef:
- name: prod-web
- group: gateway.networking.k8s.io
- kind: Gateway
- routingStrategy: simple
-
apiVersion: gateway.networking.k8s.io/v1beta1
-kind: Gateway
-metadata:
- name: prod-web
- namespace: my-gateways
-spec:
- gatewayClassName: istio
- listeners:
-
- - allowedRoutes:
- namespaces:
- from: All
- name: api
- hostname: "myapp.mn.hcpapps.net"
- port: 80
- protocol: HTTP
-status:
- addresses:
- - type: IPAddress
- value: 172.31.200.0
-
In the example the api
listener has a hostname myapp.mn.hcpapps.net
that matches a hosted zone being managed by the provider referenced my-route53-credentials
in the DNSPolicy.
-As the simple
routing strategy is set in the DNSPolicy a DNSRecord resource with the following contents will be created:
apiVersion: kuadrant.io/v1alpha2
-kind: DNSRecord
-metadata:
- name: prod-web-api
- namespace: my-gateways
-spec:
- providerRef:
- name: my-route53-credentials
- endpoints:
-
- - dnsName: myapp.mn.hcpapps.net
- recordTTL: 60
- recordType: A
- targets:
- - 172.31.200.0
-
The providerRef
is included in the DNSRecord to allow the dns record controller to load the appropriate provider configuration during reconciliation and create the DNS records in the dns provider service e.g. route 53.
Example 2. DNSPolicy using simple
routing strategy on multi cluster gateway
apiVersion: kuadrant.io/v1alpha2
-kind: DNSPolicy
-metadata:
- name: prod-web
- namespace: my-gateways
-spec:
- providerRef:
- name: my-route53-credentials
- targetRef:
- name: prod-web
- group: gateway.networking.k8s.io
- kind: Gateway
- routingStrategy: simple
-
apiVersion: gateway.networking.k8s.io/v1beta1
-kind: Gateway
-metadata:
- name: prod-web
- namespace: my-gateways
-spec:
- gatewayClassName: kuadrant-multi-cluster-gateway-instance-per-cluster
- listeners:
-
- - allowedRoutes:
- namespaces:
- from: All
- name: api
- hostname: "myapp.mn.hcpapps.net"
- port: 80
- protocol: HTTP
-status:
- addresses:
- - type: kuadrant.io/MultiClusterIPAddress
- value: 172.31.200.0
- - type: kuadrant.io/MultiClusterIPAddress
- value: 172.31.201.0
-
Similar to example 1, except here the Gateway is a multi cluster gateway that has had its status updated by the Gateway
controller to include kuadrant.io/MultiClusterIPAddress
type addresses.
-As the simple
routing strategy is set in the DNSPolicy a DNSRecord resource with the following contents will be created:
apiVersion: kuadrant.io/v1alpha2
-kind: DNSRecord
-metadata:
- name: prod-web-api
- namespace: my-gateways
-spec:
- providerRef:
- name: my-route53-credentials
- endpoints:
-
- - dnsName: myapp.mn.hcpapps.net
- recordTTL: 60
- recordType: A
- targets:
- - 172.31.200.0
- - 172.31.201.0
-
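The two examples above follow the same pattern: one endpoint per listener hostname, targeting every address in the Gateway status. A minimal Python sketch of that mapping is shown below. The function name `simple_endpoints` is hypothetical, and the sketch assumes all addresses are IP addresses, so it only ever emits A records (the CNAME case mentioned earlier is not covered).

```python
from typing import Any, Dict, List


def simple_endpoints(gateway: Dict[str, Any], ttl: int = 60) -> List[Dict[str, Any]]:
    """Build one A-record endpoint per listener hostname (illustrative only)."""
    addresses = gateway.get("status", {}).get("addresses", [])
    targets = [addr["value"] for addr in addresses]
    endpoints = []
    for listener in gateway["spec"]["listeners"]:
        hostname = listener.get("hostname")
        if not hostname or not targets:
            continue  # nothing to publish for this listener
        endpoints.append({
            "dnsName": hostname,
            "recordTTL": ttl,
            "recordType": "A",
            "targets": list(targets),
        })
    return endpoints
```

Passing the Gateway from Example 2 produces a single endpoint for `myapp.mn.hcpapps.net` whose targets are both cluster addresses, matching the DNSRecord shown above.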
DNSPolicy:
-spec.providerRef
spec.routingStrategy
v1alpha2
DNSRecord:
-spec.managedZone
replaced with spec.providerRef
spec.zoneID
v1alpha2
ManagedZone:
-The providerRef
field is mandatory and contains a reference to a secret containing provider credentials.
- `spec.providerRef.name` - name of the provider resource.
-
A DNSPolicy
referencing a providerRef secret will expect that secret to exist in the same namespace. The expected contents of the secret's data are comparable to the dnsProviderSecretRef
used by ManagedZones.
apiVersion: v1
-kind: Secret
-metadata:
- name: aws-credentials
-type: kuadrant.io/aws
-data:
- AWS_ACCESS_KEY_ID: "foo"
- AWS_SECRET_ACCESS_KEY: "bar"
- CONFIG:
- zoneIDFilter:
-
- - Z04114632NOABXYWH93QUl
-
The CONFIG
section of the secret's data will be added to allow provider specific configuration to be stored alongside the provider's credentials. It can be used during the instantiation of the provider client and during any provider operations.
-The above for example would use the zoneIDFilter
value to limit what hosted zones this provider is allowed to update.
The routingStrategy
field is mandatory and dictates what kind of DNS record structure the policy will create. Two routing strategy options are allowed: simple
or weightedGeo.
A reconciliation of DNSPolicy processes the target gateway and creates a DNSRecord per listener that is supported by the currently configured provider (the hostname matches one of the hosted zones accessible with the credentials and config). The routing strategy used will determine the contents of the DNSRecord resource's endpoints array.
-apiVersion: kuadrant.io/v1alpha2
-kind: DNSRecord
-spec:
- providerRef:
- name: my-route53-credentials
- endpoints:
-
- - dnsName: myapp.mn.hcpapps.net
- recordTTL: 60
- recordType: A
- targets:
- - 172.31.200.0
-
Simple creates a single endpoint for an A record with multiple targets. Although intended for use in a single cluster context, a simple routing strategy can still be used in a multi-cluster environment (OCM hub). In this scenario each cluster's address will be added to the targets array to create a multi answer section in the DNS response.
-apiVersion: kuadrant.io/v1alpha2
-kind: DNSRecord
-spec:
- providerRef:
- name: my-route53-credentials
- endpoints:
-
- - dnsName: myapp.mn.hcpapps.net
- recordTTL: 300
- recordType: CNAME
- targets:
- - lb-4ej5le.myapp.mn.hcpapps.net
- - dnsName: lb-4ej5le.myapp.mn.hcpapps.net
- providerSpecific:
- - name: geo-code
- value: '*'
- recordTTL: 300
- recordType: CNAME
- setIdentifier: default
- targets:
- - default.lb-4ej5le.myapp.mn.hcpapps.net
- - dnsName: default.lb-4ej5le.myapp.mn.hcpapps.net
- providerSpecific:
- - name: weight
- value: "120"
- recordTTL: 60
- recordType: CNAME
- setIdentifier: lrnse3.lb-4ej5le.myapp.mn.hcpapps.net
- targets:
- - lrnse3.lb-4ej5le.myapp.mn.hcpapps.net
- - dnsName: lrnse3.lb-4ej5le.myapp.mn.hcpapps.net
- recordTTL: 60
- recordType: A
- targets:
- - 172.31.200.0
-
WeightedGeo creates a more complex set of endpoints which use a combination of weighted and geo routing strategies. Although intended for use in a multi-cluster environment (OCM hub) it will still be possible to use it in a single cluster context. In this scenario the record structure described above would be created for the single cluster.
-This is the current default for DNSPolicy in a multi-cluster environment (OCM hub) and more details about it can be found in the original DNSPolicy rfc.
-More details of providerRef
can be found in DNSPolicy.spec.providerRef
The DNSRecord API is updated to remove the managedZone
reference in favour of directly referencing the providerRef
credentials instead. The DNSRecord reconciliation will be unchanged except for loading the provider client from providerRef
credentials.
The DNSPolicy reconciliation will be updated to remove the requirement for a ManagedZone resource to be created before a DNSPolicy can create DNS records for it. Instead, the controller will list the available zones directly in the currently configured DNS provider. If no matching zone is found, no DNSRecord will be created.
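One way to picture the zone lookup is as a suffix match of the listener hostname against the zones returned by the provider. The Python sketch below is an assumption-laden illustration: the helper name `find_zone`, the zone dictionary keys `id` and `domainName`, and the "most specific zone wins" rule are not defined by this proposal.

```python
from typing import Dict, Iterable, Optional


def find_zone(hostname: str, zones: Iterable[Dict[str, str]]) -> Optional[Dict[str, str]]:
    """Return the most specific zone whose domain is a suffix of hostname.

    Returns None when no zone matches, in which case no DNSRecord
    would be created for the listener.
    """
    host = hostname.rstrip(".").lower()
    best = None
    best_len = -1
    for zone in zones:
        domain = zone["domainName"].rstrip(".").lower()
        # Match the zone apex itself or any name underneath it.
        if host == domain or host.endswith("." + domain):
            if len(domain) > best_len:
                best, best_len = zone, len(domain)
    return best
```

A hostname such as `myapp.mn.hcpapps.net` would resolve to a `mn.hcpapps.net` zone in preference to `hcpapps.net` when both are accessible with the configured credentials.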
-There is a potential for a DNSRecord to be created successfully, but then a provider updated to remove access. In this case it is the responsibility of the DNSPolicy controller to report appropriate status back to the policy and target resource about the failure to process the record. More details on how status will be reported can be found in rfc-0004
-The zoneID
field is mandatory and contains the provider specific id of the hosted zone that this record should be published into.
The DNSRecord reconciliation will use this zone when creating/updating or deleting endpoints for this record set.
-The zoneID
should not change after being selected during initial creation and as such will be marked as immutable.
When a provider is configured using a kind not supported by the DNSPolicy
controller e.g. ExternalDNS
we will be relying on an external controller to correctly update the status of any DNSRecord resources created by our policy. This may have a negative impact on our ability to correctly report status back to the target resource.
When using a weightedGeo routing strategy in a single cluster context it is not expected that this will offer multi cluster capabilities without the use of OCM. Currently, it is expected that if you want to create a recordset that contains the addresses of multiple clusters you must use an OCM hub.
-The ability to support other Kubernetes DNS controllers such as ExternalDNS would potentially allow us to contribute to some of these projects in the area of policies for DNS management of Gateway resources in Kubernetes.
- - - - - - - - - - - - - -sub-components-config
Enable configuration of sub components of Kuadrant from a centralized location, namely the Kuadrant CR.
-The initial request comes from MGC, to configure Redis for Limitador, in issue #163. MGC's current workaround is to update the Limitador CR after deployment with the configuration settings for the Redis instance. This change would allow sub components to be configured before Kuadrant is deployed.
-This reduces the number of CRs that users of Kuadrant are required to modify to get the installation they require. The sub component CRs (Authorino, Limitador) never have to be modified by a Kuadrant user (and should never be modified by a Kuadrant user).
-As the Kuadrant operator would be responsible for reconciling these configurations into the requested components, restrictions and limitations can be placed on the components which may be allowed in a standalone installation. An example in this space is the disk storage for Limitador, which is a new feature that the Kuadrant installation may not want to support until the feature has a proven track record.
-For existing Kuadrant users this may be a breaking change if those users manually configure the Kuadrant sub components via their CRs. A guide can be created to help users migrate their configurations to the Kuadrant CR. This guide can be part of the release notes and/or possibly released before the release of Kuadrant.
-The deployment configuration for each component can be placed in the Kuadrant CR. -These configurations are then reconciled into the CRs for each component. -Only the options below are exposed in the Kuadrant CR. -All fields in the spec are optional.
-apiVersion: kuadrant.io/v1beta1
-kind: Kuadrant
-metadata:
- name: kuadrant-sample
-spec:
- limitador:
- affinity: ...
- listener: ...
- pdb: ...
- replicas: ...
- resourceRequirements: ...
- storage: ...
- authorino:
- evaluatorCacheSize: ...
- healthz: ...
- listener: ...
- logLevel: ...
- metrics: ...
- oidcServer: ...
- replicas: ...
- tracing: ...
- volumes: ...
-status:
- ...
-
The Kuadrant operator will watch for changes in the Authorino and Limitador CRs, reconciling back any changes that a user may make to these configurations.
-However, the Kuadrant operator will not reconcile fields that are not listed above.
-An example of this is the image
field on the Authorino CR.
-This field allows a user to set the image that Authorino is deployed with.
-The feature is meant for dev and testing purposes.
-If a user wishes to use a different image, they can.
-Kuadrant assumes they know what they are doing but requires the user to set the change on the component directly.
Only the sub component operator will be responsible for actioning the configurations passed from the Kuadrant CR to the sub component's CR. This ensures no extra changes will be required in the sub operators to meet the needs of Kuadrant.
-Status errors related to the configuration of the sub components should be reported back to the Kuadrant CR. The error messages in Kuadrant state which components are currently having issues and which resource to review for more details.
-All the fields in the Authorino and Limitador CRs that are configurable in the Kuadrant CR are optional and have sound defaults. Kuadrant needs to remain installable without having to set any spec in the Kuadrant CR.
-The Kuadrant operator should only reconcile the spec that is given. This would mean that if the user states the number of replicas to be used in one of the components, only the replica field for that component should be reconciled. As the other fields would be blank at this stage, blank fields would not be reconciled to the component CR. By this behaviour a few things are being achieved. Component controllers define the defaults to be used in the components. Optional fields in the component CRs never get set with blank values. Blank values in the component CR could override the defaults of the components, causing unexpected behaviour. Existing Kuadrant users may already have custom fields set in the component CRs. By only reconciling the fields set in the Kuadrant CR, this allows time for a user to migrate their custom configuration from the component CR to the Kuadrant CR.
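The "only reconcile what is set" rule can be illustrated with a short Python sketch. The function name `reconcile_component_spec` is hypothetical, and treating `None` as the representation of a blank field is an assumption of this example.

```python
from typing import Any, Dict


def reconcile_component_spec(
    component_spec: Dict[str, Any],
    kuadrant_spec: Dict[str, Any],
) -> Dict[str, Any]:
    """Overlay only the fields set in the Kuadrant CR onto a component CR spec."""
    result = dict(component_spec)
    for field, value in kuadrant_spec.items():
        if value is None:
            # Blank in the Kuadrant CR: keep the component's own value or default.
            continue
        result[field] = value
    return result
```

If the Kuadrant CR only sets `replicas`, a `storage` value already present on the Limitador CR is left untouched, which is what gives existing users time to migrate their custom configuration.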
-Fields being reconciled can be classified into different groups. These classifications are based around the tasks a user is trying to achieve.
-There are a number of fields in both Authorino and Limitador that are not reconciled. Reasons for doing this are:
-It is better to start with a subset of features and expand to include more at a later date. Removing feature support is far harder than adding it.
-There are four classifications the unreconciled fields fall into.
-As the Kuadrant CR spec will be a subset of the features that can be configured in the sub components' specs, extra maintenance will be required to ensure the specs are in sync.
-New features of a component will not be accessible in Kuadrant initially. -This is both a pro and a con.
-Documentation becomes harder, as the sub component should be documenting their own features but in Kuadrant the user does not configure the feature in sub component. -This has the risk of confusing new users.
-One alternative that was being looked at was allowing the user to bring their own Limitador instances by stating which Limitador CR Kuadrant should use. A major issue with this approach was knowing which limits the user had configured and which limits Kuadrant had configured. Sharing global counters is a valid reason to want to share Limitador instances. However, in this case Limitador would not be using one replica and would therefore have a back-end storage configured. It is the back-end storage that needs to be shared across instances. This can be done by adding the configuration in the Kuadrant CR.
-Discuss prior art, both the good and the bad, in relation to this proposal. -A few examples of what this can include are:
-This section is intended to encourage you as an author to think about the lessons from other tentatives - successful or not, provide readers of your RFC with a fuller picture.
-Note that while precedent set by other projects is some motivation, it does not on its own motivate an RFC.
-Think about what the natural extension and evolution of your proposal would be and how it would affect the platform and project as a whole. Try to use this section as a tool to further consider all possible interactions with the project and its components in your proposal. Also consider how this all fits into the roadmap for the project and of the relevant sub-team.
-This is also a good place to "dump ideas", if they are out of scope for the RFC you are writing but otherwise related.
-Note that having something written down in the future-possibilities section is not a reason to accept the current or a future RFC; such notes should be in the section on motivation or rationale in this or subsequent RFCs. The section merely provides additional information.
-The implementation stated here allows the user to state spec fields in the component CRs or the Kuadrant CR (Kuadrant CR overrides the component CRs). -A future possibility would be to warn the user if they add configuration to the components CRs that would get overridden if the same spec fields are configured in the Kuadrant CR.
- - - - - - - - - - - - - -policy_sync_v1
The ability for the Multicluster Gateway Controller to sync policies defined in the hub cluster downstream to the spoke clusters, therefore allowing all policies to be defined in the same place. These policies will be reconciled by the downstream policy controller(s).
-Policy: When referring to a Policy, this document is referring to a Gateway API policy as defined in the Policy Attachment Model. The Multicluster Gateway Controller relies on OCM as a multicluster solution, which defines its own unrelated set of Policies and Policy Framework. Unless explicitly mentioned, this document refers to Policies as Gateway API Policies.
-Policy overriding: The concept of policy overriding is mentioned in this document. It refers to the proposed ability of the downstream Gateway implementation to prioritise downstream Policies against synced Policies in case of conflicts.
-Currently, Kuadrant's support for the Policy Attachment Model can be divided into two categories:
-In a realistic multicluster scenario where multiple spoke clusters are present, the management of these policies can become tedious and error-prone, as policies have to be defined in the hub cluster, as well as replicated in the multiple spoke clusters.
-As Kuadrant users:
-The policy sync feature will allow a gateway-admin to configure, via GatewayClass -parameters, a set of Policy GVRs to be synced by the Multicluster Gateway Controller.
-The policiesToSync
field in the parameters defines those GVRs. For example, in
-order to configure the controller to sync AuthPolicies:
"policiesToSync": [
- {
- "group": "kuadrant.io",
- "version": "v1beta1",
- "resource": "authpolicies"
- }
-]
-
The support for resources that the controller can sync is limited by the following:
-.spec.targetRef
field
-When a Policy is configured to be synced in a GatewayClass, the Multicluster Gateway Controller starts watching events on the resources, and propagates changes by placing the policy in the spoke clusters, with the following mutations:
-TargetRef of the policy is changed to reference the downstream Gateway
-kuadrant.io/policy-synced annotation is set
-The upstream policy is annotated with a reference to the name and namespace of the downstream policies:
annotations:
- "kuadrant.io/policies-synced": "[{\"cluster\": \"...\", \"name\": \"...\", \"namespace\": \"...\"}]"
-
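The mutations described above can be sketched in Python as two small helpers. This is an illustration under assumptions, not the controller's code: the function names are hypothetical, policies are modelled as plain dictionaries, and the annotation value `"true"` for `kuadrant.io/policy-synced` is assumed because the text does not state it.

```python
import copy
import json
from typing import Any, Dict, List


def to_downstream_policy(policy: Dict[str, Any], downstream_gateway: str) -> Dict[str, Any]:
    """Return the copy of a hub policy that would be placed in a spoke cluster."""
    synced = copy.deepcopy(policy)
    # Point the policy at the downstream Gateway instead of the hub Gateway.
    synced["spec"]["targetRef"]["name"] = downstream_gateway
    annotations = synced["metadata"].setdefault("annotations", {})
    annotations["kuadrant.io/policy-synced"] = "true"  # assumed value
    return synced


def annotate_upstream(policy: Dict[str, Any], placements: List[Dict[str, str]]) -> Dict[str, Any]:
    """Record on the hub policy where its downstream copies were placed."""
    annotated = copy.deepcopy(policy)
    annotations = annotated["metadata"].setdefault("annotations", {})
    annotations["kuadrant.io/policies-synced"] = json.dumps(placements)
    return annotated
```

Both helpers work on copies, so the hub policy passed in is never modified in place.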
The Multicluster Gateway Controller reconciles parameters referenced by the -GatewayClass of a Gateway. A new field is added to the parameters that allows -the configuration of a set of GVRs of Policies to be synced.
-The GatewayClass reconciler validates that:
-Validation failures are reported as part of the status of the GatewayClass
-The Gateway reconciler sets up dynamic watches to react to events on the configured -Policies, calling the PolicySyncer component with the updated Policy as well -as the associated Gateway.
-The PolicySyncer component is in charge of reconciling Policy watch events to -apply the necessary changes and place the Policies in the spoke clusters.
-This component is injected in the event source and called when a change is made -to a hub Policy that has been configured to be synced.
-The PolicySyncer implementation uses OCM ManifestWorks to place the policies in -the spoke clusters. Through the ManifestWorks, OCM allows to:
-In order to avoid conflict with Policies created directly in the spoke clusters, -a hierarchy must be defined to prioritise those Policies.
-The controller will set the kuadrant.io/policy-synced
annotation on the policy
-when placing it in the spoke cluster.
The Kuadrant operator will be aware of the presence of this annotation, and, in case
-of conflicts, override Policies that contain this annotation. When a policy is
-overridden due to conflicts, the Enforced
status will be set to False
, with
-the reason being Overridden
and a human readable message explaining the reason
-why the policy was overridden. See Policy Status RFC
In order for a Policy to be supported for syncing, the MGC must have permissions
-to watch/list/get the resource, and the implementation of the downstream Gateway
-controller must be aware of the policy-synced
annotation.
Different technology stacks are available to sync resources across clusters. However, adopting these technologies to achieve the goal of this RFC implies adding another dependency to the current stack, with the cost of added complexity and maintenance effort.
-The MGC currently uses OCM to place Gateways across clusters. Relying on OCM for the purpose of placing Policies is the most straightforward alternative from a design and implementation point of view.
-Gateway-admins will have no centralized system for handling spoke-level policies targeting a gateway created there from the hub.
-OCM's Policy Framework is a system designed to make assertions about the state of a spoke, and potentially take actions based on that state. As such, it is not a suitable replacement for ManifestWorks in the case of syncing resources to a spoke.
-ManifestWorkReplicaSets may be a future improvement that the MGC could support to simplify the placement of related resources, but this is beyond the scope of this RFC.
-No applicable prior art.
-While the controller can assume common status fields among the Policies that it -syncs, there might be a scenario where certain policies use custom status fields -that are not handled by the controller. In order to support this, two alternatives -are identified:
-Configurable rules.
-An extra field is added in the GatewayClass params that configures the policies -to sync, to specify custom fields that the controller must propagate back from -the spokes to the hub.
-Hard-coded support.
-The PolicySync component can identify the Policy type and select which extra -status fields are propagated
-If OCM's Policy Framework is updated to enable syncing of resource status back to the hub, it could be an opportunity to refactor the MGC to use this framework in place of the current approach of creating ManifestWorks directly.
-This system could mutate over time to dynamically sync more CRDs than policies to spoke clusters.
- - - - - - - - - - - - - -kuadrant-release-process
Kuadrant is a set of components whose artifacts are built and delivered independently. This RFC aims to define every -aspect of the event of releasing a new version of the whole, in terms of versioning, cadence, communication, channels, -handover to other teams, etc.
-At present, there is no clear process or set of guidelines to follow when releasing a new version of Kuadrant, which -leads to confusion and a lack of transparency. We currently rely on internal communication and on certain people -in charge of the release process, which is not ideal.
-First, we need to define what releasing Kuadrant means, in a clear and transparent way that communicates to the community -what's happening and what to expect. The Kuadrant suite is composed of several components, each of them with its own -set of artifacts and versioning scheme. Defining the release process of the whole suite is a complex task. It is -not only about the technical details of releasing the components, but also about communication and transparency -with the community, the frequency of the releases, and the point at which a release is ready to be handed over to other teams like -QA. This section aims to provide guidelines for the different aspects of the release process.
-The set of components that are part of the Kuadrant suite are the following:
-Each of them needs to be versioned independently, and the versioning scheme should follow Semantic Versioning. -At the time of cutting a release for any of them, it's important to keep in mind what section of the version to bump, -given a version number MAJOR.MINOR.PATCH, increment the:
-Additional labels for pre-release and build metadata are available as extensions to the MAJOR.MINOR.PATCH format.
-A more detailed explanation of the versioning scheme can be found in the Semantic Versioning website.
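As a rough illustration of the bump rules, the following sketch applies the standard Semantic Versioning conventions: MAJOR for incompatible API changes, MINOR for backward compatible functionality, PATCH for backward compatible bug fixes. The `Version`, `parse` and `bump` names are hypothetical helpers and not part of any Kuadrant tooling. Pre-release and build metadata labels are deliberately not handled.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


def parse(text: str) -> Version:
    """Parse a plain MAJOR.MINOR.PATCH string; a leading 'v' is ignored."""
    major, minor, patch = (int(part) for part in text.lstrip("v").split("."))
    return Version(major, minor, patch)


def bump(version: Version, part: str) -> Version:
    """Return the next version, resetting the lower-order parts to zero."""
    if part == "major":  # incompatible API changes
        return Version(version.major + 1, 0, 0)
    if part == "minor":  # backward compatible functionality
        return Version(version.major, version.minor + 1, 0)
    if part == "patch":  # backward compatible bug fixes
        return Version(version.major, version.minor, version.patch + 1)
    raise ValueError(f"unknown version part: {part!r}")


print(bump(parse("v0.6.0"), "minor"))
```

Resetting the lower-order parts on a MAJOR or MINOR bump follows the Semantic Versioning specification.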
-By releasing a new version of Kuadrant, we mean releasing the set of components with their corresponding semantic versions, -some of them freshly released and others still at the version used in the previous release, with the version of the -Kuadrant Operator defining the version of the whole suite.
-Kuadrant Suite vx.y.z = Kuadrant Operator vx.y.z + Authorino Operator va.b.c + Limitador Operator vd.e.f + DNS Operator vg.h.i + MGC Controller vj.k.l + Wasm Shim vm.n.o
-
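The composition above could be modelled as a small record, sketched here under the assumption that a release is simply a pin of one version per component. The `SuiteRelease` class and all version numbers are hypothetical and only illustrate that the suite version is taken from the Kuadrant Operator.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class SuiteRelease:
    """A Kuadrant suite release pinned to one version per component."""

    kuadrant_operator: str
    # Other components, keyed by name; each keeps its own independent version.
    components: Dict[str, str] = field(default_factory=dict)

    @property
    def suite_version(self) -> str:
        # The Kuadrant Operator version names the suite as a whole.
        return self.kuadrant_operator


# Hypothetical version numbers, for illustration only.
release = SuiteRelease(
    kuadrant_operator="v0.7.0",
    components={
        "authorino-operator": "v0.10.0",
        "limitador-operator": "v0.8.0",
        "dns-operator": "v0.1.0",
    },
)
print(release.suite_version)
```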
The technical details of how to release each component are out of the scope of this RFC and can be found in the -Kuadrant components CI/CD RFC.
-Probably the most important and currently missing step in the release process is the green flagging from the Quality Assurance -(QA) team. The QA team is responsible for testing the different components of the Kuadrant suite, and they need to -be aware of the new version of the suite that is going to be released and of the changes it includes (bug fixes -and new features), so that they can plan their testing processes accordingly. This check is not meant to be a fully fledged -assessment by the QA team once the release is handed over to them. It should take no more than 1-2 days and is ideally expected -to be fully automated. This step will happen once the release candidate has no PRs pending to be merged, and it has been -tested by the Engineering team. The QA team should work closely with the Engineering team throughout the process, both teams aiming -for zero handover time and a continuous delivery mindset, so that immediate testing can be triggered on release candidates once -handed over. This process should happen without the need for formal communication between the teams or any overhead in -general, relying instead on constant synergy between quality and product engineering.
-There is an ideal time to hand over to the QA team for testing, especially since we are using GitHub for orchestration, -we could briefly define it in the following steps:
-Once the project is stable enough, and its adoption increases, the community will be expecting a certain degree of -commitment from the maintainers, and that includes a regular release cadence. The frequency of the releases of the -different components could vary depending on the particular component needs. However, it has been discussed in the past -that the Kuadrant Operator should be released every 3-4 weeks initially, including the latest released version -of every component in the suite. There's another RFC that focuses on the actual frequency of each component; see -the Kuadrant Release Cadence RFC.
-There are a few reasons for this:
-By committing to a release cadence, software projects can benefit from improved efficiency, risk management, faster feedback -cycles, synchronization, and reduced complexity.
-Every component in Kuadrant has its own repository, and the source code is hosted in GitHub, mentioned in the previous -section. However, the images built and manifests generated are hosted in different registries, depending on the -component. The following table shows the different registries used by each component:
-Component | -Artifacts | -Registry / Hub | -
---|---|---|
Authorino | -authorino images | -Quay.io | -
Authorino Operator | -authorino-operator images | -Quay.io | -
- | authorino-operator-bundle images | -Quay.io | -
- | authorino-operator-catalog images | -Quay.io | -
- | authorino-operator manifests | -OperatorHub.io | -
Limitador | -limitador server images | -Quay.io | -
- | limitador crate | -Crates.io | -
Limitador Operator | -limitador-operator images | -Quay.io | -
- | limitador-operator-bundle images | -Quay.io | -
- | limitador-operator-catalog images | -Quay.io | -
- | limitador-operator manifests | -OperatorHub.io | -
Wasm Shim | -wasm-shim images | -Quay.io | -
DNS Operator | -dns-operator images | -Quay.io | -
- | dns-operator-bundle images | -Quay.io | -
- | dns-operator-catalog images | -Quay.io | -
Kuadrant Operator | -kuadrant-operator images | -Quay.io | -
- | kuadrant-operator-bundle images | -Quay.io | -
- | kuadrant-operator-catalog images | -Quay.io | -
- | kuadrant-operator manifests | -OperatorHub.io | -
- | kuadrant-operator source (includes example dashboards and alerts) | -GitHub Releases | -
kuadrantctl | -kuadrantctl CLI | -GitHub Releases | -
It's important to note that keeping the documentation up to date is a responsibility of the component maintainers, and -it needs to be done before releasing a new version of the component. Clear and up-to-date -documentation is crucial for the success of the project.
-The documentation for the Kuadrant suite is compiled and available on the Kuadrant website. One
-can find the source of the documentation within each component repository, in the docs
directory. However, making this
-information available on the website is a manual process, and should be done by the maintainers of the project. The
-process of updating the documentation is simple and consists of the following steps:
Actions > ci > Run Workflow
).
-Another important aspect of releasing a new version of the Kuadrant suite is the communication with the community and -other teams within the organization. A few examples of the communication channels that need to be updated are:
-The alternative to the proposal is to keep the current process, which is not ideal and leads to confusion and lack of -transparency.
-There's been an organically grown process for releasing new versions of the Kuadrant suite, which is not documented and -it's been changing over time. However, there is some documentation for some of the components worth mentioning:
-Once the release process is accepted and battle-tested, we could aim to automate the process as much as possible.
-defaults-and-overrides
This is a proposal for extending the Kuadrant Policy APIs to fully support use cases of Defaults & Overrides (D/O) for Inherited Policies, including the base use cases of full default and full override, and more specific nuances that involve merging individual policy rules (as defaults or overrides), declaring constraints and unsetting defaults.
-As of Kuadrant Operator v0.6.0, Kuadrant policy resources that have hierarchical effect across the tree of network objects (Gateway, HTTPRoute), or what is known as Inherited Policies, provide only limited support for setting defaults and no support for overrides at all.
-The above is notably the case of the AuthPolicy and the RateLimitPolicy v1beta2 APIs, shipped with the aforementioned version of Kuadrant. These kinds of policies can be attached to Gateways or to HTTPRoutes, with cascading effects through the hierarchy that result in one effective policy per gateway-route combination. This effective policy is either the policy attached to the Gateway or, if present, the one attached to the HTTPRoute, thus conforming with a strict case of implicit defaults set at the level of the gateway.
-Enhancing the Kuadrant Inherited Policy CRDs, so the corresponding policy instances can declare defaults
and overrides
stanzas, is imperative:
The base use cases for Defaults & Overrides (D/O) are:
-The base cases are expanded with the following additional derivative cases and concepts:
-Together, these concepts help solve the following user stories:
-User story | -Group | -Unique ID | -
---|---|---|
As a Platform Engineer, when configuring a Gateway, I want to set a default policy for all routes linked to my Gateway, that can be fully replaced with more specific ones(*). | -D | -gateway-default-policy | -
As a Platform Engineer, when configuring a Gateway, I want to set default policy rules (parts of a policy) for all routes linked to my Gateway, that can be individually replaced and/or expanded by more specific rules(*). | -DR | -gateway-default-policy-rule | -
As a Platform Engineer, when defining a policy that configures a Gateway, I want to set constraints (e.g. minimum/maximum value, enumerated options, etc) for more specific policy rules that are declared(*) with the purpose of replacing the defaults I set for the routes linked to my Gateway. | -C | -policy-constraints | -
As a Platform Engineer, when configuring a Gateway, I want to set a policy for all routes linked to my Gateway, that cannot be replaced nor expanded by more specific ones(*). | -O | -gateway-override-policy | -
As a Platform Engineer, when configuring a Gateway, I want to set policy rules (parts of a policy) for all routes linked to my Gateway, that cannot be individually replaced by more specific ones(*), but only expanded with additional more specific rules(*). | -OR | -gateway-override-policy-rule | -
As an Application Developer, when managing an application, I want to set a policy for my application, that fully replaces any default policy that may exist for the application at the level of the Gateway, without having to know about the existence of the default policy. | -D | -route-replace-policy | -
As an Application Developer, when managing an application, I want to expand a default set of policy rules set for my application at the level of the gateway, without having to refer to those existing rules by name. | -D/O | -route-add-policy-rule | -
As an Application Developer, when managing an application, I want to unset an individual default rule set for my application at the level of the gateway. | -U | -route-unset-policy-rule | -
(*) declared in the past or in the future, by myself or any other authorized user.
-The interactive nature of setting policies at different levels of the hierarchy, and by different personas, gives rise to the following additional user stories. These stories are grouped here under the Observability (Ob) aspect of D/O, but they also relate to the "Discoverability Problem" described by Gateway API.
-User story | -Group | -Unique ID | -
---|---|---|
As one who has read access to Kuadrant policies, I want to view the effective policy enforced at the traffic routed to an application, considering all active defaults and overrides at different policies(*). | -Ob | -view-effective-policy | -
As a Platform Engineer, I want to view all default policy rules that may have been replaced by more specific ones(*). | -Ob | -view-policy-rule-status | -
As a Policy Manager, I want to view all gateways and/or routes whose traffic is subject to enforcement of a particular policy rule referred by name. | -Ob | -view-policy-rule-reach | -
(*) declared in the past or in the future, by myself or any other authorized user.
-Writing a Kuadrant policy enabled for Defaults & Overrides (D/O), to be attached to a network object, involves declaring the following fields at the first level of the spec:
-targetRef (required): the reference to a hierarchical network object targeted by the policy, typed as a Gateway API PolicyTargetReference or PolicyTargetReferenceWithSectionName type
-defaults: a block of default policy rules with further specification of a strategy (atomic set of rules or individual rules to be merged into lower policies), and optional conditions for applying the defaults down through the hierarchy
-overrides: a block of override policy rules with further specification of a strategy (atomic set of rules or individual rules to be merged into lower policies), and optional conditions for applying the overrides down through the hierarchy
-the bare set of policy rules, e.g. the rules field in a Kuadrant AuthPolicy or the limits field in a RateLimitPolicy
-Between the following mutually exclusive options, either one or the other shall be used in a policy:
-defaults and/or overrides blocks; or
-the bare set of policy rules.
-In case the bare set of policy rules is used, it is treated implicitly as a block of defaults.
-Supporting specifying the bare set of policy rules at the first level of the spec, alternatively to the defaults
and overrides
blocks, is a strategy that aims to provide:
A policy that does not specify D/O fields (defaults
, overrides
) is a policy that declares an intent.
One who writes a policy without specifying defaults
or overrides
, but only the bare set of policy rules, may feel like declaring a Direct Policy.
-Depending on the state of other policies indirectly affecting the same object or not, the final outcome can be the same as writing a direct policy.
-This is especially true when the policy that declares the intent targets an object whose kind is the lowest kind accepted by Kuadrant in the hierarchy of network resources, and there are no other policies with lower precedence.
Nevertheless, because other policies can affect the final behavior of the target (e.g. by injecting defaults, by overriding rules, by adding more definitions beneath), policies that simply declare an intent, conceptually, are still Inherited Policies.
-Compared to the inherited policy that misses D/O blocks, these other policies affecting the behavior may be declared:
-At any time, any one of these policies can be created and therefore the final behavior of a target should never be assumed to be equivalent to the intent declared by any individual policy in particular, but always collectively determined by the combination of all intents, defaults and overrides from all inherited policies affecting the target.
-From GEP-2649:
---If a Policy can be used as an Inherited Policy, it MUST be treated as an Inherited Policy, regardless of whether a specific instance of the Policy is only affecting a single object.
-
An inherited policy that simply declares an intent (i.e. without specifying D/O) will be treated as a policy that implicitly declares an atomic set of defaults, whether the policy targets higher levels in the hierarchy or lower ones. -In the absence of any other conflicting policy affecting the same target, the behavior equals the defaults which equal the intent.
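A minimal sketch of this normalization, assuming a policy spec is available as a plain dictionary. The `normalize_spec` helper is hypothetical and not part of the Kuadrant code base. It treats a bare set of rules as an implicit atomic `defaults` block and rejects specs that mix the bare rules with explicit `defaults` or `overrides` blocks.

```python
from typing import Any, Dict


def normalize_spec(spec: Dict[str, Any], rules_field: str = "rules") -> Dict[str, Any]:
    """Rewrite a policy spec so that it always uses explicit D/O blocks."""
    has_do_blocks = "defaults" in spec or "overrides" in spec
    has_bare_rules = rules_field in spec
    if has_do_blocks and has_bare_rules:
        raise ValueError("bare policy rules and defaults/overrides are mutually exclusive")

    normalized: Dict[str, Any] = {"targetRef": spec["targetRef"]}
    if has_bare_rules:
        # A policy that only declares an intent is an implicit atomic default.
        normalized["defaults"] = {rules_field: spec[rules_field], "strategy": "atomic"}
        return normalized

    for block in ("defaults", "overrides"):
        if block in spec:
            # An omitted strategy falls back to "atomic".
            normalized[block] = {"strategy": "atomic", **spec[block]}
    return normalized


intent = {"targetRef": {"kind": "HTTPRoute"}, "rules": {"authentication": {"a": {}}}}
print(normalize_spec(intent))
```

For an AuthPolicy the bare field is `rules`; for a RateLimitPolicy the same helper could be called with `rules_field="limits"`.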
-A policy that specifies D/O fields (defaults
, overrides
) is a policy explicitly declared to modify an intent.
Without any other policy with lower precedence, there is no special meaning in choosing between defaults and overrides in an inherited policy that targets an object whose kind is the lowest kind accepted by Kuadrant in the hierarchy of network resources. -The sets of rules specified in these policies affect the targeted objects in the same way, regardless of how they are qualified.
-However, because other policies may occasionally be declared with lower precedence (i.e. targeting lower levels in the hierarchy or due to ordering, see Conflict Resolution), one who declares a policy to modify an intent must carefully choose between defaults
and/or overrides
blocks to organize the policy rules, regardless of whether the targeted object is of a kind that is the lowest kind in the hierarchy of network resources accepted by Kuadrant.
Even in the cases where no more than one policy of a kind is allowed to target the same object (1:1 relationship) and thus there should never exist two policies affecting a target from the same level of the hierarchy simultaneously (or equivalently a policy with lower precedence than another, both at the lowest level of the hierarchy), users must assume that this constraint may change (i.e. an N:1 relationship between policies of a kind and a target may become allowed).
-In all cases, defaults and overrides must be used with the semantics of declaring rules that modify an intent.
-All Custom Resource Definitions (CRDs) that define a Kuadrant inherited policy must be labeled gateway.networking.k8s.io/policy: inherited
.
Users can rely on the presence of that label to identify policy kinds whose instances are treated as inherited policies.
-In some exceptional cases, there may be kinds of Kuadrant policies that do not specify defaults
and overrides
blocks, but that are still labeled as inherited policy kinds.
-Instances of these kinds of policies implicitly declare an atomic set of defaults, similarly to what is described in Inherited Policies that declare an intent.
Example 1. Atomic defaults
-kind: AuthPolicy
-metadata:
-  name: gw-policy
-spec:
-  targetRef:
-    kind: Gateway
-  defaults:
-    rules:
-      authentication:
-        "a": {…}
-      authorization:
-        "b": {…}
-    strategy: atomic
-
The above is a proper Inherited Policy that sets a default atomic set of auth rules that will be set at lower objects in case those lower objects do not have policies of their own attached at all.
-The following is a slightly different example that defines auth rules to be individually merged into lower objects. Each rule is evaluated one by one: if it is already defined at the "lower" (more specific) level, the lower definition takes precedence; if it is missing at the lower level, the default is activated.
-Example 2. Merged defaults
-kind: AuthPolicy
-metadata:
-  name: gw-policy
-spec:
-  targetRef:
-    kind: Gateway
-  defaults:
-    rules:
-      authentication:
-        "a": {…}
-      authorization:
-        "b": {…}
-    strategy: merge
-
Similarly, a set of overrides
policy rules could be specified, instead of or alongside the defaults
set of policy rules.
There are 2 supported strategies for applying proper Inherited Policies down to the lower levels of the hierarchy:
-Atomic: the set of policy rules in a defaults or overrides block is applied as an atomic piece; i.e., a lower object than the target of the policy, that is evaluated to be potentially affected by the policy, also has an atomic set of rules if another policy is attached to this object. Therefore, either the entire set of rules declared by the higher (less specific) policy is taken or the entire set of rules declared by the lower (more specific) policy is taken (depending on whether it is defaults or overrides), but the two sets are never merged into one.
-Merge: each policy rule in a defaults or overrides block is compared one to one against lower level policy rules and, when they conflict (i.e. have the same key with different values), either one or the other (more specific or less specific) is taken (depending on whether it is defaults or overrides), so that the final effective policy is a merge of the two policies.
-Each block of defaults and overrides can specify a strategy field whose value is set to either atomic or merge. If omitted, atomic is assumed.
Atomic versus merge strategies, as a specification of the defaults
and overrides
blocks, imply that there are only two levels of granularity for comparing policies vis-a-vis.
atomic
means that the level of granularity is the entire set of policy rules within the defaults
or overrides
block. I.e., the policy is atomic, or, equivalently, the final effective policy will be either one indivisible ("atomic") set of rules ("policy") or the other.
For the merge
strategy, on the other hand, the granularity is of each named policy rule, where the name of the policy rule is the key and the value is an atomic object that specifies that policy rule. The final effective policy will be a merge of two policies.
When two policies are compared to compute a so-called Effective Policy out of their sets of policy rules and given default or override semantics, plus specified atomic
or merge
strategies, the following matrix applies:
- | Atomic (entire sets of rules) | -Merge (individual policy rules at a given granularity) | -
---|---|---|
Defaults | -More specific entire set of rules beats less specific entire set of rules → takes all the rules from the lower policy | -More specific individual policy rules beat less specific individual set of rules → compare one by one each pair of policy rules and take the lower one if they conflict | -
Overrides | -Less specific entire set of rules beats more specific entire set of rules → takes all the rules from the higher policy | -Less specific individual policy rules beat more specific individual set of rules → compare one by one each pair of policy rules and take the higher one if they conflict | -
The order of the policies, from less specific (or "higher") to more specific (or "lower"), is determined according to the Gateway API hierarchy of network resources, based on the kind of the object targeted by the policy. The policy that sits higher in the hierarchy dictates the strategy to be applied.
-For a more detailed reference, including how to resolve conflicts in case of policies targeting objects at the same level, see GEP-713's section Hierarchy and Conflict Resolution.
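The matrix above can be sketched as a small function. This is an illustration only, under the assumption that each set of policy rules is a flat mapping from rule name to rule value. The `effective_rules` name and signature are hypothetical and do not reflect the Kuadrant implementation.

```python
from typing import Any, Dict, Optional

Rules = Dict[str, Any]


def effective_rules(higher: Rules, lower: Optional[Rules], semantics: str, strategy: str) -> Rules:
    """Combine a higher (less specific) block of rules with a lower policy.

    semantics: "defaults" or "overrides", as declared by the higher policy.
    strategy:  "atomic" or "merge", also dictated by the higher policy.
    lower:     rules of the more specific policy, or None if there is none.
    """
    if semantics not in ("defaults", "overrides"):
        raise ValueError(f"unknown semantics: {semantics!r}")
    if strategy not in ("atomic", "merge"):
        raise ValueError(f"unknown strategy: {strategy!r}")

    if strategy == "atomic":
        if semantics == "defaults" and lower is not None:
            return dict(lower)   # the entire lower set beats the defaults
        return dict(higher)      # no lower policy, or atomic overrides

    lower_rules = lower or {}
    if semantics == "defaults":
        return {**higher, **lower_rules}  # lower rule wins on conflict
    return {**lower_rules, **higher}      # higher rule wins on conflict


gateway = {"a": "gw-a", "b": "gw-b"}
route = {"b": "route-b", "c": "route-c"}
print(effective_rules(gateway, route, "defaults", "merge"))
```

In both merge cases, rules that exist on only one side are carried into the effective policy unchanged.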
-In some cases, it may be desirable to be able to unset, at a lower policy, a merged default that is inherited from a higher one. In fact, some inherited defaults could be harmful to an application, at the same time as they are unfeasible to remove from scope for all applications altogether, and therefore require an exception.
-Unsetting defaults via specification at lower level policies provides users who own policy rules at different levels of the hierarchy the option of not having to coordinate those exceptions "offline", nor having to accept the addition of special cases (conditions) at the higher level to exempt only specific lower policies from being affected by a particular default. Such special cases would otherwise constitute a violation of the inheritance pattern, as well as impose additional cognitive complexity on anyone who reads a higher policy with too many conditions.
-Instead, users should continue to be able to declare their intents through policies, and redeem an entitlement to unset inapplicable defaults, without any leakage of lower level details upwards at the higher policies.
-The option of unsetting inherited defaults is presented as part of the volition implied by the inheritance of policy rules, which are typically specified for the more general case (e.g. at the level of a gateway, for all routes), though not necessarily applicable to all special cases beneath. If enabled, this feature helps disambiguate the concept of "default", which should not be understood strictly as the option to set values that protect the system in case of lack of specialisation, but rather by its properties of volition and changeability. I.e., by definition, every default policy rule is opt-out and specifies a value that is modifiable.
-In contrast, a policy rule that is neither opt-out nor modifiable better fits the definition of an override. A policy rule that is not opt-out, and that does not set a concrete default value to be enforced in the absence of specialisation, defines a requirement.
-Finally, for the use case where users want to set defaults that cannot be unset (though still modifiable), the very feature of unsetting defaults itself should be configurable, at least at the level of the system. This can be achieved with feature switches and policy validation, backed by the cluster's RBAC if needed.
-The capability of unsetting inherited defaults from an effective policy can be identified by the presence of the spec.unset
field in a policy. The value is a list of default named policy rules to be unset.
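A minimal sketch of how an unset list could be applied when merging defaults, assuming flat mappings of named policy rules. The `merge_defaults_with_unset` helper is hypothetical. It only touches defaults, in line with the description of the field, and says nothing about how overrides would be handled.

```python
from typing import Any, Dict, Iterable

Rules = Dict[str, Any]


def merge_defaults_with_unset(defaults: Rules, lower: Rules, unset: Iterable[str]) -> Rules:
    """Merge higher-level defaults into a lower policy, dropping unset rules."""
    unset_names = set(unset)
    # Start from the inherited defaults that the lower policy did not opt out of.
    effective = {name: rule for name, rule in defaults.items() if name not in unset_names}
    # More specific rules are added, replacing any default with the same name.
    effective.update(lower)
    return effective


gateway_defaults = {"a": "gw-a", "b": "gw-b"}
route_rules = {"c": "route-c"}
print(merge_defaults_with_unset(gateway_defaults, route_rules, unset=["b"]))
```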
Users should be able to specify conditions for applying their blocks of defaults
and overrides
. These conditions aim to support exceptional cases where the blocks cannot be simply applied downwards, but rather depend on specifics found in the lower policies, while still defined in generic terms – as opposed to conditions that leak details of individual lower policies upwards.
Between a higher and a lower set of policy rules, the higher level dictates the conditions for its rules to be applied (either as defaults or as overrides) over the lower level, and never the other way around.
-D/O conditions are identified by the presence of the spec.defaults.when
or spec.overrides.when
fields in a policy. Those should be defined using Common Expression Language (CEL), evaluated in the control plane against the lower level specification that the higher level is being applied to. I.e. self
in the CEL expression is the lower policy.
A concrete useful application for conditionally enforcing a block of D/O is for specifying constraints for lower values. E.g. if a lower policy tries to set a value on a numeric field that is greater (or lower) than a given threshold, apply an override that sets that field value to equal to the threshold; otherwise, use the value declared by the lower policy.
-In contrast, an example of trivially redundant application of D/O conditions would be specifying a default block of rules that is only applied when the lower level does not declare a more specific replacement. Since this is natural semantics of a default, one does not have to use conditions for that.
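The constraint use case can be sketched as follows. A plain Python callable stands in for the CEL expression, with its argument playing the role of `self` (the lower policy). The `apply_conditional_override` helper and the `requests_per_minute` field are hypothetical names chosen for illustration.

```python
from typing import Any, Callable, Dict

Rules = Dict[str, Any]


def apply_conditional_override(override: Rules, lower: Rules, when: Callable[[Rules], bool]) -> Rules:
    """Merge an override block into the lower rules only if the condition holds."""
    if when(lower):
        return {**lower, **override}  # the higher value wins on conflict
    return dict(lower)                # the lower policy is left untouched


MAX_RPM = 100  # hypothetical threshold set by the higher policy
override = {"requests_per_minute": MAX_RPM}


def exceeds_cap(lower: Rules) -> bool:
    # Stand-in for a CEL expression evaluated against the lower policy.
    return lower.get("requests_per_minute", 0) > MAX_RPM


print(apply_conditional_override(override, {"requests_per_minute": 500}, exceeds_cap))
print(apply_conditional_override(override, {"requests_per_minute": 50}, exceeds_cap))
```

A lower policy that stays within the threshold keeps its own value, and one that exceeds it is clamped to the threshold.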
-The following sets of examples generalize D/O applications for the presented user stories, regardless of details about specific personas and kinds of targeted resources. They illustrate the expected behavior for different cases involving defaults, overrides, constraints and unsetting.
-Examples | -Highlighted user stories | -
---|---|
A. Default policy entirely replaced by another at lower level | -gateway-default-policy, route-replace-policy | -
B. Default policy rules merged into policies at lower level | -gateway-default-policy-rule, route-add-policy-rule | -
C. Override policy entirely replacing other at lower level | -gateway-override-policy | -
D. Override policy rules merged into other at lower level | -gateway-override-policy-rule | -
E. Override policy rules setting constraints to other at lower level | -policy-constraints | -
F. Policy rule that unsets a default from higher level | -route-unset-policy-rule | -
In all the examples, a Gateway and an HTTPRoute object are targeted by two policies, and an effective policy is presented highlighting the expected outcome. This poses no harm to generalizations involving the same or different kinds of targeted resources, multiple policies targeting the same object, etc.
-The leftmost YAML is always the "higher" (less specific) policy; the one in the middle, separated from the leftmost one by a "+" sign, is the "lower" (more specific) policy; and the rightmost YAML is the expected Effective Policy.
-For a complete reference of the order of hierarchy, from least specific to most specific kinds of resources, as well as how to resolve conflicts of hierarchy in case of policies targeting objects at the same level, see Gateway API's Hierarchy definition for Policy Attachment and Conflict Resolution.
-Example A1. A default policy that is replaced entirely if another one is set at a lower level
- -Example B1. A default policy whose rules are merged into other policies at a lower level, where individual default policy rules can be overridden or unset - without conflict
- -Example B2. A default policy whose rules are merged into other policies at a lower level, where individual default policy rules can be overridden or unset - with conflict
- -Example C1. An override policy that replaces any other that is set at a lower level entirely
- -Example D1. An override policy whose rules are merged into other policies at a lower level, overriding individual policy rules with same identification - without conflict
- -Example D2. An override policy whose rules are merged into other policies at a lower level, overriding individual policy rules with same identification - with conflict
- -The examples in this section introduce the proposal for a new when
field for the defaults
and overrides
blocks. This field dictates the conditions to be found in a lower policy that would make a higher policy or policy rule to apply, according to the corresponding defaults
or overrides
semantics and atomic
or merge
strategy.
Combined with a simple case of override policy (see Examples C), the when
condition field allows modeling for use cases of setting constraints for lower-level policies.
As here proposed, the value of the when
condition field must be a valid Common Expression Language (CEL) expression.
Example E1. An override policy whose rules set constraints to field values of other policies at a lower level, overriding individual policy values of rules with same identification if those values violate the constraints - lower policy is compliant with the constraint
- -Example E2. An override policy whose rules set constraints to field values of other policies at a lower level, overriding individual policy values of rules with same identification if those values violate the constraints - lower level violates the constraint
- -Example E3. An override policy whose rules set constraints to field values of other policies at a lower level, overriding individual policy values of rules with same identification if those values violate the constraints - merge granularity problem
-The following example illustrates the possibly unintended consequences of enforcing D/O at strict levels of granularity, and the flip side of the strategy
field offering a closed set of options (atomic
, merge
).
On one hand, the API is simple and straightforward, and there are no deeper side effects to be concerned about, other than at the two levels provided (atomic sets or merged individual policy rules.) On the other hand, this design may require more offline interaction between the actors who manage conflicting policies.
- -The examples in this section introduce a new field unset: []string
at the same level as the bare set of policy rules. The value of this field, provided as a list, dictates the default policy rules declared at a higher level to be removed ("unset") from the effective policy, specified by name of the policy rules.
Example F1. A policy that unsets a default policy rule set at a higher level
- -Example F2. A policy that tries to unset an override policy rule set at a higher level
- -An inherited policy can be at any of the following conditions (RFC 0004):
-Type | -Status | -Reason | -Message | -
---|---|---|---|
Accepted | -True | -"Accepted" | -"Policy has been accepted" | -
- | False | -"Conflicted" | -"Policy is conflicted by <policy-ns/policy-name>" | -
- | False | -"Invalid" | -"Policy is invalid" | -
- | False | -"TargetNotFound" | -"Policy target <resource-name> was not found" | -
Enforced | -True | -"Enforced" | -"Policy has been successfully enforced[. The following defaults have been added by |
-
- | True | -"PartiallyEnforced" | -"Policy has been successfully enforced. The following rules have been overridden by |
-
- | False | -"Overridden" | -"Policy has been overridden by <policy-ns/policy-name>" | -
- | False | -"Unknown" | -"Policy has encountered some issues" | -
A special condition must be added to every object that is targeted by a Kuadrant inherited policy if the policy's Enforced
status condition is True
.
This special condition to be added to the target object is kuadrant.io/xPolicyAffected
, where "xPolicy" is the kind of the inherited policy (e.g. AuthPolicy, RateLimitPolicy.)
The possible statuses of an object regarding its sensitivity to one or more inherited policies are:
-Type | -Status | -Reason | -Message | -
---|---|---|---|
xPolicyAffected | -False | -"Unaffected" | -"The object is not affected by any xPolicy" | -
- | True | -"Affected" | -"The object is affected by xPolicy <policy-ns/policy-name>" | -
- | True | -"PartiallyAffected" | -"The following sections of the object are affected by xPolicy <policy-ns/policy-name>: rules.0, rules.2" | -
The presence of the PolicyAffected
status condition helps identify that an object is sensitive to one or more policies of a kind, and gives some specifics about the scope of that effect (entire object or selected sections.)
-In many cases, this should be enough for inferring the actual policy rules being enforced for that object.
For other cases where any of the following situations hold, a more detailed view of the final Effective Policy must be provided to the user:
-To help visualize the effective policy for a given target object in that situation, at least one of the following options must be provided to the user:
-EffectivePolicy
custom resource, defined for each kind of inherited policy, and with an instance created for each affected object, that is reconciled and updated by the policy controller.The following diagrams are a high level model to guide the process of applying a set of policies of a kind for a given Gateway object, where the Gateway object is considered the root of a hierarchy, and for all objects beneath, being the xRoute objects the leaves of the hierarchical tree.
-As presented, policies can target either Gateways or route objects (HTTPRoutes, GRPCRoutes), with no restriction on the number of policies of a kind that target the same object, i.e. an N:1 relationship is allowed. Without loss of generality, a 1:1 relationship between policies of a kind and targeted objects can be imposed, if preferred, as a measure to initially reduce the amount of information presented to the user and the corresponding cognitive load.
-%%{ init: { "theme": "neutral" } }%%
-flowchart LR
- start([For a Gateway <i>g</i><br>and policy kind <i>pk</i>]) -->
- list-routes[List all routes<br>accepted by <i>g</i> as <i>R</i>] -->
- apply-policies-for-r
- subgraph for-each-route[For each <i>r in R</i>]
- apply-policies-for-r[[Apply policies<br>of kind <i>pk</i><br>scoped for <i>r</i>]] -->
- apply-policies-for-r
- end
- for-each-route -->
- build-virtual-route[Build a virtual route <i>vr</i><br>with all route rules not<br>targeted by any policy] -->
- apply-policies-for-vr[[Apply policies<br>of kind <i>pk</i><br>scoped for <i>vr</i>]] -->
- finish(((END)))
-%%{ init: { "theme": "neutral" } }%%
-flowchart LR
- apply-policies-for-o-start([Apply policies of kind <i>pk</i><br>scoped for an object <i>o</i>]) -->
- list-policies[Make <i>P</i> ← all policies <br>of kind <i>pk</i> that<br>affect <i>o</i>] -->
- sort-policies[Sort <i>P</i> from<br>lowest to highest] -->
- build-effective-policy[Build an effective<br>policy <i>ep</i> without<br>any policy rules] -->
- merge-p-into-ep
- subgraph for-each-policy[For each policy <i>p in P</i>]
- merge-p-into-ep[[Merge <i>p</i> into <i>ep</i>]] -->
- merge-p-into-ep
- end
- for-each-policy -->
- reconcile-ep[Reconcile resources<br>for <i>ep</i>] -->
- apply-policies-for-o-finish(((END)))
-%%{ init: { "theme": "neutral" } }%%
-flowchart LR
- merge-p1-into-p2-start([Merge policy <i>p1</i><br>into policy <i>p2</i>]) -->
- p1-format{Explicit<br><i>defaults</i> or <i>overrides</i><br>declared in <i>p1</i>?}
- p1-format -- Yes --> merge-defaults-for-r[["Merge <b>defaults</b> block<br>of policy rules<br>of <i>p1</i> into <i>p2</i>"]] --> merge-overrides-for-r[["Merge <b>overrides</b> block<br>of policy rules<br>of <i>p1</i> into <i>p2</i>"]] --> merge-p1-into-p2-finish(((Return <i>p2</i>)))
- p1-format -- No --> merge-bare-rules-for-r[["Merge ungrouped<br>block of policy rules<br>of <i>p1</i> into <i>p2</i><br>(as <b>defaults</b>)"]] --> merge-p1-into-p2-finish
-%%{ init: { "theme": "neutral" } }%%
-flowchart LR
- merge-block-of-rules-into-p-start([Merge block of<br>policy rules <i>B</i><br>into policy <i>p</i>]) -->
- r-conditions-match{"<i>B.when(p)</i>"}
- r-conditions-match -- "Conditions do not match" --> merge-block-of-rules-into-p-finish(((Return <i>p</i>)))
- r-conditions-match -- "Conditions match" --> block-semantics{Merge <i>B</i> as}
- block-semantics -- "Defaults" --> merge-default-block-into-p[[Merge default block<br>of policy rules <i>B</i><br>into policy <i>p</i>]] --> merge-block-of-rules-into-p-finish
- block-semantics -- "Overrides" --> merge-override-block-into-p[[Merge override block<br>of policy rules <i>B</i><br>into policy <i>p</i>]] --> merge-block-of-rules-into-p-finish
-defaults
block of policy rules into a policy¶%%{ init: { "theme": "neutral" } }%%
-flowchart LR
- merge-default-block-into-p-start([Merge default block<br>of policy rules <i>B</i><br>into policy <i>p</i>]) -->
- unset-unwanted-policy-rules[Remove from <i>B</i><br>all policy rules<br>listed in <i>p.unset</i>] -->
- p-empty{<i>p.empty?</i>}
- p-empty -- "Yes" --> full-replace-p-with-defaut-block[<i>p.rules ← B</i>] --> merge-default-block-into-p-finish(((Return <i>p</i>)))
- p-empty -- "No" --> default-block-strategy{<i>B.strategy</i>}
- default-block-strategy -- "Atomic" --> merge-default-block-into-p-finish
- default-block-strategy -- "Merge" --> default-p-r-exists
- subgraph for-each-default-policy-rule["For each <i>r in B</i>"]
- default-p-r-exists{"<i>p[r.id].exists?</i>"}
- default-p-r-exists -- "Yes" --> default-p-r-exists
- default-p-r-exists -- "No" --> default-replace-pr["<i>p[r.id] ← r</i>"] --> default-p-r-exists
- end
- for-each-default-policy-rule -->
- merge-default-block-into-p-finish
-overrides
block of policy rules into a policy¶%%{ init: { "theme": "neutral" } }%%
-flowchart LR
- merge-override-block-into-p-start([Merge override block<br>of policy rules <i>B</i><br>into policy <i>p</i>]) -->
- override-block-strategy{<i>B.strategy</i>}
- override-block-strategy -- "Atomic" --> full-replace-p-with-override-block[<i>p.rules ← B</i>] --> merge-override-block-into-p-finish(((Return <i>p</i>)))
- override-block-strategy -- "Merge" --> override-replace-pr
- subgraph for-each-override-policy-rule["For each <i>r in B</i>"]
- override-replace-pr["<i>p[r.id] ← r</i>"] --> override-replace-pr
- end
- for-each-override-policy-rule -->
- merge-override-block-into-p-finish
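The two flowcharts above for merging a `defaults` block and an `overrides` block can be sketched in Go as follows. This is only a minimal illustration under stated assumptions: the `Block` and `Policy` types, the representation of a policy rule as a plain string keyed by its identifier, and the function names are all hypothetical and are not part of any Kuadrant API.

```go
package main

import "fmt"

// Strategy mirrors the `strategy: atomic | merge` field of a block.
type Strategy string

const (
	Atomic Strategy = "atomic" // assumed when the field is omitted
	Merge  Strategy = "merge"
)

// Block is a hypothetical stand-in for a defaults or overrides block (B).
type Block struct {
	Strategy Strategy
	Rules    map[string]string // policy rules keyed by rule id
}

// Policy is a hypothetical stand-in for the policy being merged into (p).
type Policy struct {
	Rules map[string]string
	Unset []string // names of higher-level default rules to remove
}

func cloneRules(in map[string]string) map[string]string {
	out := make(map[string]string, len(in))
	for id, r := range in {
		out[id] = r
	}
	return out
}

// MergeDefaults follows the "merge default block" flowchart.
func MergeDefaults(b Block, p *Policy) {
	rules := cloneRules(b.Rules)
	for _, id := range p.Unset { // remove from B all rules listed in p.unset
		delete(rules, id)
	}
	if len(p.Rules) == 0 { // p.empty? -> p.rules <- B
		p.Rules = rules
		return
	}
	if b.Strategy != Merge { // atomic: p keeps its own rules untouched
		return
	}
	for id, r := range rules { // merge: only fill in rules missing from p
		if _, exists := p.Rules[id]; !exists {
			p.Rules[id] = r
		}
	}
}

// MergeOverrides follows the "merge override block" flowchart.
func MergeOverrides(b Block, p *Policy) {
	if b.Strategy != Merge { // atomic: p.rules <- B
		p.Rules = cloneRules(b.Rules)
		return
	}
	if p.Rules == nil {
		p.Rules = map[string]string{}
	}
	for id, r := range b.Rules { // merge: p[r.id] <- r for each r in B
		p.Rules[id] = r
	}
}

func main() {
	route := &Policy{Rules: map[string]string{"a": "route-a"}, Unset: []string{"c"}}
	gwRules := map[string]string{"a": "gw-a", "b": "gw-b", "c": "gw-c"}
	MergeDefaults(Block{Strategy: Merge, Rules: gwRules}, route)
	fmt.Println(route.Rules) // map[a:route-a b:gw-b]
	MergeOverrides(Block{Strategy: Merge, Rules: map[string]string{"a": "gw-a"}}, route)
	fmt.Println(route.Rules) // map[a:gw-a b:gw-b]
}
```

In this sketch the lower-level policy keeps rule "a" when the higher-level block is merged as defaults, and loses it when the same block is merged as overrides, which is the asymmetry the flowcharts describe.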
-This section proposes a possible path for the implementation of this RFC for Kuadrant's existing kinds of policies that are affected by D/O – notably AuthPolicy and RateLimitPolicy.
-The path is divided into 3 tiers that could be delivered in steps, in addition to a series of enhancements and refactoring.
-defaults
field to the APIs)gateway.networking.k8s.io/policy: inherited | direct
when
conditions (and support for "constraints")unset
)See Mutually exclusive API designs > Design option: strategy
field.
The following alternatives were considered for the design of the API spec to support D/O:
-strategy
field - RECOMMENDEDgranularity
fieldwhen
conditions (at any level of the spec)All the examples in the RFC are based on API design strategy
field.
strategy
field¶Each block of defaults
and overrides
specify a field strategy: atomic | merge
, with atomic
assumed if the field is omitted.
All the examples in the RFC are based on this design for the API spec.
-Some of the implications of the design are explained in the section Atomic vs. individually merged policy rules, with highlights to the support for specifying the level of atomicity of the rules in the policy based on only 2 granularities – entire set of policy rules (atomic
) or to the level of each named policy rule (merge
.)
✅ Pros | -❌ Cons | -
---|---|
-
|
-
-
|
-
The design option based on the strategy
field is the RECOMMENDED design for the implementation of Kuadrant Policies enabled for D/O. This is due to the pros above, plus the fact that this design can evolve to other, more versatile forms, such as granularity
field, when
conditions or CEL functions, in the future, while the opposite would be harder to achieve.
granularity
field¶Each block of defaults
and overrides
would specify a granularity
field, set to a numeric integer value that describes which level of the policy spec, from the root of the set of policy rules until that number of levels down, to treat as the key, and the rest as the atomic value.
Example:
-kind: AuthPolicy
-metadata:
- name: gw-policy
-spec:
- targetRef:
- kind: Gateway
- defaults:
- rules:
- authentication:
- "a": {…}
- authorization:
- "b": {…}
- granularity: 0 # the entire spec ("rules") is an atomic value
- overrides:
- rules:
- metadata:
- "c": {…}
- response:
- "d": {…}
- granularity: 2 # each policy rule ("c", "d") is an atomic value
-
✅ Pros | -❌ Cons | -
---|---|
-
|
-
-
|
-
when
conditions (at any level of the spec)¶Inspired by the extension of the API for D/O with an additional when
field (see Examples E), this design alternative would use the presence of this field to signal the granularity of the atomic operation of default or override.
Example:
-kind: AuthPolicy
-metadata:
- name: gw-policy
-spec:
- targetRef:
- kind: Gateway
- defaults:
- rules:
- authentication:
- "a": {…}
- when: CEL # level 1 - entire "authentication" block
- authorization:
- "b":
- "prop-1": {…}
- when: CEL # level 2 - "b" authorization policy rule
-
✅ Pros | -❌ Cons | -
---|---|
-
|
-
-
|
-
This design option leans on the power of Common Expression Language (CEL), extrapolating the design alternative with when
conditions beyond declaring a CEL expression just to determine if a statically declared value should apply. Rather, it proposes the use of CEL functions that output the value to default to or to override with, taking the conflicting "lower" value as input, with or without a condition as part of the CEL expression. The value of a key set to a CEL function indicates the level of granularity of the D/O operation.
Example:
-kind: AuthPolicy
-metadata:
- name: gw-policy
-spec:
- targetRef:
- kind: Gateway
- defaults:
- rules:
- authentication:
- "a": {…} # static value
- "b": "cel:self.value > 3 ? AuthenticationRule{value: 3} : self"
- authorization: |
- cel:Authorization{
- c: AuthorizationRule{prop1: "x"}
- }
-
✅ Pros | -❌ Cons | -
---|---|
-
|
-
-
|
-
A more radical alternative considered consisted of defining defaults
and overrides
blocks whose schemas would not match the ones of a normal policy without D/O. Instead, these blocks would consist of simple key-value pairs, where the keys specify the paths in an affected policy where to apply the value atomically.
Example:
-kind: AuthPolicy
-metadata:
- name: gw-policy
-spec:
- targetRef:
- kind: Gateway
- defaults:
- "rules.authentication":
- "a": {G}
- "rules.authorization.b": {G}
-
✅ Pros | -❌ Cons | -
---|---|
-
|
-
-
|
-
Similar to the path-keys design option, inspired by JSON patch operations, to provide more kinds of operations and extensibility.
-Example:
-kind: AuthPolicy
-metadata:
- name: gw-policy
-spec:
- targetRef:
- kind: Gateway
- defaults:
-
- - path: rules.authentication
- operation: add
- value: { "a": {G} }
- - path: rules.authorization.b
- operation: remove
- - path: |
- rules.authentication.a.
- value
- operation: le
- value: 50
-
✅ Pros | -❌ Cons | -
---|---|
-
|
-
-
|
-
Other than the primitive support only for implicit atomic defaults provided by Kuadrant for the AuthPolicy and RateLimitPolicy, other real-life implementations of D/O along the lines proposed by Gateway API are currently unknown.
-Some illustrative examples are provided in:
-gwctl
effective policy calculation for inherited policies - see policy manager's merge test cases.A use case often described in association with D/O is the one for declaring policy requirements. These are high level policies that declare requirements to be fulfilled by more specific (lower level) policies without specifying concrete default or override values nor constraints. E.g.: "an authentication policy must be enforced, but none is provided by default."
-A typical generic policy requirement user story is:
---As a Platform Engineer, when configuring a Gateway, I want to set policy requirements to be fulfilled by one who manages an application/route linked to my Gateway, so all interested parties, including myself, can be aware of applications deployed to the cluster that lack a particular policy protection being enforced.
-
Policy requirements as here described are out of scope of this RFC.
-We believe policy requirement use cases can be stated and solved as an observability problem, by defining metrics and alerts that cover for missing policies or policy rules, without necessarily having to write a policy of the same kind to express such requirement to be fulfilled.
-How to handle merges of policies from different namespaces that contain references to other objects (e.g. Secrets)?
-Often policy rules include references to other Kubernetes objects, such as Secrets, typically defined in the same namespace as the policy object. When merging policies from different namespaces, these references need to be taken into account.
-These references to external objects can break if they are not carried along with the derivative resources (e.g. Authorino AuthConfig objects) that are created from a merge of policies (or from the computed effective policy), that are composed out of definitions from different namespaces, and that depend on those references.
-This is not much of a problem for atomic D/O only, as the derivative objects that depend on the references could be forced to be created in the same namespace as the policy that wins against all the others – and therefore in the same namespace of the winning referents as well. However, when merging policies, we can run into a situation where final effective policies (thus also other derivative resources) contain references to objects inherited from definitions from other namespaces.
-Possible solutions to this problem include:
-Should Kuadrant's inherited policy specs resemble more the specs of the objects they target?
-The UX for one who writes a Kuadrant policy of the inherited class of policies is arguably not very different from writing any custom resource that happens to specify a targetRef
field.
-Other than the name and kind of the target object, there is not much in a Kuadrant policy custom resource that gives the user an experience close to "adding fields" to the target object.
With the exception of a few types reused for the route selectors, the spec of a Kuadrant policy is very different from the spec of the object that the policy ultimately augments, i.e. the spec of the route object. This remains basically unchanged after this RFC. However, another way to think about the design of those APIs is one where, in contrast, the specs of the policies partially mirror the spec of the route, so users can write policies in a more intuitive fashion, as if the definitions of the policy were extensions of the routes they target (directly or by targeting gateways the routes are attached to.)
-E.g.:
-kind: HTTPRoute
-metadata:
- name: my-route
-spec:
- rules:
-
- - name: rule-1
- matches:
- - method: GET
- backendRef: {…}
- - name: rule-2
- backendRef: {…}
-
An inherited policy that targets the HTTPRoute above could otherwise look like the following:
-kind: Policy
-metadata:
- name: my-policy
-spec:
- targetRef:
- kind: HTTPRoute
- name: my-route
- defaults: # mirrors the spec of the httproute object
- policySpecificDef: {…} # augments the entire httproute object
- overrides: # mirrors the spec of the httproute object
- rules:
-
- - name: rule-2
- policySpecificDef: {…} # augments only httprouterule rule-2 of the httproute object
-
The above already is somewhat closer to being true for the AuthPolicy API, than it is for the RateLimitPolicy one. However, that is strictly coincidental, because the AuthPolicy's spec happens to specify a rules
field, where the equivalent at the same level in RateLimitPolicy is called limits
.
This alternative design could make writing policies more like defining filters in an HTTPRoute, with the difference that policies are external to the target they extend (while filters are internal.) At the same time, it could be a replacement for Kuadrant route selectors, where the context of applicability of a policy rule is given by the very structure within the spec in which the policy rule is declared (resembling that of the target), thus also shaping the context for D/O.
-One caveat of this design though is that each policy specific definition (i.e. the rule specification that extends the object at a given point defined by the very structure of the spec) is exclusive of that given point in the structure of the object. I.e., one cannot specify a single policy rule that augments N > 1 specific rules of a target HTTPRoute.
-Due to its relevance to the design of the API that enables D/O, this was left as an unresolved question. Note, nonetheless, that, as a pattern, this alternative API design extends beyond inherited policies, also impacting the direct policy kinds DNSPolicy and TLSPolicy.
-Although this proposal was thought to keep options open for multiple policies of a kind targeting the same network resource, this is currently not the state of things for Kuadrant. Instead, Kuadrant enforces a 1:1 relationship between policies of a kind and target resources.
-Supporting N:1 relationships could enable use cases such as App Developers defining D/O for each other at the same level of a shared xRoute, as well as Platform Engineers setting different policy rules on the same Gateway.
-This could provide an alternative to achieving separation of concerns for complex policy kinds such as the AuthPolicy, where different users could be responsible for authentication and authorization, without necessarily depending on defining new kinds of policies.
-name
and targetRef.sectionName
¶If Gateway API's GEP-995 is accepted (i.e. kubernetes-sigs/gateway-api#2593 gets merged) and the name
field for route rules is implemented in the APIs (HTTPRoute and GRPCRoute), this could impact how Kuadrant delivers D/O. Although the semantics could remain the same, the way users specify the scope for a given set of policy rules could be simplified significantly.
As of today, Kuadrant's AuthPolicy and RateLimitPolicy APIs allow users to target sections of an HTTPRoute based on route selectors, and thus all the conflict resolution involved in handling D/O must take that logic into account.
-With named route rules supported by Gateway API, either route selectors could be redefined in a simpler form where each selector consists of a list of names of rules and/or entire policies could be scoped for a section of a resource, by defining the targetRef
field based on the PolicyTargetReferenceWithSectionName
type.
Note GEP-2649's recommendation against defining inherited policies that allow for sectionName
in the targetRef
. Nonetheless, this is a general rule from the spec said to be acceptable to be broken in the spirit of offering better functionality to users, provided it can deal with the associated discoverability and complexity problems of this feature.
Despite having recently modified the AuthPolicy and RateLimitPolicy APIs to use maps for declaring policy rules instead of lists (RFC 0001), reverting this design in future versions of these APIs, plus treating those lists as listMapType
, could let us leverage the API server's strategic merge type to handle merges between policy objects.
In the Policy CRDs, the policy rule types must specify a name
field (required). The list of rules type (i.e. []Rule
) must then specify the following Kubebuilder CRD processing annotations:
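The annotations themselves are not shown in this text. As a hedged sketch only, a list that the API server should treat as a map keyed by `name` is commonly declared with the `+listType=map` and `+listMapKey=name` markers, as below. The `Rule` and `PolicySpec` type names are hypothetical and do not come from the Kuadrant APIs.

```go
package main

import "fmt"

// Rule is a hypothetical policy rule type; the required name field
// acts as the key of each entry in the list.
type Rule struct {
	// +kubebuilder:validation:Required
	Name string `json:"name"`
}

// PolicySpec is a hypothetical spec declaring its rules as a list that
// is merged entry by entry, using the name field as the map key.
type PolicySpec struct {
	// +listType=map
	// +listMapKey=name
	Rules []Rule `json:"rules,omitempty"`
}

func main() {
	spec := PolicySpec{Rules: []Rule{{Name: "a"}, {Name: "b"}}}
	for _, r := range spec.Rules {
		fmt.Println(r.Name)
	}
}
```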
At the time of writing, GEP-713 (Kubernetes Gateway API, SIG-NETWORK) is under revision and expected to be split into two separate GEPs, one for Direct Policies (GEP-2648) and one for Inherited Policies (GEP-2649.) Once these new GEPs supersede GEP-713, all references to the previous GEP in this document must be updated to GEP-2649. ↩
-A new sub-project for a prometheus exporter that exports metrics about the state of Gateway API resources in a Kubernetes cluster.
-Allow additional stateful information about Gateway API resources to be made available via metrics. -Currently a set of metrics are made available via the gateway-api-state-metrics project. -However, there are limitations with what resource information can be exposed using the underlying kube-state-metrics project. -Additional stateful information would include:
-For example, the individual status listener information from a Gateway:
- status:
- listeners:
-
- - attachedRoutes: 1
- conditions:
- - lastTransitionTime: "2023-08-15T13:22:06Z"
- message: No errors found
- observedGeneration: 1
- reason: Ready
- status: "True"
- type: Ready
- - lastTransitionTime: "2023-08-15T13:22:06Z"
- message: No errors found
- observedGeneration: 1
- reason: ResolvedRefs
- status: "True"
- type: ResolvedRefs
- name: api
-
and HTTPRoute parents status conditions:
- status:
- parents:
-
- - conditions:
- - lastTransitionTime: "2024-05-16T16:17:38Z"
- message: Object affected by AuthPolicy default/toystore
- observedGeneration: 1
- reason: Accepted
- status: "True"
- type: kuadrant.io/AuthPolicyAffected
- - lastTransitionTime: "2024-05-16T16:18:51Z"
- message: Object affected by RateLimitPolicy default/toystore
- observedGeneration: 1
- reason: Accepted
- status: "True"
- type: kuadrant.io/RateLimitPolicyAffected
- controllerName: kuadrant.io/policy-controller
- parentRef:
- group: gateway.networking.k8s.io
- kind: Gateway
- name: api-gateway
- namespace: kuadrant-system
- - conditions:
- - lastTransitionTime: "2024-05-16T16:17:38Z"
- message: Object affected by AuthPolicy default/toystore
- observedGeneration: 1
- reason: Accepted
- status: "True"
- type: kuadrant.io/AuthPolicyAffected
- - lastTransitionTime: "2024-05-16T16:18:51Z"
- message: Object affected by RateLimitPolicy default/toystore
- observedGeneration: 1
- reason: Accepted
- status: "True"
- type: kuadrant.io/RateLimitPolicyAffected
- - lastTransitionTime: "2024-05-20T11:45:33Z"
- message: Route was valid
- observedGeneration: 1
- reason: Accepted
- status: "True"
- type: Accepted
- - lastTransitionTime: "2024-05-20T11:45:33Z"
- message: All references resolved
- observedGeneration: 1
- reason: ResolvedRefs
- status: "True"
- type: ResolvedRefs
- controllerName: istio.io/gateway-controller
- parentRef:
- group: gateway.networking.k8s.io
- kind: Gateway
- name: api-gateway
- namespace: kuadrant-system
-
First, implement the existing Gateway API metrics that are part of the gateway-api-state-metrics project. -This will allow the new prometheus exporter to replace that project as a whole. -The metrics will be backwards compatible with the gateway-api-state-metrics project, with the addition of new metrics.
-Second, implement new metrics, as per the examples below, to capture the additional status information:
-gatewayapi_gateway_status_listeners_conditions{namespace="<NAMESPACE>",name="<GATEWAY>",listener_name="<LISTENER_NAME>",type="<ResolvedRefs|Ready|Other>"} 1
-
gatewayapi_httproute_status_parents_conditions{namespace="<NAMESPACE>",name="<HTTPROUTE>",controller_name="<CONTROLLER_NAME>",parent_group="<PARENT_GROUP>",parent_kind="<PARENT_KIND>",parent_name="<PARENT_NAME>",parent_namespace="<PARENT_NAMESPACE>",type="<ResolvedRefs|Accepted|Other>"} 1
-
type
field would also record any custom types set by a controller. For example, kuadrant.io/AuthPolicyAffected
and kuadrant.io/RateLimitPolicyAffected
.
-This will allow the health of HTTPRoutes to be reported via metrics. An HTTPRoute that has a type
of Accepted and value of 1 means the HTTPRoute is accepted by the Gateway and can be considered healthy.
-It will also allow policy specific information about a HTTPRoute to be represented in metrics.
-For example, alerting on any HTTPRoutes that don't have the kuadrant.io/AuthPolicyAffected
type with a value of 1 i.e. HTTPRoutes without an AuthPolicy.
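As a rough illustration of how listener status conditions could map to the metric samples described above, the following Go sketch renders one sample line per condition. It uses only the standard library rather than a Prometheus client, the `Condition` and `Listener` types and the function name are hypothetical, and emitting 0 for a condition whose status is not "True" is an assumption that the text above does not specify.

```go
package main

import "fmt"

// Condition is a hypothetical, simplified status condition.
type Condition struct {
	Type   string
	Status string
}

// Listener is a hypothetical, simplified Gateway listener status.
type Listener struct {
	Name       string
	Conditions []Condition
}

// listenerConditionSamples renders one sample line per listener condition.
// Assumption: value 1 when status is "True", 0 otherwise.
func listenerConditionSamples(namespace, gateway string, listeners []Listener) []string {
	var lines []string
	for _, l := range listeners {
		for _, c := range l.Conditions {
			value := 0
			if c.Status == "True" {
				value = 1
			}
			lines = append(lines, fmt.Sprintf(
				"gatewayapi_gateway_status_listeners_conditions{namespace=%q,name=%q,listener_name=%q,type=%q} %d",
				namespace, gateway, l.Name, c.Type, value))
		}
	}
	return lines
}

func main() {
	listeners := []Listener{{
		Name: "api",
		Conditions: []Condition{
			{Type: "Ready", Status: "True"},
			{Type: "ResolvedRefs", Status: "True"},
		},
	}}
	for _, line := range listenerConditionSamples("kuadrant-system", "api-gateway", listeners) {
		fmt.Println(line)
	}
}
```

Custom condition types set by a controller would pass through unchanged as the value of the `type` label.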
-Tests will be added directly to the project in a similar manner to the redis-exporter. -The test environment will bring up a kind cluster, create the Gateway API CRDs, example Gateway & HTTPRoute resources, then test the scrape endpoint. -This will be the same as how metrics are tested for the gateway-api-state-metrics project. -There is a separate test function for each resource.
-Existing example dashboards in the gateway-api-state-metrics project will be copied over to the exporter project and continue to work as before. -However, initially it will just be the Gateway, GatewayClass and HTTPRoute dashboards as those will be the metrics that are implemented first.
-The exporter will be written in golang and follow the guidelines from https://prometheus.io/docs/instrumenting/writing_exporters/. -Other exporters like the https://github.com/prometheus/node_exporter/tree/master and https://github.com/oliver006/redis_exporter will be referenced for patterns and library usage. -Metrics will only be pulled from the kubernetes API when Prometheus scrapes them. -That is, the exporter will not perform scrapes based on its own timers. -All scrapes will be synchronous.
-The client-go library will be used for all kubernetes API calls. -As the number of Gateways and HTTPRoutes could vary greatly, there is a performance consideration with these API calls if there are a lot of resources. -To allow for this, a single list call for all resources of a kind will be used rather than fetching resources one by one. -If in future there are issues with performance, there is an option to cache responses to expensive queries.
-A 'gateway_metrics_up' metric will be included, as per https://prometheus.io/docs/instrumenting/writing_exporters/#failed-scrapes -such that the exporter can continue to respond in a standard way if there are issues with some aspects of scraping. -The scrape response should include all metrics that have information available at the time of scraping.
-This is an additional library to maintain. -However, it will supersede the gateway-api-state-metrics project and dependency on kube-state-metrics.
-In theory it should be possible to get the desired functionality from the kube-state-metrics project if the proposed change in https://github.com/kubernetes/kube-state-metrics/pull/2059 is accepted and subsequently implemented. -However, that proposal has been open since May 2023. -I have ruled out the possibility of helping with the implementation of this change in that project due to:
-The proposed design is a more focused solution on the needs of the Kuadrant project from Gateway API resources in the form of metrics. -There are plenty of examples of exporters out there that we can reference and follow established patterns.
-If we don't make this change, we are limited to having just the overall Gateway status available via metrics, -and no HTTPRoute status information on which we can visualise and alert.
-The primary prior art is the kube-state-metrics project and the gateway-api-state-metrics project. -The gateway-api-state-metrics project uses the CustomResourceStateMetrics configuration feature of kube-state-metrics to configure what fields in which resources should be made available via metrics.
-n/a
-Although this exporter is intended to replace the gateway-api-state-metrics project, -it will likely take a phased approach to get to that point. -The initial goal is to get 'like for like' functionality from a Kuadrant project point of view (Gateways, GatewayClasses and HTTPRoutes), -followed by the new status functionality as detailed in this RFC. -Other resources, such as TLSRoute, UDPRoute, etc., can be added later.
- - - - - - - - - - - - - -policy_machinery
Explain how Kuadrant's Policy Machinery can be used for reconciliation.
-The Policy Machinery project (repo, pkg.go) offers a set of types and functions for implementing Gateway API policies – i.e.
-These can be used for tailoring the implementation of Kuadrant policies and Kuadrant instances. See example provided.
-Leveraging the Policy Machinery can be key to:
-merge
strategy (RFC 0009)
Although essentially an implementation detail of the Kuadrant Operator, leveraging the Policy Machinery may introduce the following user-perceived features:
-One who specifies Kuadrant policies targeting resources at any levels of the hierarchy Gateway → Listener → HTTPRoute → HTTPRouteRule1 shall expect the effect of such policies to be reported always respectively to the lowest level of the hierarchy that the kind of policy allows targeting. E.g.:
-sectionName
actually targets gateway listeners; the user wants to reason about the state of DNSPolicies regarding their effect on each listener specified in the Gateway. In a context with 2+ DNSPolicies, simultaneously targeting both a Gateway and specific listeners, DNS records for some listener hostnames may have been reconciled according to the specification from one policy or another, occasionally no policy at all.
In the specific case of policy kinds that allow targeting HTTPRouteRules, due to complex network topologies supported by Gateway API, including in particular HTTPRoutes with multiple Gateway parents, the same HTTPRouteRule may or may not be affected by a policy, depending on which routing path in the network topology a request flows through. Therefore, users will ultimately reason about policies and effective policies in terms of the paths that a request can flow through, at least between gateways and the lowest levels targeted by the policies. Possibly, in terms of all possible paths between Gateways and Services.
-Kuadrant shall provide users with such visibility. Leveraging Policy Machinery is an implementation detail that nonetheless makes achieving this goal easier.
-Leveraging Policy Machinery may also motivate some user-facing changes. In particular, replacing AuthPolicy's and RateLimitPolicy's routeSelectors
with a targetRef
with optional sectionName
(made possible since kubernetes-sigs/gateway-api#2895.)
This change would cause a policy of AuthPolicy or RateLimitPolicy kind to always be attached to its targets entirely, i.e. without having rules that attach to some sections and other rules to other sections or no section at all. This differs from the current situation where a policy of those kinds can be attached to an HTTPRoute and some of the policy's rules more specifically attached to individual HTTPRouteRules only, including with multiple policy rules attached to different HTTPRouteRules of the same targeted HTTPRoute. Instead, attaching a policy must be a cohesive, unambiguous operation, that occasionally requires users to specify more fine-grained policy objects to be attached only to sections of a resource.
-In some cases, splitting policy objects for the purpose of targeting sections of a network resource, without breaking the semantics of having a single set of policy objects cohesively defined, also implies that definitions about the same entity or concept within a policy (e.g. a limit definition), repeated across multiple policy objects, need a way to indicate that they refer to the same thing (e.g. the same set of counters).
-This is the case, for example, of limit definitions in a RateLimitPolicy as well as cache configs in an AuthPolicy. To avoid creating multiple rate-limit counter namespaces (analogously, multiple authorization rule cache entries) for definitions that are effectively about the same entity, despite being specified in multiple policy objects, the APIs must provide a way for users to convey one or the other intent: definitions refer to the same thing versus definitions refer to different things.2
-The following possible (non-required) other user-facing changes can be enabled leveraging Policy Machinery, without marginal implementation cost:
-targetRefs
.Usage of the Policy Machinery consists of importing two packages:
-machinery
: provides the types and abstractions to build Gateway API topologies of targetable network resources, policies and adjacent objects;controller
: offers tools for implementing topology-based custom controllers of reconciliation logic. From there on, the following steps describe how to leverage the Policy Machinery in the Kuadrant Operator:
-Implement the machinery.Policy
interface for all kinds of policies.
--Example provided for the DNSPolicy, TLSPolicy, AuthPolicy and RateLimitPolicy kinds.
-
Define wrappers that implement the machinery.Object
interface for any kind of adjacent object whose unique identifier as a node in the topology graph cannot be based on the default controller.RuntimeObject
type provided (if any.)
Implement the linking functions for all kinds of adjacent objects and corresponding parents and children in the topology graph, including types such as Kuadrant
, Istio's WasmPlugin
, ConfigMap
, etc.
--The
-Kuadrant
custom resources shall be the roots of a directed acyclic graph (DAG) from which an entire topology of targetable network resources, adjacent objects and policies is connected.
Start a controller.Controller
that:
controller.Workflow
on events related to any of the watched resources. At every reconciliation event3:
-Reconcile the internal objects for setting up the environment for a Kuadrant instance (deployments, gateway controller configs, etc).
-For each kind of policy and applicable path in the topology graph relevant for the policy kind:
-Compute an effective policy and give it a unique identifier.
-Update the status stanzas of all targetables in the topology whose paths were configured for an effective policy (or lack of such), with a map that allows users to inspect, for a given path, what effective policy (if any) will be enforced. - > Note: If unsuitable for the status stanza of the object, the details of the effective policies may require additional tooling to be inspected by the users and the mapping must be to the unique identifier of the effective policy.
-Store a DOT representation of the topology graph in a ConfigMap.
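The reconciliation steps above can be illustrated with a small, self-contained sketch. It is not the Policy Machinery API (which is a Go library); the `Topology` class, the node names, the policy names and the "most specific node wins" rule are all assumptions made purely for illustration. The sketch builds a toy topology DAG, enumerates the paths from a Gateway to each HTTPRouteRule, picks an effective policy per path, and renders the graph in DOT format.

```python
from collections import defaultdict


class Topology:
    """Toy directed acyclic graph of named nodes (hypothetical, illustration only)."""

    def __init__(self):
        self.children = defaultdict(list)
        self.nodes = []  # keeps insertion order for stable DOT output

    def add_link(self, parent, child):
        for node in (parent, child):
            if node not in self.nodes:
                self.nodes.append(node)
        self.children[parent].append(child)

    def paths(self, start, is_leaf):
        """Yield every path from `start` down to a node accepted by `is_leaf`."""
        if is_leaf(start):
            yield [start]
            return
        for child in self.children[start]:
            for tail in self.paths(child, is_leaf):
                yield [start] + tail

    def to_dot(self):
        """Render the graph as a DOT digraph string."""
        lines = ["digraph topology {"]
        for parent in self.nodes:
            for child in self.children[parent]:
                lines.append(f'  "{parent}" -> "{child}";')
        lines.append("}")
        return "\n".join(lines)


def effective_policy(path, policies):
    """Return the policy attached to the most specific node in the path.

    Assumption: real merge strategies (defaults, overrides) are not modelled.
    """
    for node in reversed(path):
        if node in policies:
            return policies[node]
    return None


topology = Topology()
topology.add_link("kuadrant", "gateway/prod-web")
topology.add_link("gateway/prod-web", "listener/api")
topology.add_link("listener/api", "httproute/toystore")
topology.add_link("httproute/toystore", "rule/toystore#1")
topology.add_link("httproute/toystore", "rule/toystore#2")

# Hypothetical policies keyed by the node they target.
policies = {"gateway/prod-web": "rlp-gateway-default", "rule/toystore#2": "rlp-rule-2"}

# Map each Gateway-to-HTTPRouteRule path to its effective policy.
effective = {
    " -> ".join(path): effective_policy(path, policies)
    for path in topology.paths("gateway/prod-web", lambda n: n.startswith("rule/"))
}
dot = topology.to_dot()  # this string is what would be stored in a ConfigMap
```

In a real controller the `effective` map would feed the status stanzas of the targetables, and `dot` would be written to the ConfigMap mentioned in the last step.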
-Part of the work consists of refactoring, without value perceived by the user.
-Use of annotations to track back references from targeted objects to policies. This approach has been slowly deprecated in favour of an in-memory directed acyclic graph (DAG) representing the relationship between network objects and policies6.
-Bottom-up reconciliation by default, focusing on the policy resources first. This approach has been slowly refactored to using mappers (event handlers) that often multiply a lower-level event into multiple top-down ones, occasionally with the occurrence of repetitive events.
-Preliminary version of the topology DAG6 that:
-Configuration of internal resources for implementing effective policy behavior:
-Envoy Gateway has implemented a custom controller for Gateway API and provider-specific resources with the following characteristics similar to the Policy Machinery controller
package's approach:
Reconcile
function that:goroutines
decoupled from controller-runtime.Gateway listeners and HTTPRouteRules can be targeted by specifying in a policy their main Gateway and HTTPRoute objects as targets, in combination with either a sectionName
(supported in the targetRef
field of the policy) or via routeSelectors
(in the policy spec proper), respectively. Not all kinds of policies support targeting all 4 kinds of targetables of the Gateway → Listener → HTTPRoute → HTTPRouteRule hierarchy; some kinds of policies may support targeting only a few of those. ↩
In the context of rate-limit, this problem is also referred to as the problem of the identity of a limit. ↩
-Specific steps can be filtered by type of event. ↩
-While the DNS operator, as well as the configuration performed by the Kuadrant Operator for a TLSPolicy, is closer to the enforcement of the specifications in the DNS and TLS policy objects, Authorino and Limitador are rather policy decision points (PDP). Regardless of this distinction, control-plane operations that configure a service based on the specification of a policy, as well as the data-plane protection services that act at request time along with the gateways, are all part of the policy enforcement. ↩
-The attributes of a path in the topology from a Gateway to a HTTPRouteRule object typically include a hostname and the set of HTTPRouteMatches specified in the HTTPRouteRule. ↩
-See kuadrant-operator#530. ↩↩
-dns_policy_api_changes
)DNSPolicy is soon to go GA. There are several improvements we have identified over time that will enhance the GA of this API.
-Remove the need for labels on Gateways for GEO and custom weighting values
-Reduce the verbosity of GEO and Weighting definitions
-Remove the need for a strategy to be chosen as part of the policy definition.
-We want to simplify and improve the DNSPolicy API and remove some of the legacy structures that have hung on since its original inception. As this involves some breaking changes, we want to make them before we create a v1 API.
-Weighting and GEO attributes:
-The loadbalancing options we provide were first designed as part of an API that was intended to work with OCM (Open Cluster Management). This provided multiple views of gateways across multiple clusters. So, in order to understand the GEO context or individual weighting needed for a given cluster, we needed that context to be applied separately from the DNSPolicy spec, which for legacy reasons targeted a "template" Gateway in the hub cluster.
-Now that DNSPolicy is created on the same cluster as the actual Gateway and we do not use OCM or hub clusters, the need to label individual objects and Gateways with specific annotations and labels is redundant and makes for a more complex and awkward API interaction.
-routingStrategy:
-We have also identified that the routingStrategy option in the DNSPolicy spec is redundant. When it was added, we expected there to be more than two strategies. This has not happened, so it is another awkward piece of the API that is not needed.
-You will no longer need to apply labels to Gateways in order to specify the GEO or Weighting for that Gateway. The policy targets a given Gateway and you will now just specify those values in the policy spec directly.
-You will no longer need to specify what routingStrategy you want to use. Instead you will either specify a loadbalancing section (meaning it is a loadbalanced strategy) or you will leave it empty (meaning it has no loadbalancing).
-Below is an example of what is currently needed to set up GEO and Custom Weighting with the existing API:
-apiVersion: kuadrant.io/v1alpha1
-kind: DNSPolicy
-metadata:
-  name: prod-web
-  namespace: ingress-gateway
-spec:
-  targetRef:
-    name: prod-web
-    group: gateway.networking.k8s.io
-    kind: Gateway
-  routingStrategy: loadbalanced
-  loadBalancing:
-    weighted:
-      defaultWeight: 120
-      custom: # <--- New Custom Weights being added
-        - weight: 255
-          selector:
-            matchLabels:
-              kuadrant.io/lb-attribute-custom-weight: AWS # selects gateways to apply it to (when there can only be one)
-    geo:
-      defaultGeo: US # catch-all geo
-
To use the custom weight, you need to apply the kuadrant.io/lb-attribute-custom-weight: AWS
label to the gateway that is already being targeted by the policy.
-To change the GEO for the targeted cluster, you need to apply a different label to the gateway: kuadrant.io/lb-attribute-geo-code: EU
for example.
On top of this you also have to specify that it is a load balanced DNSPolicy even though you have specified a load balancing section.
-This is an awkward and disconnected API that evolved from the legacy requirements called out above.
-Instead the new API to achieve the same goal will be:
-apiVersion: kuadrant.io/v1alpha1
-kind: DNSPolicy
-metadata:
-  name: prod-web
-  namespace: ingress-gateway
-spec:
-  targetRef:
-    name: prod-web
-    group: gateway.networking.k8s.io
-    kind: Gateway
-  loadBalancing:
-    weight: 100 # weight for listeners targeted
-    geo: US # geo for listeners targeted
-    defaultGEO: true # should this be considered the default GEO for the listener hosts
-  providerRefs:
-    - name: aws-credential-secret
-
You no longer need to specify whether the policy is load balanced via the redundant routingStrategy
field.
Now you simply specify the weight you want to use for the listeners in the gateway, the geo you want to use, and whether it should be used as the default GEO (some cloud providers use this as a catch-all when a user from an unspecified GEO does a DNS lookup). Each of the fields under "loadbalancing" will now be required.
-From an implementation perspective, all changes will happen in the Kuadrant Operator, which will no longer look for the attribute labels on the gateways but will instead simply use the spec of the DNSPolicy. The resulting DNSRecord will not change in structure.
-To set up a simple DNS structure (single A or CNAME record), the API would now look like:
-apiVersion: kuadrant.io/v1alpha1
-kind: DNSPolicy
-metadata:
-  name: prod-web
-  namespace: ingress-gateway
-spec:
-  targetRef:
-    name: prod-web
-    group: gateway.networking.k8s.io
-    kind: Gateway
-  providerRefs:
-    - name: aws-credential-secret
-
Again, there is no need for the redundant strategy field.
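As a rough illustration of the proposed behaviour, the sketch below infers the routing mode from the presence of the loadBalancing section and checks that its fields are all set. It is not the Kuadrant Operator's implementation: the `routing_mode` function, the dict-based spec, and the returned mode names are hypothetical. The field names are taken from the example above.

```python
# Fields that the proposal says become required under loadBalancing.
REQUIRED_LB_FIELDS = ("weight", "geo", "defaultGEO")


def routing_mode(spec):
    """Infer the routing mode from a DNSPolicy spec given as a plain dict.

    Hypothetical helper: an absent or empty loadBalancing section means a
    simple (single A or CNAME record) setup; otherwise it is load balanced.
    """
    load_balancing = spec.get("loadBalancing")
    if not load_balancing:
        return "simple"
    missing = [field for field in REQUIRED_LB_FIELDS if field not in load_balancing]
    if missing:
        raise ValueError(
            f"loadBalancing is missing required fields: {', '.join(missing)}"
        )
    return "loadbalanced"


load_balanced_spec = {
    "targetRef": {"name": "prod-web", "group": "gateway.networking.k8s.io", "kind": "Gateway"},
    "loadBalancing": {"weight": 100, "geo": "US", "defaultGEO": True},
    "providerRefs": [{"name": "aws-credential-secret"}],
}
simple_spec = {
    "targetRef": {"name": "prod-web", "group": "gateway.networking.k8s.io", "kind": "Gateway"},
    "providerRefs": [{"name": "aws-credential-secret"}],
}

lb_mode = routing_mode(load_balanced_spec)
simple_mode = routing_mode(simple_spec)
```

The point of the design is that the strategy is implied by the shape of the spec, so there is no separate field that can disagree with it.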
-Introduces a breaking change to the API.
-These are breaking changes and we are about to move to v1. Changes like these should land pre v1. These changes provide a much simpler and better user experience.
-NA
-NA
-NA