From 8e474dee388ee2b1546098f2bf492e1560fccc48 Mon Sep 17 00:00:00 2001
From: Chad Scribner <chad@redhat.com>
Date: Tue, 31 May 2022 14:14:51 -0400
Subject: [PATCH] CoreDNS Cache Tuning Enhancement Proposal

---
 enhancements/dns/cache-tuning.md | 257 +++++++++++++++++++++++++++++++
 1 file changed, 257 insertions(+)
 create mode 100644 enhancements/dns/cache-tuning.md

diff --git a/enhancements/dns/cache-tuning.md b/enhancements/dns/cache-tuning.md
new file mode 100644
index 0000000000..e371648625
--- /dev/null
+++ b/enhancements/dns/cache-tuning.md
@@ -0,0 +1,257 @@
+---
+title: cache-tuning
+authors:
+  - "@brandisher"
+reviewers: # Include a comment about what domain expertise a reviewer is expected to bring and what area of the enhancement you expect them to focus on. For example: - "@networkguru, for networking aspects, please look at IP bootstrapping aspect"
+  - "@jerpeter1"
+  - "@Miciah"
+  - "@frobware"
+approvers:
+  - "@Miciah"
+api-approvers: # In case of new or modified APIs or API extensions (CRDs, aggregated apiservers, webhooks, finalizers). If there is no API change, use "None"
+  - "@soltysh"
+  - "@JoelSpeed"
+creation-date: 2022-06-30
+last-updated: 2022-06-30
+tracking-link: # link to the tracking ticket (for example: Jira Feature or Epic ticket) that corresponds to this enhancement
+  - https://issues.redhat.com/browse/NE-757
+see-also:
+  - N/A
+replaces:
+  - N/A
+superseded-by:
+  - N/A
+---
+
+# Cache Tuning
+
+## Summary
+
+The purpose of this enhancement is to provide an API in the cluster-dns-operator which will allow cluster 
+admins to configure durations for successful (positive) and unsuccessful (negative) caching of DNS query responses. 
+The scope of the configured durations are for all configured server blocks in CoreDNS' Corefile including the 
+default zone.
+
+## Motivation
+
+Customers running their own DNS resolvers et al want the option to configure OpenShift's DNS caching so that the 
+load on upstream resolvers can be reduced.
+
+### User Stories
+
+- As a cluster admin, I want the ability to configure a maximum time-to-live duration for successful DNS queries so 
+  that I can reduce the load on my DNS infrastructure.
+- As a cluster admin, I want the ability to configure a maximum time-to-live duration for unsuccessful DNS queries so 
+  that I can reduce the load on my DNS infrastructure.
+
+### Goals
+
+- Cluster admins should see a reduction in load for upstream resolvers.
+- Provide the ability to tune the positive/negative caching done by CoreDNS. In the end, have full caching based on 
+  TTLs provided by upstream DNS servers. This means not artificially reducing the TTL in Core DNS.
+- Core DNS does not override TTL (respect TTLs as provided by other DNS servers)
+
+### Non-Goals
+
+- This enhancement does not apply to the infrastructure DNS and is only intended to enable caching for the CoreDNS 
+  managed by the cluster-dns-operator.
+- This enhancement is not intended to enable configuration of additional caching parameters like `prefetch`, 
+  `serve_stale`, or configuring of cache capacity.
+
+## Proposal
+
+This enhancement proposal will add a new type of `DNSCache` which will contain two `metav1.Duration` fields; 
+`positiveTTL` and `negativeTTL`. `DNSSpec` will get a new field of `cache` which holds this configuration.
+
+
+### Workflow Description
+
+1. `oc edit dns.operator.openshift.io/default`
+2. Modify the caching values
+    ```yaml
+    spec:
+      cache:
+        successTTL: 1h
+        denialTTL: 0.5h10m
+    ```
+3. Save changes.
+
+#### Variation [optional]
+
+N/A
+
+### API Extensions
+
+```go
+// DNSCache defines the fields for configuring DNS caching.
+type DNSCache struct {
+	// positiveTTL is optional and specifies the amount of time that a positive response should be cached.
+	//
+	// If configured, it must be a value of 1s (1 second) or greater up to a theoretical maximum of several years.
+	// If not configured, the value will be 0 (zero) and OpenShift will use a default value of 900 seconds unless noted
+	// otherwise in the respective Corefile for your version of OpenShift. The default value of 900 seconds is subject
+	// to change. This field expects an unsigned duration string of decimal numbers, each with optional fraction and a
+	// unit suffix, e.g. "100s", "1m30s". Valid time units are "s", "m", and "h".
+	// +kubebuilder:validation:Pattern=^(0|([0-9]+(\.[0-9]+)?(s|m|h))+)$
+	// +kubebuilder:validation:Type:=string
+	// +optional
+	PositiveTTL metav1.Duration `json:"positiveTTL,omitempty"`
+
+	// negativeTTL is optional and specifies the amount of time that a negative response should be cached.
+	//
+	// If configured, it must be a value of 1s (1 second) or greater up to a theoretical maximum of several years.
+	// If not configured, the value will be 0 (zero) and OpenShift will use a default value of 30 seconds unless noted
+	// otherwise in the respective Corefile for your version of OpenShift. The default value of 30 seconds is subject
+	// to change. This field expects an unsigned duration string of decimal numbers, each with optional fraction and a
+	// unit suffix, e.g. "100s", "1m30s". Valid time units are "s", "m", and "h".
+	// +kubebuilder:validation:Pattern=^(0|([0-9]+(\.[0-9]+)?(s|m|h))+)$
+	// +kubebuilder:validation:Type:=string
+	// +optional
+	NegativeTTL metav1.Duration `json:"negativeTTL,omitempty"`
+}
+```
+
+```go
+// DNSSpec is the specification of the desired behavior of the DNS.
+type DNSSpec struct {
+	// <snip>
+
+	// cache describes the caching configuration that applies to all server blocks listed in the Corefile.
+	// This field allows a cluster admin to optionally configure:
+	// * positiveTTL which is a duration for which positive responses should be cached.
+	// * negativeTTL which is a duration for which negative responses should be cached.
+	// If this is not configured, OpenShift will configure positive and negative caching with a default value that is
+	// subject to change. At the time of writing, the default positiveTTL is 900 seconds and the default negativeTTL is
+	// 30 seconds or as noted in the respective Corefile for your version of OpenShift.
+	// +optional
+	Cache DNSCache `json:"cache,omitempty"`
+}
+```
+
+### Implementation Details/Notes/Constraints [optional]
+
+Changes will be needed in the following repositories:
+- openshift/api [PR](https://github.com/openshift/api/pull/1214)
+- openshift/cluster-dns-operator [PR](https://github.com/openshift/cluster-dns-operator/pull/335)
+
+#### Notes
+
+**If someone decides to add an API to configure minTTL in CoreDNS**
+
+The `cache` plugin will cache both positive and negative responses handled by the `kubernetes` plugin. Coincidentally, 
+the current CoreDNS default minimum TTL for the `cache` plugin is `5s` which is the same as the default `ttl` for the 
+`kubernetes` plugin. This allows `cluster.local` lookups to be cached for the expected amount of time. If we choose 
+to allow changing the `minTTL`  in the `cache` plugin in the future we'll need to assess how this will affect critical
+cluster services. Here are some examples of where setting a `minTTL` could affect services/pods.
+
+* If you set a positive `minTTL` to `22s`, successful `cluster.local` queries will be cached for `22s` instead of `5s`.
+This would make services/pods appear to be available even though they are not available.
+* If you set a negative `minTTL` to `7s`, failed `cluster.local` queries will be caches for `7s` instead of `5s`. This
+would make services/pods appear to be unavailable even though they are available.
+
+The net effect is that adjusting `minTTL` for either the positive or negative case could result in traffic being sent to
+services/pods that no longer exist or exist but in the negative response cache.
+
+### Risks and Mitigations
+
+* **Risk 1**: If we move away from CoreDNS in the future, the API could be impacted
+  * **Mitigation:** Ensure the API is flexible enough to handle reasonable divergence from what CoreDNS offers in the 
+`cache` plugin.
+* **Risk 2**: Setting TTLs to low values could lead to increased load on upstream resolvers and cause DoS.
+  * **Mitigation**: Cluster admins can revert the changes to restore normal function.
+
+### Drawbacks
+
+No notable drawbacks aside from a slight increase in support complexity.
+
+## Design Details
+
+### Open Questions [optional]
+
+Q. Will CoreDNS honor high TTL values such as 604800 (= 1 week)?
+A. Yes. CoreDNS uses `time.Duration` for TTLs which enables using extremely long durations. We'll be using `metav1.
+Duration` for better user experience and converting the string unit (e.g. `1h`) to the respective number of seconds 
+for usage in the Corefile. We do this because the CoreDNS Corefile uses seconds as the unit for TTL values but it would
+be cumbersome for a user to need manually calculate the number of seconds for longer or unique timeframes.
+
+### Test Plan
+
+#### Testing TTL settings
+1. Deploy an upstream resolver.
+2. Modify the cache settings to have a low `successTTL` (e.g. `1m`).
+3. Issue a known successful DNS query for the associated zone once to ensure the record is cached.
+4. Issue the query a second time to ensure that it's served from the cache.
+5. Wait until the TTL period has expired and then issue the query again to ensure that it's not served from the cache.
+
+#### Testing TTL Passthrough
+1. Deploy an upstream resolver that can respond with a specific TTL for a zone. Ensure that this TTL is higher than 
+   the default for CoreDNS.
+2. Issue a known successful query for the associated zone.
+3. Wait until the cluster DNS TTL has expired and then issue the query again to ensure that it's survived the 
+   cluster DNS cache settings.
+
+### Graduation Criteria
+
+#### Dev Preview -> Tech Preview
+
+This will go directly to GA.
+
+#### Tech Preview -> GA
+
+This will go directly to GA. It will include product documentation changes and e2e tests.
+
+#### Removing a deprecated feature
+
+N/A
+
+### Upgrade / Downgrade Strategy
+
+Upgrade expectations:
+- When upgrading to versions where the cache API _is not_ available, the current default in the CoreDNS Corefile will 
+  apply normally.
+- When upgrading to versions where the cache API _is_ available, no interruption will occur as the CoreDNS Corefile 
+  cache settings will apply by default even if the API is not configured.
+- When upgrading from a version with the cache API to another version with the cache API, the configuration will be 
+  retained across upgrades.
+
+Downgrade expectations:
+- When downgrading to versions where the cache API _is not_ available, the cluster admin will remove the caching 
+  settings when downgrading.
+- When downgrading to versions where the cache API _is_ available, the configuration settings should be retained 
+  across downgrades.
+
+### Version Skew Strategy
+
+N/A
+
+### Operational Aspects of API Extensions
+
+Since allowing for adjustment of the CoreDNS cache settings is primarily a judgement call based on the environment 
+the cluster is running in, we do not expect a change in operator conditions as a result of configuring caching. 
+Modifying cache settings could have an effect on the collected CoreDNS metrics and as a result the alerts associated 
+with those metrics, but they would be localized to a particular environment and would need to be triaged accordingly.
+
+#### Failure Modes
+
+* TTLs set too low
+  * This could cause cache churn and put excessive load on CoreDNS.
+* TTLs set too high
+  * This could cause stale data to be served back to clients.
+
+#### Support Procedures
+
+Apply standard DNS troubleshooting methods to support this feature. See [Failure Modes](#failure-modes) for potential
+issues customers could run into.
+
+## Implementation History
+
+This enhancement is being implemented in OpenShift 4.12.
+
+## Alternatives
+
+One potential alternative would be to do this in the infrastructure CoreDNS but that could run the risk of 
+overloading a core functionality of the cluster and could make failure modes more impactful.
+
+## Infrastructure Needed [optional]
+
+N/A