feat: add node local dns cache (#511)
#### Motivation

We are seeing a large number of DNS failures with timeouts and
`EAI_AGAIN` responses; we believe this may be due to the DNS servers
getting overwhelmed when the cluster is scaling up rapidly.

Adding a DNS cache to every node greatly reduces the number of DNS
requests made over the wire to the primary coredns servers.


#### Modification

Adds a DNS cache to every node inside the cluster.

---------

Co-authored-by: paulfouquet <[email protected]>
blacha and paulfouquet authored Apr 4, 2024
1 parent 2d97fe4 commit 0877929
Showing 7 changed files with 315 additions and 23 deletions.
22 changes: 17 additions & 5 deletions docs/dns.configuration.md
@@ -12,10 +12,16 @@ Start a shell on the container
k exec -n :namespace -it :podName -- /bin/bash
```

-Install basic dns utils `dig` `ping` `wget` and `curl`
+Install basic networking utils `dig`, `ping`, `ping6`, `wget`, `nslookup`, and `curl`

```bash
-apt install dnsutils iptools-ping wget curl
+apt update && apt install -y dnsutils iputils-ping wget curl
```

Other useful tools may include `tracepath`, `traceroute` and `mtr`

```bash
apt update && apt install -y iputils-tracepath mtr traceroute
```
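
For a quick hop-by-hop report rather than the interactive view (standard `mtr` flags, not specific to this repo):

```bash
# Send 5 probes per hop and print a single summary report
mtr --report --report-cycles 5 google.com
```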

### Name resolution
@@ -69,18 +75,24 @@ Depending on the container you may have access to scripting languages.

#### NodeJS

-file: index.mjs
+create a new file `index.mjs`

```javascript
fetch('https://google.com').then((c) => console.log(c));

import * as dns from 'dns/promises';

-await dns.resolve('google.com', 'A');
-await dns.resolve('google.com', 'AAAA');
+console.log(await dns.resolve('google.com', 'A'));
+console.log(await dns.resolve('google.com', 'AAAA'));
```

Run the file

```bash
node --version
node index.mjs
```
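
Note: in Node.js, `fetch` resolves hostnames through `dns.lookup` (the system resolver, `getaddrinfo`), while `dns.resolve` queries the nameservers listed in `/etc/resolv.conf` directly, so comparing the two helps narrow down which resolution path is failing.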

## Node Local DNS

A local DNS cache, [node-local-dns](./infrastructure/components/node.local.dns.md), runs on every node. If any DNS issues occur, it is recommended to turn the DNS cache off as a first step in debugging.
42 changes: 42 additions & 0 deletions docs/infrastructure/components/node.local.dns.md
@@ -0,0 +1,42 @@
# Node Local DNS

When large [argo](./argo.workflows.md) jobs are submitted, the kubernetes cluster can sometimes scale up very quickly, which can overwhelm the coredns resolvers running on the primary nodes.

To prevent this overload, a DNS cache is installed on every node when it starts.

It is based on https://kubernetes.io/docs/tasks/administer-cluster/nodelocaldns/ and greatly reduces the load on the primary DNS servers.

## Debugging DNS problems with node local DNS

If DNS problems occur while node-local-dns is running, it is recommended to turn it off by setting the `UseNodeLocalDns` constant to `false` in `infra/constants.ts`.
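
A minimal sketch of the flag, assuming it is a plain exported boolean (see `infra/constants.ts` for the actual definition):

```typescript
// infra/constants.ts (sketch, not the full file)
// Setting this to false stops cdk8s from creating the NodeLocalDns chart on the next deploy.
export const UseNodeLocalDns = false;
```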

## Watching DNS requests

By default the DNS cache logs every external DNS request it resolves (anything not ending with `.cluster.local`). Since there can be a large number of DNS cache pods, the following command tails the logs from the daemonset

```bash
kubectl logs -n kube-system --all-containers=true -f daemonset/node-local-dns --since=1m --timestamps=true --prefix=true
```

### Structured logs

`coredns` does not provide a simple way of constructing a structured log from a DNS request, but it does provide a template system which can be used to craft a JSON log line. If the log line is in a structured format like JSON, it can be more easily processed into something like elasticsearch for additional debugging.

For the current log format see `CoreFileJsonLogFormat`; below is an example log line

```json
{
"remoteIp": "[2406:da1c:afb:bc0b:d0e3::6]",
"remotePort": 43621,
"protocol": "udp",
"queryId": "14962",
"queryType": "A",
"queryClass": "IN",
"queryName": "logs.ap-southeast-2.amazonaws.com.",
"querySize": 51,
"dnsSecOk": "false",
"responseCode": "NOERROR",
"responseFlags": "qr,rd,ra",
"responseSize": 443
}
```
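
Since each log line carries a prefix in front of the JSON payload, the payload has to be pulled out before filtering; a minimal sketch (the `grep`/`jq` pipeline is an assumption, not part of this commit):

```bash
# Tail the cache logs, keep only the JSON payload, and show failing queries
kubectl logs -n kube-system --all-containers=true --since=5m daemonset/node-local-dns \
  | grep -o '{.*}' \
  | jq -r 'select(.responseCode != "NOERROR") | [.queryType, .queryName, .responseCode] | @tsv'
```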
45 changes: 29 additions & 16 deletions infra/cdk8s.ts
@@ -7,32 +7,45 @@ import { EventExporter } from './charts/event.exporter.js';
import { FluentBit } from './charts/fluentbit.js';
import { Karpenter, KarpenterProvisioner } from './charts/karpenter.js';
import { CoreDns } from './charts/kube-system.coredns.js';
-import { CfnOutputKeys, ClusterName, ScratchBucketName, validateKeys } from './constants.js';
-import { getCfnOutputs } from './util/cloud.formation.js';
+import { NodeLocalDns } from './charts/kube-system.node.local.dns.js';
+import { CfnOutputKeys, ClusterName, ScratchBucketName, UseNodeLocalDns, validateKeys } from './constants.js';
+import { describeCluster, getCfnOutputs } from './util/cloud.formation.js';
import { fetchSsmParameters } from './util/ssm.js';

const app = new App();

async function main(): Promise<void> {
-  // Get cloudformation outputs
-  const cfnOutputs = await getCfnOutputs(ClusterName);
+  const [cfnOutputs, ssmConfig, clusterConfig] = await Promise.all([
+    getCfnOutputs(ClusterName),
+    fetchSsmParameters({
+      // Config for Cloudflared to access argo-server
+      tunnelId: '/eks/cloudflared/argo/tunnelId',
+      tunnelSecret: '/eks/cloudflared/argo/tunnelSecret',
+      tunnelName: '/eks/cloudflared/argo/tunnelName',
+      accountId: '/eks/cloudflared/argo/accountId',
+
+      // Personal access token to gain access to linz-li-bot github user
+      githubPat: '/eks/github/linz-li-bot/pat',
+
+      // Argo Database connection password
+      argoDbPassword: '/eks/argo/postgres/password',
+    }),
+    describeCluster(ClusterName),
+  ]);
  validateKeys(cfnOutputs);

-  const ssmConfig = await fetchSsmParameters({
-    // Config for Cloudflared to access argo-server
-    tunnelId: '/eks/cloudflared/argo/tunnelId',
-    tunnelSecret: '/eks/cloudflared/argo/tunnelSecret',
-    tunnelName: '/eks/cloudflared/argo/tunnelName',
-    accountId: '/eks/cloudflared/argo/accountId',
-
-    // Personal access token to gain access to linz-li-bot github user
-    githubPat: '/eks/github/linz-li-bot/pat',
-
-    // Argo Database connection password
-    argoDbPassword: '/eks/argo/postgres/password',
-  });
+  const coredns = new CoreDns(app, 'dns', {});
+
+  // Node local DNS is very experimental in this cluster, it can and will break DNS resolution
+  // If there are any issues with DNS, NodeLocalDns should be disabled first.
+  if (UseNodeLocalDns) {
+    const ipv6Cidr = clusterConfig.kubernetesNetworkConfig?.serviceIpv6Cidr;
+    if (ipv6Cidr == null) throw new Error('Unable to use node-local-dns without ipv6Cidr');
+    const nodeLocal = new NodeLocalDns(app, 'node-local-dns', { serviceIpv6Cidr: ipv6Cidr });
+    nodeLocal.addDependency(coredns);
+  }

-  const coredns = new CoreDns(app, 'dns', {});
const fluentbit = new FluentBit(app, 'fluentbit', {
saName: cfnOutputs[CfnOutputKeys.FluentBitServiceAccountName],
clusterName: ClusterName,
Expand Down
7 changes: 5 additions & 2 deletions infra/charts/kube-system.coredns.ts
@@ -4,6 +4,9 @@ import { Construct } from 'constructs';

import { applyDefaultLabels } from '../util/labels.js';

/** Configure CoreDNS to output a JSON object for its log files */
export const CoreFileJsonLogFormat = `{"remoteIp":"{remote}","remotePort":{port},"protocol":"{proto}","queryId":"{>id}","queryType":"{type}","queryClass":"{class}","queryName":"{name}","querySize":{size},"dnsSecOk":"{>do}","responseCode":"{rcode}","responseFlags":"{>rflags}","responseSize":{rsize}}`;
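// The `{remote}`, `{type}`, `{rcode}` style placeholders are replacements provided by the
// coredns log plugin, see https://coredns.io/plugins/log/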

/**
* This cluster is setup as dual ipv4/ipv6 where ipv4 is used for external traffic
* and ipv6 for internal traffic.
@@ -36,7 +39,7 @@ export class CoreDns extends Chart {
// FIXME: is there a better way of handling config files inside of cdk8s
Corefile: `
cluster.local:53 {
-log
+log . ${CoreFileJsonLogFormat}
errors
health
kubernetes cluster.local in-addr.arpa ip6.arpa {
@@ -53,7 +56,7 @@ }
}
.:53 {
-log
+log . ${CoreFileJsonLogFormat}
errors
health
template ANY AAAA {
207 changes: 207 additions & 0 deletions infra/charts/kube-system.node.local.dns.ts
@@ -0,0 +1,207 @@
import { ApiObject, Chart, ChartProps, JsonPatch, Size } from 'cdk8s';
import * as kplus from 'cdk8s-plus-27';
import { Construct } from 'constructs';

import { applyDefaultLabels } from '../util/labels.js';
import { CoreFileJsonLogFormat } from './kube-system.coredns.js';

export interface NodeLocalDnsProps extends ChartProps {
/** cluster networking configuration */
serviceIpv6Cidr: string;

/**
* bind the node-local-dns to a top level suffix on the {@link serviceIpv6Cidr}
*
* @defaultValue "ffaa"
*/
bindAddressSuffix?: string;
}

export class NodeLocalDns extends Chart {
constructor(scope: Construct, id: string, props: NodeLocalDnsProps) {
super(scope, id, {
...applyDefaultLabels(props, 'node-local-dns', 'v1', 'kube-dns', 'kube-dns'),
namespace: 'kube-system',
});

const bindAddressSuffix = props.bindAddressSuffix ?? 'ffaa';

const serviceBaseAddress = props.serviceIpv6Cidr.slice(0, props.serviceIpv6Cidr.lastIndexOf('/'));

// e.g. a (hypothetical) serviceIpv6Cidr of 'fd00:1234::/108' yields a bind address of
// 'fd00:1234::ffaa' and an upstream kube-dns service address of 'fd00:1234::a'
const bindAddress = serviceBaseAddress + bindAddressSuffix;
const upstreamDns = serviceBaseAddress + 'a'; // TODO is this always `::a` ?

// Expose the kube-dns pods as a second service, giving the cache a stable upstream
// address that is not intercepted by the node-local bind address
const dnsUpstream = new kplus.Service(this, 'kube-dns-upstream', {
metadata: {
name: 'kube-dns-upstream',
labels: {
'kubernetes.io/cluster-service': 'true',
'kubernetes.io/name': 'KubeDNSUpstream',
'k8s-app': 'kube-dns',
},
},
ports: [
{ name: 'dns', port: 53, protocol: kplus.Protocol.UDP, targetPort: 53 },
{ name: 'dns-tcp', port: 53, protocol: kplus.Protocol.TCP, targetPort: 53 },
],
selector: kplus.Pods.select(this, 'kube-dns-upstream-pods', { labels: { 'k8s-app': 'kube-dns' } }),
});

const configMap = new kplus.ConfigMap(this, 'node-local-dns-config', {
metadata: { name: 'node-local-dns' },
data: {
Corefile: generateCorefile({ bindAddress: bindAddress, upstreamAddress: upstreamDns }),
},
});

const ds = new kplus.DaemonSet(this, 'node-local-dns-daemon', {
metadata: {
name: 'node-local-dns',
labels: { 'kubernetes.io/cluster-service': 'true' },
},

serviceAccount: new kplus.ServiceAccount(this, 'node-local-dns-sa', {
metadata: {
name: 'node-local-dns',
labels: { 'kubernetes.io/cluster-service': 'true' },
},
}),
securityContext: { ensureNonRoot: false },
dns: { policy: kplus.DnsPolicy.DEFAULT },
hostNetwork: true,
podMetadata: {},
containers: [
{
name: 'node-cache',
securityContext: {
ensureNonRoot: false,
allowPrivilegeEscalation: true,
readOnlyRootFilesystem: false,
privileged: true,
},
image: 'registry.k8s.io/dns/k8s-dns-node-cache:1.22.28',
resources: { cpu: { request: kplus.Cpu.millis(25) }, memory: { request: Size.mebibytes(5) } }, // cpu: 25m, memory: 5Mi
args: [
'-localip',
[bindAddress, upstreamDns].join(','),
'-conf',
'/etc/Corefile',
'-upstreamsvc',
dnsUpstream.name,
],
ports: [
{ name: 'dns', protocol: kplus.Protocol.UDP, number: 53 },
// { name: 'dns-tcp', protocol: kplus.Protocol.TCP, number: 53 }, // TODO this is broken, see JSONPatch
{ name: 'metrics', protocol: kplus.Protocol.TCP, number: 9253 },
],
liveness: kplus.Probe.fromHttpGet('/health', { port: 8080 /* TODO: host: bindAddress, see JSONPatch*/ }),
volumeMounts: [
{
path: '/etc/coredns',
volume: kplus.Volume.fromConfigMap(this, 'config-volume', configMap, {
name: 'node-local-dns',
items: { Corefile: { path: 'Corefile.base' } },
}),
},
{
path: '/run/xtables.lock',
readOnly: false,
volume: kplus.Volume.fromHostPath(this, 'iptables', 'xtables-lock', {
path: '/run/xtables.lock',
type: kplus.HostPathVolumeType.FILE_OR_CREATE,
}),
},
],
},
],
});

// Escape hatches to manually overwrite a few configuration options that could not be configured with cdk8s
const apiDs = ApiObject.of(ds);
apiDs.addJsonPatch(
// httpGet.host is missing from kplus.Probe.fromHttpGet
JsonPatch.add('/spec/template/spec/containers/0/livenessProbe/httpGet/host', bindAddress),
// TODO where is this defined
JsonPatch.add('/spec/template/spec/priorityClassName', 'system-node-critical'),
// unable to add two ports with the same number even though they are different protocols
JsonPatch.add('/spec/template/spec/containers/0/ports/1', {
containerPort: 53,
name: 'dns-tcp',
protocol: 'TCP',
}),
// Unable to set the security context
JsonPatch.replace('/spec/template/spec/containers/0/securityContext', { capabilities: { add: ['NET_ADMIN'] } }),

// Force the tolerations on to the pod
JsonPatch.add('/spec/template/spec/tolerations', [
{ key: 'CriticalAddonsOnly', operator: 'Exists' },
{ effect: 'NoExecute', operator: 'Exists' },
{ effect: 'NoSchedule', operator: 'Exists' },
]),
);
}
}

/**
* Generate a node-local-dns cache Corefile
*
* Taken from https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/dns/nodelocaldns/nodelocaldns.yaml#L56
* @param ctx addresses to replace
* @returns the coredns Corefile configuration for node-local-dns
*/
function generateCorefile(ctx: { bindAddress: string; upstreamAddress: string }): string {
// __PILLAR__ placeholders are replaced automatically by the node-local-dns pod
return `cluster.local:53 {
errors
cache {
success 9984 30
denial 9984 5
}
reload
loop
bind ${ctx.bindAddress} ${ctx.upstreamAddress}
forward . __PILLAR__CLUSTER__DNS__ {
force_tcp
}
prometheus :9253
health [${ctx.bindAddress}]:8080
}
in-addr.arpa:53 {
errors
cache 30
reload
loop
bind ${ctx.bindAddress} ${ctx.upstreamAddress}
forward . __PILLAR__CLUSTER__DNS__ {
force_tcp
}
prometheus :9253
}
ip6.arpa:53 {
errors
cache 30
reload
loop
bind ${ctx.bindAddress} ${ctx.upstreamAddress}
forward . __PILLAR__CLUSTER__DNS__ {
force_tcp
}
prometheus :9253
}
.:53 {
log . ${CoreFileJsonLogFormat}
errors
cache 30
reload
loop
template ANY AAAA {
rcode NOERROR
}
bind ${ctx.bindAddress} ${ctx.upstreamAddress}
forward . __PILLAR__UPSTREAM__SERVERS__
prometheus :9253
}`;
}