From 0cb27ed21258acad97fab34f7019c9bcc819ce75 Mon Sep 17 00:00:00 2001 From: Steven Smith Date: Mon, 10 Oct 2022 09:32:34 -0400 Subject: [PATCH] Adds procedure and work for adding a hint file to the node IP configuration --- .../ipi-install-installation-workflow.adoc | 1 + modules/nw-how-nw-iface-selected.adoc | 61 +----------- ...iding-default-node-ip-selection-logic.adoc | 97 +++++++++++++++++++ .../troubleshooting-network-issues.adoc | 2 + 4 files changed, 105 insertions(+), 56 deletions(-) create mode 100644 modules/overriding-default-node-ip-selection-logic.adoc diff --git a/installing/installing_bare_metal_ipi/ipi-install-installation-workflow.adoc b/installing/installing_bare_metal_ipi/ipi-install-installation-workflow.adoc index 84aa1653c592..2e5159fa6b6a 100644 --- a/installing/installing_bare_metal_ipi/ipi-install-installation-workflow.adoc +++ b/installing/installing_bare_metal_ipi/ipi-install-installation-workflow.adoc @@ -77,6 +77,7 @@ include::modules/ipi-install-configuring-the-raid.adoc[leveloffset=+2] include::modules/ipi-install-creating-a-disconnected-registry.adoc[leveloffset=+1] + [discrete] [id="prerequisites_ipi-disconnected-registry"] === Prerequisites diff --git a/modules/nw-how-nw-iface-selected.adoc b/modules/nw-how-nw-iface-selected.adoc index df8ba63b5f95..ed7ace2b87c2 100644 --- a/modules/nw-how-nw-iface-selected.adoc +++ b/modules/nw-how-nw-iface-selected.adoc @@ -8,66 +8,15 @@ endif::[] [id="nw-how-nw-iface-selected_{context}"] = How the network interface is selected -For installations on bare metal or with virtual machines that have more than one network interface controller (NIC), the NIC that {product-title} uses for communication with the Kubernetes API server is determined by the `nodeip-configuration.service` service unit that is run by systemd when the node boots. 
-The service iterates through the network interfaces on the node and the first network interface that is configured with a subnet than can host the IP address for the API server is selected for {product-title} communication. +For installations on bare metal or with virtual machines that have more than one network interface controller (NIC), the NIC that {product-title} uses for communication with the Kubernetes API server is determined by the `nodeip-configuration.service` service unit that is run by systemd when the node boots. The `nodeip-configuration.service` selects the IP from the interface associated with the default route. -After the `nodeip-configuration.service` service determines the correct NIC, the service creates the `/etc/systemd/system/kubelet.service.d/20-nodenet.conf` file. -The `20-nodenet.conf` file sets the `KUBELET_NODE_IP` environment variable to the IP address that the service selected. +After the `nodeip-configuration.service` service determines the correct NIC, the service creates the `/etc/systemd/system/kubelet.service.d/20-nodenet.conf` file. The `20-nodenet.conf` file sets the `KUBELET_NODE_IP` environment variable to the IP address that the service selected. -When the kubelet service starts, it reads the value of the environment variable from the `20-nodenet.conf` file and sets the IP address as the value to the `--node-ip` kubelet command-line argument. -As a result, the kubelet service uses the selected IP address as the node IP address. +When the kubelet service starts, it reads the value of the environment variable from the `20-nodenet.conf` file and sets the IP address as the value of the `--node-ip` kubelet command-line argument. As a result, the kubelet service uses the selected IP address as the node IP address. -If hardware or networking is reconfigured after installation, it is possible that the `nodeip-configuration.service` service can select a different NIC after a reboot. 
-In some cases, you might be able to detect that a different NIC is selected by reviewing the `INTERNAL-IP` column in the output from the `oc get nodes -o wide` command. +If hardware or networking is reconfigured after installation, or if there is a networking layout where the node IP should not come from the default route interface, it is possible for the `nodeip-configuration.service` service to select a different NIC after a reboot. In some cases, you might be able to detect that a different NIC is selected by reviewing the `INTERNAL-IP` column in the output from the `oc get nodes -o wide` command. -If network communication is disrupted or misconfigured because a different NIC is selected, one strategy for overriding the selection process is to set the correct IP address explicitly. -The following list identifies the high-level steps and considerations: - -* Create a shell script that determines the IP address to use for {product-title} communication. Have the script create a custom unit file such as `/etc/systemd/system/kubelet.service.d/98-nodenet-override.conf`. Use the custom unit file, `98-nodenet-override.conf`, to set the `KUBELET_NODE_IP` environment variable to the IP address. - -* Do not overwrite the `/etc/systemd/system/kubelet.service.d/20-nodenet.conf` file. Specify a file name with a numerically higher value such as `98-nodenet-override.conf` in the same directory path. The goal is to have the custom unit file run after `20-nodenet.conf` and override the value of the environment variable. - -* Create a machine config object with the shell script as a base64-encoded string and use the Machine Config Operator to deploy the script to the nodes at a file system path such as `/usr/local/bin/override-node-ip.sh`. - -* Ensure that `systemctl daemon-reload` runs after the shell script runs. The simplest method is to specify `ExecStart=systemctl daemon-reload` in the machine config, as shown in the following sample. 
- -.Sample machine config to override the network interface for kubelet -[source,yaml,subs="attributes+"] ----- -apiVersion: machineconfiguration.openshift.io/v1 -kind: MachineConfig -metadata: - labels: - machineconfiguration.openshift.io/role: worker - name: 98-nodenet-override -spec: - config: - ignition: - version: {ign-config-version} - storage: - files: - - contents: - source: data:text/plain;charset=utf-8;base64, - mode: 0755 - overwrite: true - path: /usr/local/bin/override-node-ip.sh - systemd: - units: - - contents: | - [Unit] - Description=Override node IP detection - Wants=network-online.target - Before=kubelet.service - After=network-online.target - [Service] - Type=oneshot - ExecStart=/usr/local/bin/override-node-ip.sh - ExecStart=systemctl daemon-reload - [Install] - WantedBy=multi-user.target - enabled: true - name: nodenet-override.service ----- +If network communication is disrupted or misconfigured because a different NIC is selected, you might receive the following error: `EtcdCertSignerControllerDegraded`. You can create a hint file that includes the `NODEIP_HINT` variable to override the default IP selection logic. For more information, see Optional: Overriding the default node IP selection logic. // Link to info for creating a machine config. diff --git a/modules/overriding-default-node-ip-selection-logic.adoc b/modules/overriding-default-node-ip-selection-logic.adoc new file mode 100644 index 000000000000..5b689555c8f1 --- /dev/null +++ b/modules/overriding-default-node-ip-selection-logic.adoc @@ -0,0 +1,97 @@ +// This is included in the following assemblies: +// +// * troubleshooting-network-issues.adoc + +:_content-type: PROCEDURE +[id="overriding-default-node-ip-selection-logic_{context}"] += Optional: Overriding the default node IP selection logic + +To override the default IP selection logic, you can create a hint file that includes the `NODEIP_HINT` variable. 
Creating a hint file allows you to select a specific node IP address from the interface in the subnet of the IP address specified in the `NODEIP_HINT` variable. + +For example, if a node has two interfaces, `eth0` with an address of `10.0.0.10/24`, and `eth1` with an address of `192.0.2.5/24`, and the default route points to `eth0` (`10.0.0.10`), the node would normally use `10.0.0.10` as its IP address. + +You can configure the `NODEIP_HINT` variable to point to a known IP address in the other subnet, for example, a subnet gateway such as `192.0.2.1`, so that the `192.0.2.0/24` subnet is selected. As a result, the `192.0.2.5` IP address on `eth1` is used for the node. + +The following procedure shows how to override the default node IP selection logic. + +.Procedure + +. Add the `NODEIP_HINT` variable to the `/etc/default/nodeip-configuration` file, for example: ++ +[source,text] +---- +NODEIP_HINT=192.0.2.1 +---- ++ +[IMPORTANT] +==== +* Do not use the exact IP address of a node as a hint, for example, `192.0.2.5`. Using the exact IP address of a node causes the node that owns that address to fail to configure correctly. +* The IP address in the hint file is only used to determine the correct subnet. The address does not receive traffic as a result of appearing in the hint file. +==== + +. Generate the base64-encoded content by running the following command: ++ +[source,terminal] +---- +$ echo 'NODEIP_HINT=192.0.2.1' | base64 +---- ++ +.Example output ++ +[source,terminal] +---- +Tk9ERUlQX0hJTlQ9MTkyLjAuMCxxxx== +---- + +. 
Activate the hint by creating a machine config manifest for both `master` and `worker` roles before deploying the cluster: ++ +.Master machine config manifest +[source,yaml] +---- +apiVersion: machineconfiguration.openshift.io/v1 +kind: MachineConfig +metadata: + labels: + machineconfiguration.openshift.io/role: master + name: 99-nodeip-hint-master +spec: + config: + ignition: + version: 3.2.0 + storage: + files: + - contents: + source: data:text/plain;charset=utf-8;base64, <1> + mode: 0644 + overwrite: true + path: /etc/default/nodeip-configuration +---- ++ +<1> Replace `` with the base64-encoded content of the `/etc/default/nodeip-configuration` file, for example, `Tk9ERUlQX0hJTlQ9MTkyLjAuMCxxxx==`. ++ +.Worker machine config manifest +[source,yaml] +---- +apiVersion: machineconfiguration.openshift.io/v1 +kind: MachineConfig +metadata: + labels: + machineconfiguration.openshift.io/role: worker + name: 99-nodeip-hint-worker +spec: + config: + ignition: + version: 3.2.0 + storage: + files: + - contents: + source: data:text/plain;charset=utf-8;base64, <1> + mode: 0644 + overwrite: true + path: /etc/default/nodeip-configuration +---- +<1> Replace `` with the base64-encoded content of the `/etc/default/nodeip-configuration` file, for example, `Tk9ERUlQX0hJTlQ9MTkyLjAuMCxxxx==`. + +. Save the manifest to the directory where you store your cluster configuration, for example, `~/clusterconfigs`. + +. Deploy the cluster. 
\ No newline at end of file diff --git a/support/troubleshooting/troubleshooting-network-issues.adoc b/support/troubleshooting/troubleshooting-network-issues.adoc index 42acdef0abb8..7adaa620451e 100644 --- a/support/troubleshooting/troubleshooting-network-issues.adoc +++ b/support/troubleshooting/troubleshooting-network-issues.adoc @@ -9,6 +9,8 @@ toc::[] // How the network interface is selected include::modules/nw-how-nw-iface-selected.adoc[leveloffset=+1] +include::modules/overriding-default-node-ip-selection-logic.adoc[leveloffset=+2] + // Troubleshooting OVS issues include::modules/nw-troubleshoot-ovs.adoc[leveloffset=+1]
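
The hint mechanism that the new module describes can be sketched in Python. This is a minimal illustration under stated assumptions, not the actual `nodeip-configuration.service` implementation: the function names `encode_hint_file` and `pick_node_ip` and the interface dictionary are hypothetical, and the selection rule is assumed from the module text (the hint only identifies a subnet, and the node's own address in that subnet is used).

```python
import base64
import ipaddress
from typing import Dict, Optional


def encode_hint_file(hint_ip: str) -> str:
    """Return the base64 text of a hint file containing NODEIP_HINT=<hint_ip>.

    A trailing newline is appended to mirror what
    `echo 'NODEIP_HINT=...' | base64` encodes.
    """
    ipaddress.ip_address(hint_ip)  # raises ValueError for a malformed address
    content = f"NODEIP_HINT={hint_ip}\n"
    return base64.b64encode(content.encode("utf-8")).decode("ascii")


def pick_node_ip(interfaces: Dict[str, str], hint_ip: str) -> Optional[str]:
    """Hypothetical model of hint-based selection.

    `interfaces` maps an interface name to its address in CIDR form,
    for example {"eth1": "192.0.2.5/24"}. The function returns the node
    address whose subnet contains the hint, or None if no subnet matches.
    """
    hint = ipaddress.ip_address(hint_ip)
    for cidr in interfaces.values():
        iface = ipaddress.ip_interface(cidr)
        if hint == iface.ip:
            # The module warns against using a node's exact address as the hint.
            raise ValueError("hint must not be the node's own IP address")
        if hint in iface.network:
            return str(iface.ip)
    return None


if __name__ == "__main__":
    node = {"eth0": "10.0.0.10/24", "eth1": "192.0.2.5/24"}
    print(pick_node_ip(node, "192.0.2.1"))  # prints 192.0.2.5
    print(encode_hint_file("192.0.2.1"))
```

The string returned by `encode_hint_file` is what goes after `base64,` in the `source` field of both the `master` and `worker` machine config manifests.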