Skip to content

Commit

Permalink
Merge pull request #54015 from openshift-cherrypick-robot/cherry-pick…
Browse files Browse the repository at this point in the history
…-53986-to-enterprise-4.12

[enterprise-4.12] Adds procedure and work for adding a hint file to the node IP configu…
  • Loading branch information
stevsmit authored Dec 19, 2022
2 parents 14f0e5f + 4abde4c commit 5c0ca0e
Show file tree
Hide file tree
Showing 4 changed files with 105 additions and 56 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -77,6 +77,7 @@ include::modules/ipi-install-configuring-the-raid.adoc[leveloffset=+2]

include::modules/ipi-install-creating-a-disconnected-registry.adoc[leveloffset=+1]


[discrete]
[id="prerequisites_ipi-disconnected-registry"]
=== Prerequisites
Expand Down
61 changes: 5 additions & 56 deletions modules/nw-how-nw-iface-selected.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -8,66 +8,15 @@ endif::[]
[id="nw-how-nw-iface-selected_{context}"]
= How the network interface is selected

For installations on bare metal or with virtual machines that have more than one network interface controller (NIC), the NIC that {product-title} uses for communication with the Kubernetes API server is determined by the `nodeip-configuration.service` service unit that is run by systemd when the node boots.
The service iterates through the network interfaces on the node and the first network interface that is configured with a subnet than can host the IP address for the API server is selected for {product-title} communication.
For installations on bare metal or with virtual machines that have more than one network interface controller (NIC), the NIC that {product-title} uses for communication with the Kubernetes API server is determined by the `nodeip-configuration.service` service unit that is run by systemd when the node boots. The `nodeip-configuration.service` selects the IP from the interface associated with the default route.

After the `nodeip-configuration.service` service determines the correct NIC, the service creates the `/etc/systemd/system/kubelet.service.d/20-nodenet.conf` file.
The `20-nodenet.conf` file sets the `KUBELET_NODE_IP` environment variable to the IP address that the service selected.
After the `nodeip-configuration.service` service determines the correct NIC, the service creates the `/etc/systemd/system/kubelet.service.d/20-nodenet.conf` file. The `20-nodenet.conf` file sets the `KUBELET_NODE_IP` environment variable to the IP address that the service selected.

When the kubelet service starts, it reads the value of the environment variable from the `20-nodenet.conf` file and sets the IP address as the value to the `--node-ip` kubelet command-line argument.
As a result, the kubelet service uses the selected IP address as the node IP address.
When the kubelet service starts, it reads the value of the environment variable from the `20-nodenet.conf` file and sets the IP address as the value of the `--node-ip` kubelet command-line argument. As a result, the kubelet service uses the selected IP address as the node IP address.

If hardware or networking is reconfigured after installation, it is possible that the `nodeip-configuration.service` service can select a different NIC after a reboot.
In some cases, you might be able to detect that a different NIC is selected by reviewing the `INTERNAL-IP` column in the output from the `oc get nodes -o wide` command.
If hardware or networking is reconfigured after installation, or if there is a networking layout where the node IP should not come from the default route interface, it is possible for the `nodeip-configuration.service` service to select a different NIC after a reboot. In some cases, you might be able to detect that a different NIC is selected by reviewing the `INTERNAL-IP` column in the output from the `oc get nodes -o wide` command.

If network communication is disrupted or misconfigured because a different NIC is selected, one strategy for overriding the selection process is to set the correct IP address explicitly.
The following list identifies the high-level steps and considerations:

* Create a shell script that determines the IP address to use for {product-title} communication. Have the script create a custom unit file such as `/etc/systemd/system/kubelet.service.d/98-nodenet-override.conf`. Use the custom unit file, `98-nodenet-override.conf`, to set the `KUBELET_NODE_IP` environment variable to the IP address.

* Do not overwrite the `/etc/systemd/system/kubelet.service.d/20-nodenet.conf` file. Specify a file name with a numerically higher value such as `98-nodenet-override.conf` in the same directory path. The goal is to have the custom unit file run after `20-nodenet.conf` and override the value of the environment variable.

* Create a machine config object with the shell script as a base64-encoded string and use the Machine Config Operator to deploy the script to the nodes at a file system path such as `/usr/local/bin/override-node-ip.sh`.

* Ensure that `systemctl daemon-reload` runs after the shell script runs. The simplest method is to specify `ExecStart=systemctl daemon-reload` in the machine config, as shown in the following sample.

.Sample machine config to override the network interface for kubelet
[source,yaml,subs="attributes+"]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
labels:
machineconfiguration.openshift.io/role: worker
name: 98-nodenet-override
spec:
config:
ignition:
version: {ign-config-version}
storage:
files:
- contents:
source: data:text/plain;charset=utf-8;base64,<encoded_script>
mode: 0755
overwrite: true
path: /usr/local/bin/override-node-ip.sh
systemd:
units:
- contents: |
[Unit]
Description=Override node IP detection
Wants=network-online.target
Before=kubelet.service
After=network-online.target
[Service]
Type=oneshot
ExecStart=/usr/local/bin/override-node-ip.sh
ExecStart=systemctl daemon-reload
[Install]
WantedBy=multi-user.target
enabled: true
name: nodenet-override.service
----
If network communication is disrupted or misconfigured because a different NIC is selected, you might receive the following error: `EtcdCertSignerControllerDegraded`. You can create a hint file that includes the `NODEIP_HINT` variable to override the default IP selection logic. For more information, see Optional: Overriding the default node IP selection logic.

// Link to info for creating a machine config.

Expand Down
97 changes: 97 additions & 0 deletions modules/overriding-default-node-ip-selection-logic.adoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,97 @@
// This is included in the following assemblies:
//
// * troubleshooting-network-issues.adoc

:_content-type: PROCEDURE
[id="overriding-default-node-ip-selection-logic_{context}"]
= Optional: Overriding the default node IP selection logic

To override the default IP selection logic, you can create a hint file that includes the `NODEIP_HINT` variable to override the default IP selection logic. Creating a hint file allows you to select a specific node IP address from the interface in the subnet of the IP address specified in the `NODEIP_HINT` variable.

For example, if a node has two interfaces, `eth0` with an address of `10.0.0.10/24`, and `eth1` with an address of `192.0.2.5/24`, and the default route points to `eth0` (`10.0.0.10`),the node IP address would normally use the `10.0.0.10` IP address.

Users can configure the `NODEIP_HINT` variable to point at a known IP in the subnet, for example, a subnet gateway such as `192.0.2.1` so that the other subnet, `192.0.2.0/24`, is selected. As a result, the `192.0.2.5` IP address on `eth1` is used for the node.

The following procedure shows how to override the default node IP selection logic.

.Procedure

. Add a hint file to your your `/etc/default/nodeip-configuration` file, for example:
+
[source,text]
----
NODEIP_HINT=192.0.2.1
----
+
[IMPORTANT]
====
* Do not use the exact IP address of a node as a hint, for example, `192.0.2.5`. Using the exact IP address of a node causes the node using the hint IP address to fail to configure correctly.
* The IP address in the hint file is only used to determine the correct subnet. It will not receive traffic as a result of appearing in the hint file.
====

. Generate the `base-64` encoded content by running the following command:
+
[source,terminal]
----
$ echo 'NODEIP_HINT=192.0.2.1' | base64
----
+
.Example output
+
[source,terminal]
----
Tk9ERUlQX0hJTlQ9MTkyLjAuMCxxxx==
----

. Activate the hint by creating a machine config manifest for both `master` and `worker` roles before deploying the cluster:
+
.Master machine config manifest
[source,yaml]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
labels:
machineconfiguration.openshift.io/role: master
name: 99-nodeip-hint-master
spec:
config:
ignition:
version: 3.2.0
storage:
files:
- contents:
source: data:text/plain;charset=utf-8;base64, <encoded_content> <1>
mode: 0644
overwrite: true
path: /etc/default/nodeip-configuration
----
+
<1> Replace `<encoded_contents>` with the base64-encoded content of the `/etc/default/nodeip-configuration` file, for example, `Tk9ERUlQX0hJTlQ9MTkyLjAuMCxxxx==`.
+
.Worker machine config manifest
[source,yaml]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
labels:
machineconfiguration.openshift.io/role: worker
name: 99-nodeip-hint-worker
spec:
config:
ignition:
version: 3.2.0
storage:
files:
- contents:
source: data:text/plain;charset=utf-8;base64, <encoded_content> <1>
mode: 0644
overwrite: true
path: /etc/default/nodeip-configuration
----
<1> Replace `<encoded_contents>` with the base64-encoded content of the `/etc/default/nodeip-configuration` file, for example, `Tk9ERUlQX0hJTlQ9MTkyLjAuMCxxxx==`.

. Save the manifest to the directory where you store your cluster configuration, for example, `~/clusterconfigs`.

. Deploy the cluster.
2 changes: 2 additions & 0 deletions support/troubleshooting/troubleshooting-network-issues.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,8 @@ toc::[]
// How the network interface is selected
include::modules/nw-how-nw-iface-selected.adoc[leveloffset=+1]

include::modules/overriding-default-node-ip-selection-logic.adoc[leveloffset=+2]

// Troubleshooting OVS issues
include::modules/nw-troubleshoot-ovs.adoc[leveloffset=+1]

Expand Down

0 comments on commit 5c0ca0e

Please sign in to comment.