Update rwx volume doc

Longhorn 6655 Signed-off-by: Derek Su <[email protected]>
longhorn · Sep 21, 2023 · cd1892e · cd1892e
1 parent cc09cfd
commit cd1892e
Show file tree

Hide file tree

Showing 9 changed files with 243 additions and 9 deletions.
diff --git a/content/docs/1.4.0/advanced-resources/rwx-workloads.md b/content/docs/1.4.0/advanced-resources/rwx-workloads.md
@@ -8,7 +8,7 @@ Longhorn supports ReadWriteMany (RWX) volumes by exposing regular Longhorn volum
 
 # Introduction
 
-For each actively in use RWX volume Longhorn will create a `share-manager-<volume-name>` Pod in the `longhorn-system` namespace. This Pod is responsible for exporting a Longhorn volume via a NFSv4 server that is running inside the Pod. There is also a service created for each RWX volume, and that is used as an endpoint for the actual NFSv4 client connection.
+Longhorn creates a dedicated `share-manager-<volume-name>` Pod within the `longhorn-system` namespace for each RWX volume that is currently in active use. The Pod facilitate the export of Longhorn volume via an internally hosted NFSv4 server. Additionally, a corresponding Service is created for each RWX volume, serving as the designated endpoint for actual NFSv4 client connections.
 
 {{< figure src="/img/diagrams/rwx/rwx-arch.png" >}}
 
@@ -31,6 +31,41 @@ It is necessary to meet the following requirements in order to use RWX volumes.
     
     > **Tip:** The [environment check script](https://raw.githubusercontent.com/longhorn/longhorn/v{{< current-version >}}/scripts/environment_check.sh) helps users to check all nodes have unique hostnames.
 
+# Notice
+
+In versions 1.4.0 to 1.4.3 and 1.5.0 to 1.5.1 of Longhorn, Longhorn CSI plugin `hard` mounts a Longhorn volume exported by a NFS server located within a share-manager Pod in the `NodeStageVolume`. The `hard` mount allows NFS requests to persistently retry without termination, ensuring that IOs do not fail. When the server is back online or a replacement server is recreated, the IOs resume seamlessly, thus guaranteeing data integrity. However, there is potential risk that to maintain file system stability, the Linux kernel will not permit unmounting a file system until all pending IOs are written back to storage, and the system cannot undergo shutdown until all file systems are unmounted. If the NFS server fails to recover successfully, the client nodes must undergo a forced reboot.
+
+To mitigate the issue,
+
+- For existing volumes
+
+  Due to the immutability of `persistentvolume.spec.csi.volumeAttributes`, making adjustments to the `nfsOptions` within this field is prohibited. To address this issue, upgrading to `v1.4.4` or a newer version is required. After upgrading the Longhorn system, the `softerr` or `soft` will be automatically applied when reattaching RWX volumes.
+
+- For new volumes
+
+  Users can switch to use `softerr` or `soft` mount instead. Users can achieve this by creating a StorageClass with `nfsOptions` for newly created volumes. For instance:
+
+  ```
+  kind: StorageClass
+  apiVersion: storage.k8s.io/v1
+  metadata:
+    name: longhorn-test
+  provisioner: driver.longhorn.io
+  allowVolumeExpansion: true
+  reclaimPolicy: Delete
+  volumeBindingMode: Immediate
+  parameters:
+    numberOfReplicas: "3"
+    staleReplicaTimeout: "2880"
+    fromBackup: ""
+    fsType: "ext4"
+    nfsOptions: "vers=4.1,noresvport,softerr,timeo=600,retrans=5"
+  ``````
+
+  Then, create RWX volume PVCs using the above the StorageClass. Longhorn will then adopt the use of a `softerr` or `soft` mount with a `timeo` value of 600 and a `retrans` value of 5 as default options. When the NFS server becomes unreachable due to factors such as node power outages, network partitions and so on, NFS clients will fail an NFS request after the specified number of retransmissions, resulting in an NFS `ETIMEDOUT` error (or EIO error for `soft` mount) being returned to the calling application and potential data loss.
+
+  Please refer to [#6655](https://github.com/longhorn/longhorn/issues/6655) for more information.
+
 # Creation and Usage of a RWX Volume
 
 1. For dynamically provisioned Longhorn volumes, the access mode is based on the PVC's access mode.

diff --git a/content/docs/1.4.1/advanced-resources/rwx-workloads.md b/content/docs/1.4.1/advanced-resources/rwx-workloads.md
@@ -8,7 +8,7 @@ Longhorn supports ReadWriteMany (RWX) volumes by exposing regular Longhorn volum
 
 # Introduction
 
-For each actively in use RWX volume Longhorn will create a `share-manager-<volume-name>` Pod in the `longhorn-system` namespace. This Pod is responsible for exporting a Longhorn volume via a NFSv4 server that is running inside the Pod. There is also a service created for each RWX volume, and that is used as an endpoint for the actual NFSv4 client connection.
+Longhorn creates a dedicated `share-manager-<volume-name>` Pod within the `longhorn-system` namespace for each RWX volume that is currently in active use. The Pod facilitate the export of Longhorn volume via an internally hosted NFSv4 server. Additionally, a corresponding Service is created for each RWX volume, serving as the designated endpoint for actual NFSv4 client connections.
 
 {{< figure src="/img/diagrams/rwx/rwx-arch.png" >}}
 
@@ -31,6 +31,41 @@ It is necessary to meet the following requirements in order to use RWX volumes.
     
     > **Tip:** The [environment check script](https://raw.githubusercontent.com/longhorn/longhorn/v{{< current-version >}}/scripts/environment_check.sh) helps users to check all nodes have unique hostnames.
 
+# Notice
+
+In versions 1.4.0 to 1.4.3 and 1.5.0 to 1.5.1 of Longhorn, Longhorn CSI plugin `hard` mounts a Longhorn volume exported by a NFS server located within a share-manager Pod in the `NodeStageVolume`. The `hard` mount allows NFS requests to persistently retry without termination, ensuring that IOs do not fail. When the server is back online or a replacement server is recreated, the IOs resume seamlessly, thus guaranteeing data integrity. However, there is potential risk that to maintain file system stability, the Linux kernel will not permit unmounting a file system until all pending IOs are written back to storage, and the system cannot undergo shutdown until all file systems are unmounted. If the NFS server fails to recover successfully, the client nodes must undergo a forced reboot.s
+
+To mitigate the issue,
+
+- For existing volumes
+
+  Due to the immutability of `persistentvolume.spec.csi.volumeAttributes`, making adjustments to the `nfsOptions` within this field is prohibited. To address this issue, upgrading to `v1.4.4` or a newer version is required. After upgrading the Longhorn system, the `softerr` or `soft` will be automatically applied when reattaching RWX volumes.
+
+- For new volumes
+
+  Users can switch to use `softerr` or `soft` mount instead. Users can achieve this by creating a StorageClass with `nfsOptions` for newly created volumes. For instance:
+
+  ```
+  kind: StorageClass
+  apiVersion: storage.k8s.io/v1
+  metadata:
+    name: longhorn-test
+  provisioner: driver.longhorn.io
+  allowVolumeExpansion: true
+  reclaimPolicy: Delete
+  volumeBindingMode: Immediate
+  parameters:
+    numberOfReplicas: "3"
+    staleReplicaTimeout: "2880"
+    fromBackup: ""
+    fsType: "ext4"
+    nfsOptions: "vers=4.1,noresvport,softerr,timeo=600,retrans=5"
+  ``````
+
+  Then, create RWX volume PVCs using the above the StorageClass. Longhorn will then adopt the use of a `softerr` or `soft` mount with a `timeo` value of 600 and a `retrans` value of 5 as default options. When the NFS server becomes unreachable due to factors such as node power outages, network partitions and so on, NFS clients will fail an NFS request after the specified number of retransmissions, resulting in an NFS `ETIMEDOUT` error (or EIO error for `soft` mount) being returned to the calling application and potential data loss.
+
+  Please refer to [#6655](https://github.com/longhorn/longhorn/issues/6655) for more information.
+
 # Creation and Usage of a RWX Volume
 
 1. For dynamically provisioned Longhorn volumes, the access mode is based on the PVC's access mode.

diff --git a/content/docs/1.4.2/advanced-resources/rwx-workloads.md b/content/docs/1.4.2/advanced-resources/rwx-workloads.md
@@ -8,7 +8,7 @@ Longhorn supports ReadWriteMany (RWX) volumes by exposing regular Longhorn volum
 
 # Introduction
 
-For each actively in use RWX volume Longhorn will create a `share-manager-<volume-name>` Pod in the `longhorn-system` namespace. This Pod is responsible for exporting a Longhorn volume via a NFSv4 server that is running inside the Pod. There is also a service created for each RWX volume, and that is used as an endpoint for the actual NFSv4 client connection.
+Longhorn creates a dedicated `share-manager-<volume-name>` Pod within the `longhorn-system` namespace for each RWX volume that is currently in active use. The Pod facilitate the export of Longhorn volume via an internally hosted NFSv4 server. Additionally, a corresponding Service is created for each RWX volume, serving as the designated endpoint for actual NFSv4 client connections.
 
 {{< figure src="/img/diagrams/rwx/rwx-arch.png" >}}
 
@@ -31,6 +31,41 @@ It is necessary to meet the following requirements in order to use RWX volumes.
     
     > **Tip:** The [environment check script](https://raw.githubusercontent.com/longhorn/longhorn/v{{< current-version >}}/scripts/environment_check.sh) helps users to check all nodes have unique hostnames.
 
+# Notice
+
+In versions 1.4.0 to 1.4.3 and 1.5.0 to 1.5.1 of Longhorn, Longhorn CSI plugin `hard` mounts a Longhorn volume exported by a NFS server located within a share-manager Pod in the `NodeStageVolume`. The `hard` mount allows NFS requests to persistently retry without termination, ensuring that IOs do not fail. When the server is back online or a replacement server is recreated, the IOs resume seamlessly, thus guaranteeing data integrity. However, there is potential risk that to maintain file system stability, the Linux kernel will not permit unmounting a file system until all pending IOs are written back to storage, and the system cannot undergo shutdown until all file systems are unmounted. If the NFS server fails to recover successfully, the client nodes must undergo a forced reboot.
+
+To mitigate the issue,
+
+- For existing volumes
+
+  Due to the immutability of `persistentvolume.spec.csi.volumeAttributes`, making adjustments to the `nfsOptions` within this field is prohibited. To address this issue, upgrading to `v1.4.4` or a newer version is required. After upgrading the Longhorn system, the `softerr` or `soft` will be automatically applied when reattaching RWX volumes.
+
+- For new volumes
+
+  Users can switch to use `softerr` or `soft` mount instead. Users can achieve this by creating a StorageClass with `nfsOptions` for newly created volumes. For instance:
+
+  ```
+  kind: StorageClass
+  apiVersion: storage.k8s.io/v1
+  metadata:
+    name: longhorn-test
+  provisioner: driver.longhorn.io
+  allowVolumeExpansion: true
+  reclaimPolicy: Delete
+  volumeBindingMode: Immediate
+  parameters:
+    numberOfReplicas: "3"
+    staleReplicaTimeout: "2880"
+    fromBackup: ""
+    fsType: "ext4"
+    nfsOptions: "vers=4.1,noresvport,softerr,timeo=600,retrans=5"
+  ``````
+
+  Then, create RWX volume PVCs using the above the StorageClass. Longhorn will then adopt the use of a `softerr` or `soft` mount with a `timeo` value of 600 and a `retrans` value of 5 as default options. When the NFS server becomes unreachable due to factors such as node power outages, network partitions and so on, NFS clients will fail an NFS request after the specified number of retransmissions, resulting in an NFS `ETIMEDOUT` error (or EIO error for `soft` mount) being returned to the calling application and potential data loss.
+
+  Please refer to [#6655](https://github.com/longhorn/longhorn/issues/6655) for more information.
+
 # Creation and Usage of a RWX Volume
 
 1. For dynamically provisioned Longhorn volumes, the access mode is based on the PVC's access mode.

diff --git a/content/docs/1.4.3/advanced-resources/rwx-workloads.md b/content/docs/1.4.3/advanced-resources/rwx-workloads.md
@@ -8,7 +8,7 @@ Longhorn supports ReadWriteMany (RWX) volumes by exposing regular Longhorn volum
 
 # Introduction
 
-For each actively in use RWX volume Longhorn will create a `share-manager-<volume-name>` Pod in the `longhorn-system` namespace. This Pod is responsible for exporting a Longhorn volume via a NFSv4 server that is running inside the Pod. There is also a service created for each RWX volume, and that is used as an endpoint for the actual NFSv4 client connection.
+Longhorn creates a dedicated `share-manager-<volume-name>` Pod within the `longhorn-system` namespace for each RWX volume that is currently in active use. The Pod facilitate the export of Longhorn volume via an internally hosted NFSv4 server. Additionally, a corresponding Service is created for each RWX volume, serving as the designated endpoint for actual NFSv4 client connections.
 
 {{< figure src="/img/diagrams/rwx/rwx-arch.png" >}}
 
@@ -31,6 +31,41 @@ It is necessary to meet the following requirements in order to use RWX volumes.
     
     > **Tip:** The [environment check script](https://raw.githubusercontent.com/longhorn/longhorn/v{{< current-version >}}/scripts/environment_check.sh) helps users to check all nodes have unique hostnames.
 
+# Notice
+
+In versions 1.4.0 to 1.4.3 and 1.5.0 to 1.5.1 of Longhorn, Longhorn CSI plugin `hard` mounts a Longhorn volume exported by a NFS server located within a share-manager Pod in the `NodeStageVolume`. The `hard` mount allows NFS requests to persistently retry without termination, ensuring that IOs do not fail. When the server is back online or a replacement server is recreated, the IOs resume seamlessly, thus guaranteeing data integrity. However, there is potential risk that to maintain file system stability, the Linux kernel will not permit unmounting a file system until all pending IOs are written back to storage, and the system cannot undergo shutdown until all file systems are unmounted. If the NFS server fails to recover successfully, the client nodes must undergo a forced reboot.
+
+To mitigate the issue,
+
+- For existing volumes
+
+  Due to the immutability of `persistentvolume.spec.csi.volumeAttributes`, making adjustments to the `nfsOptions` within this field is prohibited. To address this issue, upgrading to `v1.4.4` or a newer version is required. After upgrading the Longhorn system, the `softerr` or `soft` will be automatically applied when reattaching RWX volumes.
+
+- For new volumes
+
+  Users can switch to use `softerr` or `soft` mount instead. Users can achieve this by creating a StorageClass with `nfsOptions` for newly created volumes. For instance:
+
+  ```
+  kind: StorageClass
+  apiVersion: storage.k8s.io/v1
+  metadata:
+    name: longhorn-test
+  provisioner: driver.longhorn.io
+  allowVolumeExpansion: true
+  reclaimPolicy: Delete
+  volumeBindingMode: Immediate
+  parameters:
+    numberOfReplicas: "3"
+    staleReplicaTimeout: "2880"
+    fromBackup: ""
+    fsType: "ext4"
+    nfsOptions: "vers=4.1,noresvport,softerr,timeo=600,retrans=5"
+  ``````
+
+  Then, create RWX volume PVCs using the above the StorageClass. Longhorn will then adopt the use of a `softerr` or `soft` mount with a `timeo` value of 600 and a `retrans` value of 5 as default options. When the NFS server becomes unreachable due to factors such as node power outages, network partitions and so on, NFS clients will fail an NFS request after the specified number of retransmissions, resulting in an NFS `ETIMEDOUT` error (or EIO error for `soft` mount) being returned to the calling application and potential data loss.
+
+  Please refer to [#6655](https://github.com/longhorn/longhorn/issues/6655) for more information.
+
 # Creation and Usage of a RWX Volume
 
 1. For dynamically provisioned Longhorn volumes, the access mode is based on the PVC's access mode.

diff --git a/content/docs/1.4.4/advanced-resources/rwx-workloads.md b/content/docs/1.4.4/advanced-resources/rwx-workloads.md
@@ -8,7 +8,7 @@ Longhorn supports ReadWriteMany (RWX) volumes by exposing regular Longhorn volum
 
 # Introduction
 
-For each actively in use RWX volume Longhorn will create a `share-manager-<volume-name>` Pod in the `longhorn-system` namespace. This Pod is responsible for exporting a Longhorn volume via a NFSv4 server that is running inside the Pod. There is also a service created for each RWX volume, and that is used as an endpoint for the actual NFSv4 client connection.
+Longhorn creates a dedicated `share-manager-<volume-name>` Pod within the `longhorn-system` namespace for each RWX volume that is currently in active use. The Pod facilitate the export of Longhorn volume via an internally hosted NFSv4 server. Additionally, a corresponding Service is created for each RWX volume, serving as the designated endpoint for actual NFSv4 client connections.
 
 {{< figure src="/img/diagrams/rwx/rwx-arch.png" >}}
 
@@ -31,6 +31,14 @@ It is necessary to meet the following requirements in order to use RWX volumes.
     
     > **Tip:** The [environment check script](https://raw.githubusercontent.com/longhorn/longhorn/v{{< current-version >}}/scripts/environment_check.sh) helps users to check all nodes have unique hostnames.
 
+# Notice
+
+In versions 1.4.0 to 1.4.3 and 1.5.0 to 1.5.1 of Longhorn, Longhorn CSI plugin `hard` mounts a Longhorn volume exported by a NFS server located within a share-manager Pod in the `NodeStageVolume`. The `hard` mount allows NFS requests to persistently retry without termination, ensuring that IOs do not fail. When the server is back online or a replacement server is recreated, the IOs resume seamlessly, thus guaranteeing data integrity. However, there is potential risk that to maintain file system stability, the Linux kernel will not permit unmounting a file system until all pending IOs are written back to storage, and the system cannot undergo shutdown until all file systems are unmounted. If the NFS server fails to recover successfully, the client nodes must undergo a forced reboot.
+
+To address this stability problem, Longhorn switches to adopt the use of a `softerr` mount with a `timeo` value of 600 and a `retrans` value of 5 as default options since v1.4.4, 1.5.2 and 1.6.0. When the NFS server becomes unreachable due to factors such as node power outages, network partitions and so on, NFS clients will fail an NFS request after the specified number of retransmissions, resulting in an NFS `ETIMEDOUT` error being returned to the calling application and potential data loss. If `softerr` is not supported, Longhorn will automatically revert to using the `soft` option instead.
+
+Please refer to [#6655](https://github.com/longhorn/longhorn/issues/6655) for more information.
+
 # Creation and Usage of a RWX Volume
 
 1. For dynamically provisioned Longhorn volumes, the access mode is based on the PVC's access mode.