Update rwx volume doc
Longhorn 6655

Signed-off-by: Derek Su <[email protected]>
derekbit committed Sep 21, 2023
1 parent cc09cfd commit cd1892e
Showing 9 changed files with 243 additions and 9 deletions.
37 changes: 36 additions & 1 deletion content/docs/1.4.0/advanced-resources/rwx-workloads.md
@@ -8,7 +8,7 @@ Longhorn supports ReadWriteMany (RWX) volumes by exposing regular Longhorn volum

# Introduction

For each actively in use RWX volume Longhorn will create a `share-manager-<volume-name>` Pod in the `longhorn-system` namespace. This Pod is responsible for exporting a Longhorn volume via a NFSv4 server that is running inside the Pod. There is also a service created for each RWX volume, and that is used as an endpoint for the actual NFSv4 client connection.
Longhorn creates a dedicated `share-manager-<volume-name>` Pod within the `longhorn-system` namespace for each RWX volume that is currently in active use. The Pod facilitates the export of the Longhorn volume via an internally hosted NFSv4 server. Additionally, a corresponding Service is created for each RWX volume, serving as the designated endpoint for actual NFSv4 client connections.

{{< figure src="/img/diagrams/rwx/rwx-arch.png" >}}

@@ -31,6 +31,41 @@ It is necessary to meet the following requirements in order to use RWX volumes.
> **Tip:** The [environment check script](https://raw.githubusercontent.com/longhorn/longhorn/v{{< current-version >}}/scripts/environment_check.sh) helps users to check all nodes have unique hostnames.
# Notice
In Longhorn versions 1.4.0 to 1.4.3 and 1.5.0 to 1.5.1, the Longhorn CSI plugin `hard` mounts a Longhorn volume exported by an NFS server located within a share-manager Pod during `NodeStageVolume`. A `hard` mount lets NFS requests retry indefinitely, so IOs do not fail. When the server comes back online or a replacement server is created, the IOs resume seamlessly, thus preserving data integrity. However, there is a potential risk: to maintain file system stability, the Linux kernel does not permit unmounting a file system until all pending IOs are written back to storage, and the system cannot shut down until all file systems are unmounted. If the NFS server fails to recover, the client nodes must be forcibly rebooted.
To mitigate the issue:
- For existing volumes
Because `persistentvolume.spec.csi.volumeAttributes` is immutable, the `nfsOptions` within this field cannot be changed. To address this issue, upgrade to `v1.4.4` or a newer version. After the Longhorn system is upgraded, the `softerr` or `soft` option is applied automatically when RWX volumes are reattached.
- For new volumes
Use a `softerr` or `soft` mount instead by creating a StorageClass with `nfsOptions` for newly created volumes. For instance:
```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-test
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880"
  fromBackup: ""
  fsType: "ext4"
  nfsOptions: "vers=4.1,noresvport,softerr,timeo=600,retrans=5"
```
Then, create RWX volume PVCs using the above StorageClass. Longhorn then uses a `softerr` or `soft` mount with a `timeo` value of 600 and a `retrans` value of 5 as the default options. When the NFS server becomes unreachable, for example because of a node power outage or a network partition, NFS clients fail an NFS request after the specified number of retransmissions. The calling application then receives an NFS `ETIMEDOUT` error (or an `EIO` error for a `soft` mount), and data loss is possible.
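As a minimal sketch of such a PVC, assuming the `longhorn-test` StorageClass from the example above: the PVC name `rwx-test-pvc` and the `1Gi` size are hypothetical placeholders chosen for illustration, not values from the Longhorn documentation.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rwx-test-pvc            # hypothetical name, for illustration only
spec:
  accessModes:
    - ReadWriteMany             # requests an RWX volume
  storageClassName: longhorn-test   # the StorageClass that carries nfsOptions
  resources:
    requests:
      storage: 1Gi              # arbitrary example size
```

Because the access mode of a dynamically provisioned Longhorn volume follows the PVC's access mode, `ReadWriteMany` is what makes this an RWX volume; the mount options themselves come from the StorageClass.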
Please refer to [#6655](https://github.com/longhorn/longhorn/issues/6655) for more information.
# Creation and Usage of a RWX Volume
1. For dynamically provisioned Longhorn volumes, the access mode is based on the PVC's access mode.
37 changes: 36 additions & 1 deletion content/docs/1.4.1/advanced-resources/rwx-workloads.md
@@ -8,7 +8,7 @@ Longhorn supports ReadWriteMany (RWX) volumes by exposing regular Longhorn volum

# Introduction

For each actively in use RWX volume Longhorn will create a `share-manager-<volume-name>` Pod in the `longhorn-system` namespace. This Pod is responsible for exporting a Longhorn volume via a NFSv4 server that is running inside the Pod. There is also a service created for each RWX volume, and that is used as an endpoint for the actual NFSv4 client connection.
Longhorn creates a dedicated `share-manager-<volume-name>` Pod within the `longhorn-system` namespace for each RWX volume that is currently in active use. The Pod facilitates the export of the Longhorn volume via an internally hosted NFSv4 server. Additionally, a corresponding Service is created for each RWX volume, serving as the designated endpoint for actual NFSv4 client connections.

{{< figure src="/img/diagrams/rwx/rwx-arch.png" >}}

@@ -31,6 +31,41 @@ It is necessary to meet the following requirements in order to use RWX volumes.
> **Tip:** The [environment check script](https://raw.githubusercontent.com/longhorn/longhorn/v{{< current-version >}}/scripts/environment_check.sh) helps users to check all nodes have unique hostnames.
# Notice
In Longhorn versions 1.4.0 to 1.4.3 and 1.5.0 to 1.5.1, the Longhorn CSI plugin `hard` mounts a Longhorn volume exported by an NFS server located within a share-manager Pod during `NodeStageVolume`. A `hard` mount lets NFS requests retry indefinitely, so IOs do not fail. When the server comes back online or a replacement server is created, the IOs resume seamlessly, thus preserving data integrity. However, there is a potential risk: to maintain file system stability, the Linux kernel does not permit unmounting a file system until all pending IOs are written back to storage, and the system cannot shut down until all file systems are unmounted. If the NFS server fails to recover, the client nodes must be forcibly rebooted.
To mitigate the issue:
- For existing volumes
Because `persistentvolume.spec.csi.volumeAttributes` is immutable, the `nfsOptions` within this field cannot be changed. To address this issue, upgrade to `v1.4.4` or a newer version. After the Longhorn system is upgraded, the `softerr` or `soft` option is applied automatically when RWX volumes are reattached.
- For new volumes
Use a `softerr` or `soft` mount instead by creating a StorageClass with `nfsOptions` for newly created volumes. For instance:
```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-test
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880"
  fromBackup: ""
  fsType: "ext4"
  nfsOptions: "vers=4.1,noresvport,softerr,timeo=600,retrans=5"
```
Then, create RWX volume PVCs using the above StorageClass. Longhorn then uses a `softerr` or `soft` mount with a `timeo` value of 600 and a `retrans` value of 5 as the default options. When the NFS server becomes unreachable, for example because of a node power outage or a network partition, NFS clients fail an NFS request after the specified number of retransmissions. The calling application then receives an NFS `ETIMEDOUT` error (or an `EIO` error for a `soft` mount), and data loss is possible.
Please refer to [#6655](https://github.com/longhorn/longhorn/issues/6655) for more information.
# Creation and Usage of a RWX Volume
1. For dynamically provisioned Longhorn volumes, the access mode is based on the PVC's access mode.
37 changes: 36 additions & 1 deletion content/docs/1.4.2/advanced-resources/rwx-workloads.md
@@ -8,7 +8,7 @@ Longhorn supports ReadWriteMany (RWX) volumes by exposing regular Longhorn volum

# Introduction

For each actively in use RWX volume Longhorn will create a `share-manager-<volume-name>` Pod in the `longhorn-system` namespace. This Pod is responsible for exporting a Longhorn volume via a NFSv4 server that is running inside the Pod. There is also a service created for each RWX volume, and that is used as an endpoint for the actual NFSv4 client connection.
Longhorn creates a dedicated `share-manager-<volume-name>` Pod within the `longhorn-system` namespace for each RWX volume that is currently in active use. The Pod facilitates the export of the Longhorn volume via an internally hosted NFSv4 server. Additionally, a corresponding Service is created for each RWX volume, serving as the designated endpoint for actual NFSv4 client connections.

{{< figure src="/img/diagrams/rwx/rwx-arch.png" >}}

@@ -31,6 +31,41 @@ It is necessary to meet the following requirements in order to use RWX volumes.
> **Tip:** The [environment check script](https://raw.githubusercontent.com/longhorn/longhorn/v{{< current-version >}}/scripts/environment_check.sh) helps users to check all nodes have unique hostnames.
# Notice
In Longhorn versions 1.4.0 to 1.4.3 and 1.5.0 to 1.5.1, the Longhorn CSI plugin `hard` mounts a Longhorn volume exported by an NFS server located within a share-manager Pod during `NodeStageVolume`. A `hard` mount lets NFS requests retry indefinitely, so IOs do not fail. When the server comes back online or a replacement server is created, the IOs resume seamlessly, thus preserving data integrity. However, there is a potential risk: to maintain file system stability, the Linux kernel does not permit unmounting a file system until all pending IOs are written back to storage, and the system cannot shut down until all file systems are unmounted. If the NFS server fails to recover, the client nodes must be forcibly rebooted.
To mitigate the issue:
- For existing volumes
Because `persistentvolume.spec.csi.volumeAttributes` is immutable, the `nfsOptions` within this field cannot be changed. To address this issue, upgrade to `v1.4.4` or a newer version. After the Longhorn system is upgraded, the `softerr` or `soft` option is applied automatically when RWX volumes are reattached.
- For new volumes
Use a `softerr` or `soft` mount instead by creating a StorageClass with `nfsOptions` for newly created volumes. For instance:
```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-test
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880"
  fromBackup: ""
  fsType: "ext4"
  nfsOptions: "vers=4.1,noresvport,softerr,timeo=600,retrans=5"
```
Then, create RWX volume PVCs using the above StorageClass. Longhorn then uses a `softerr` or `soft` mount with a `timeo` value of 600 and a `retrans` value of 5 as the default options. When the NFS server becomes unreachable, for example because of a node power outage or a network partition, NFS clients fail an NFS request after the specified number of retransmissions. The calling application then receives an NFS `ETIMEDOUT` error (or an `EIO` error for a `soft` mount), and data loss is possible.
Please refer to [#6655](https://github.com/longhorn/longhorn/issues/6655) for more information.
# Creation and Usage of a RWX Volume
1. For dynamically provisioned Longhorn volumes, the access mode is based on the PVC's access mode.
37 changes: 36 additions & 1 deletion content/docs/1.4.3/advanced-resources/rwx-workloads.md
@@ -8,7 +8,7 @@ Longhorn supports ReadWriteMany (RWX) volumes by exposing regular Longhorn volum

# Introduction

For each actively in use RWX volume Longhorn will create a `share-manager-<volume-name>` Pod in the `longhorn-system` namespace. This Pod is responsible for exporting a Longhorn volume via a NFSv4 server that is running inside the Pod. There is also a service created for each RWX volume, and that is used as an endpoint for the actual NFSv4 client connection.
Longhorn creates a dedicated `share-manager-<volume-name>` Pod within the `longhorn-system` namespace for each RWX volume that is currently in active use. The Pod facilitates the export of the Longhorn volume via an internally hosted NFSv4 server. Additionally, a corresponding Service is created for each RWX volume, serving as the designated endpoint for actual NFSv4 client connections.

{{< figure src="/img/diagrams/rwx/rwx-arch.png" >}}

@@ -31,6 +31,41 @@ It is necessary to meet the following requirements in order to use RWX volumes.
> **Tip:** The [environment check script](https://raw.githubusercontent.com/longhorn/longhorn/v{{< current-version >}}/scripts/environment_check.sh) helps users to check all nodes have unique hostnames.
# Notice
In Longhorn versions 1.4.0 to 1.4.3 and 1.5.0 to 1.5.1, the Longhorn CSI plugin `hard` mounts a Longhorn volume exported by an NFS server located within a share-manager Pod during `NodeStageVolume`. A `hard` mount lets NFS requests retry indefinitely, so IOs do not fail. When the server comes back online or a replacement server is created, the IOs resume seamlessly, thus preserving data integrity. However, there is a potential risk: to maintain file system stability, the Linux kernel does not permit unmounting a file system until all pending IOs are written back to storage, and the system cannot shut down until all file systems are unmounted. If the NFS server fails to recover, the client nodes must be forcibly rebooted.
To mitigate the issue:
- For existing volumes
Because `persistentvolume.spec.csi.volumeAttributes` is immutable, the `nfsOptions` within this field cannot be changed. To address this issue, upgrade to `v1.4.4` or a newer version. After the Longhorn system is upgraded, the `softerr` or `soft` option is applied automatically when RWX volumes are reattached.
- For new volumes
Use a `softerr` or `soft` mount instead by creating a StorageClass with `nfsOptions` for newly created volumes. For instance:
```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-test
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880"
  fromBackup: ""
  fsType: "ext4"
  nfsOptions: "vers=4.1,noresvport,softerr,timeo=600,retrans=5"
```
Then, create RWX volume PVCs using the above StorageClass. Longhorn then uses a `softerr` or `soft` mount with a `timeo` value of 600 and a `retrans` value of 5 as the default options. When the NFS server becomes unreachable, for example because of a node power outage or a network partition, NFS clients fail an NFS request after the specified number of retransmissions. The calling application then receives an NFS `ETIMEDOUT` error (or an `EIO` error for a `soft` mount), and data loss is possible.
Please refer to [#6655](https://github.com/longhorn/longhorn/issues/6655) for more information.
# Creation and Usage of a RWX Volume
1. For dynamically provisioned Longhorn volumes, the access mode is based on the PVC's access mode.
10 changes: 9 additions & 1 deletion content/docs/1.4.4/advanced-resources/rwx-workloads.md
@@ -8,7 +8,7 @@ Longhorn supports ReadWriteMany (RWX) volumes by exposing regular Longhorn volum

# Introduction

For each actively in use RWX volume Longhorn will create a `share-manager-<volume-name>` Pod in the `longhorn-system` namespace. This Pod is responsible for exporting a Longhorn volume via a NFSv4 server that is running inside the Pod. There is also a service created for each RWX volume, and that is used as an endpoint for the actual NFSv4 client connection.
Longhorn creates a dedicated `share-manager-<volume-name>` Pod within the `longhorn-system` namespace for each RWX volume that is currently in active use. The Pod facilitates the export of the Longhorn volume via an internally hosted NFSv4 server. Additionally, a corresponding Service is created for each RWX volume, serving as the designated endpoint for actual NFSv4 client connections.

{{< figure src="/img/diagrams/rwx/rwx-arch.png" >}}

@@ -31,6 +31,14 @@ It is necessary to meet the following requirements in order to use RWX volumes.
> **Tip:** The [environment check script](https://raw.githubusercontent.com/longhorn/longhorn/v{{< current-version >}}/scripts/environment_check.sh) helps users to check all nodes have unique hostnames.
# Notice
In Longhorn versions 1.4.0 to 1.4.3 and 1.5.0 to 1.5.1, the Longhorn CSI plugin `hard` mounts a Longhorn volume exported by an NFS server located within a share-manager Pod during `NodeStageVolume`. A `hard` mount lets NFS requests retry indefinitely, so IOs do not fail. When the server comes back online or a replacement server is created, the IOs resume seamlessly, thus preserving data integrity. However, there is a potential risk: to maintain file system stability, the Linux kernel does not permit unmounting a file system until all pending IOs are written back to storage, and the system cannot shut down until all file systems are unmounted. If the NFS server fails to recover, the client nodes must be forcibly rebooted.
To address this stability problem, Longhorn uses a `softerr` mount with a `timeo` value of 600 and a `retrans` value of 5 as the default options since v1.4.4, v1.5.2, and v1.6.0. When the NFS server becomes unreachable, for example because of a node power outage or a network partition, NFS clients fail an NFS request after the specified number of retransmissions. The calling application then receives an NFS `ETIMEDOUT` error, and data loss is possible. If `softerr` is not supported, Longhorn automatically falls back to the `soft` option.
Please refer to [#6655](https://github.com/longhorn/longhorn/issues/6655) for more information.
# Creation and Usage of a RWX Volume
1. For dynamically provisioned Longhorn volumes, the access mode is based on the PVC's access mode.