docs: add last replica timeout behavior to important notes
Longhorn 8711

Signed-off-by: Eric Weber <[email protected]>
ejweber authored and derekbit committed Sep 3, 2024
1 parent 251f47d commit 15f8f3d
Showing 4 changed files with 20 additions and 6 deletions.
7 changes: 7 additions & 0 deletions content/docs/1.7.1/important-notes/_index.md
@@ -18,6 +18,7 @@ Please see [here](https://github.com/longhorn/longhorn/releases/tag/v{{< current
- [Resilience](#resilience)
- [RWX Volumes Fast Failover](#rwx-volumes-fast-failover)
- [Timeout Configuration for Replica Rebuilding and Snapshot Cloning](#timeout-configuration-for-replica-rebuilding-and-snapshot-cloning)
+- [Change in Engine Replica Timeout Behavior](#change-in-engine-replica-timeout-behavior)
- [Data Integrity and Reliability](#data-integrity-and-reliability)
- [Support Periodic and On-Demand Full Backups to Enhance Backup Reliability](#support-periodic-and-on-demand-full-backups-to-enhance-backup-reliability)
- [High Availability of Backing Images](#high-availability-of-backing-images)
@@ -159,6 +160,12 @@ RWX Volumes fast failover is introduced in Longhorn v1.7.0 to improve resilience
Starting with v1.7.0, Longhorn supports configuration of timeouts for replica rebuilding and snapshot cloning. Before v1.7.0, the replica rebuilding timeout was capped at 24 hours, which could cause failures for large volumes in slow bandwidth environments. The default timeout is still 24 hours but you can adjust it to accommodate different environments. For more information, see [Long gRPC Timeout](../references/settings/#long-grpc-timeout).
+
+### Change in Engine Replica Timeout Behavior
+
+In versions earlier than v1.7.1, the [Engine Replica Timeout](../references/settings#engine-replica-timeout) setting
+applied equally to all replicas of a V1 volume. In v1.7.1, a V1 engine marks the last active replica as failed only
+after twice the configured timeout has elapsed.
## Data Integrity and Reliability
### Support Periodic and On-Demand Full Backups to Enhance Backup Reliability
6 changes: 3 additions & 3 deletions content/docs/1.7.1/references/settings.md
@@ -424,9 +424,9 @@ The default minimum number of backing image copies Longhorn maintains.
The time in seconds a v1 engine will wait for a response from a replica before marking it as failed. Values between 8
and 30 are allowed. The engine replica timeout is only in effect while there are I/O requests outstanding.

-This timeout only applies as-configured to additional replicas. A v1 engine will not mark the final replica for a
-running volume as failed until twice the configured timeout. This behavior is intended to balance volume responsiveness
-with volume availability:
+This setting applies as-configured only to additional replicas. A v1 engine marks the last active replica as failed
+only after twice the configured timeout has elapsed. This behavior is intended to balance volume responsiveness with
+volume availability:

- The engine can quickly (after the configured timeout) ignore individual replicas that become unresponsive in favor of
other available ones. This ensures future I/O will not be held up.
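The timeout behavior the changed paragraph describes can be sketched as a small model. This is an illustrative sketch, not Longhorn's actual code: the function name and the clamping of out-of-range values to the documented 8-30 second range are assumptions.

```go
package main

import "fmt"

// effectiveTimeoutSeconds models the documented Engine Replica Timeout
// behavior: the configured value applies as-is to additional replicas,
// while the last active replica is only marked failed after twice the
// configured timeout. Hypothetical helper, not Longhorn's implementation.
func effectiveTimeoutSeconds(configured int, isLastActiveReplica bool) int {
	// The setting only allows values between 8 and 30 seconds.
	if configured < 8 {
		configured = 8
	} else if configured > 30 {
		configured = 30
	}
	// Doubling gives the engine extra tolerance before failing the
	// last replica, since doing so makes the volume faulted.
	if isLastActiveReplica {
		return configured * 2
	}
	return configured
}

func main() {
	fmt.Println(effectiveTimeoutSeconds(8, false)) // additional replica: 8
	fmt.Println(effectiveTimeoutSeconds(8, true))  // last active replica: 16
}
```

With the default of 8 seconds, an unresponsive additional replica is dropped after 8 seconds, while the final remaining replica is given 16 seconds before the engine gives up on it.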
7 changes: 7 additions & 0 deletions content/docs/1.8.0/important-notes/_index.md
@@ -18,6 +18,7 @@ Please see [here](https://github.com/longhorn/longhorn/releases/tag/v{{< current
- [Resilience](#resilience)
- [RWX Volumes Fast Failover](#rwx-volumes-fast-failover)
- [Timeout Configuration for Replica Rebuilding and Snapshot Cloning](#timeout-configuration-for-replica-rebuilding-and-snapshot-cloning)
+- [Change in Engine Replica Timeout Behavior](#change-in-engine-replica-timeout-behavior)
- [Data Integrity and Reliability](#data-integrity-and-reliability)
- [Support Periodic and On-Demand Full Backups to Enhance Backup Reliability](#support-periodic-and-on-demand-full-backups-to-enhance-backup-reliability)
- [High Availability of Backing Images](#high-availability-of-backing-images)
@@ -161,6 +162,12 @@ RWX Volumes fast failover is introduced in Longhorn v1.7.0 to improve resilience
Starting with v1.7.0, Longhorn supports configuration of timeouts for replica rebuilding and snapshot cloning. Before v1.7.0, the replica rebuilding timeout was capped at 24 hours, which could cause failures for large volumes in slow bandwidth environments. The default timeout is still 24 hours but you can adjust it to accommodate different environments. For more information, see [Long gRPC Timeout](../references/settings/#long-grpc-timeout).
+
+### Change in Engine Replica Timeout Behavior
+
+In versions earlier than v1.8.0, the [Engine Replica Timeout](../references/settings#engine-replica-timeout) setting
+applied equally to all replicas of a V1 volume. In v1.8.0, a V1 engine marks the last active replica as failed only
+after twice the configured timeout has elapsed.
## Data Integrity and Reliability
### Support Periodic and On-Demand Full Backups to Enhance Backup Reliability
6 changes: 3 additions & 3 deletions content/docs/1.8.0/references/settings.md
@@ -424,9 +424,9 @@ The default minimum number of backing image copies Longhorn maintains.
The time in seconds a v1 engine will wait for a response from a replica before marking it as failed. Values between 8
and 30 are allowed. The engine replica timeout is only in effect while there are I/O requests outstanding.

-This timeout only applies as-configured to additional replicas. A v1 engine will not mark the final replica for a
-running volume as failed until twice the configured timeout. This behavior is intended to balance volume responsiveness
-with volume availability:
+This setting applies as-configured only to additional replicas. A v1 engine marks the last active replica as failed
+only after twice the configured timeout has elapsed. This behavior is intended to balance volume responsiveness with
+volume availability:

- The engine can quickly (after the configured timeout) ignore individual replicas that become unresponsive in favor of
other available ones. This ensures future I/O will not be held up.