This repository has been archived by the owner on Jan 19, 2024. It is now read-only.
Describe the bug
The readiness probe prevents recovery of large databases. If I restart a node from a clean data directory (e.g. after a hard drive failure), the server takes a long time to come back up. In particular, the step "Started downloading snapshot for database XXX" can take several minutes or longer. Unfortunately, this step never finishes, because the readiness probe always kills the container before the download is done.
For now I have set the timeout to one hour via the Helm chart values YAML, but it would be great if there were a more intelligent way to handle this. A shorter timeout is definitely useful under normal circumstances, and I don't want to have to reinstall the Helm chart and restart the whole deployment before and after every single-node recovery.
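To make the workaround concrete, here is a rough sketch of how a probe's time budget can be estimated. It assumes the probe gives up after `failureThreshold` consecutive failures spaced `periodSeconds` apart, following an `initialDelaySeconds` wait (the standard Kubernetes probe fields). The function names and the numbers are hypothetical, and which values key the chart uses to set these fields is not shown here.

```python
import math


def probe_budget_seconds(initial_delay_seconds, period_seconds, failure_threshold):
    """Approximate time a container gets before the probe gives up on it.

    Ignores timeoutSeconds, so the real budget can be somewhat longer.
    """
    return initial_delay_seconds + period_seconds * failure_threshold


def failure_threshold_for(recovery_seconds, initial_delay_seconds, period_seconds):
    """Smallest failureThreshold whose budget covers the expected recovery time."""
    remaining = max(0, recovery_seconds - initial_delay_seconds)
    return max(1, math.ceil(remaining / period_seconds))


# Hypothetical numbers: a one-hour recovery, 30 s initial delay, 10 s period.
threshold = failure_threshold_for(3600, 30, 10)
print(threshold)                                # 357
print(probe_budget_seconds(30, 10, threshold))  # 3600
```

With these assumed values, covering a one-hour snapshot download needs a threshold of 357, which shows why a single static setting has to trade off normal-case responsiveness against recovery time.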
To Reproduce
Steps to reproduce the behavior:
1. Create a large database
2. Kill off one of the nodes and delete its hard drive
3. Try to recover the deleted node
Expected behavior
Recovery should succeed.