Retry wait for stable service in deploy release #761

Merged (3 commits, Oct 1, 2024)

KevinJBoyer

Ticket

n/a

Changes

  • If waiting for a stable ECS service fails during deploy, try it exactly one more time
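A minimal sketch of this retry-once behavior, written as a standalone shell function. The function name wait_for_stable_service and its two arguments are hypothetical; in bin/deploy-release the check is written inline, and only its first lines appear in the diff excerpt on this page.

```shell
# Hypothetical helper (not part of template-infra): wait for an ECS service
# to become stable, retrying the AWS CLI waiter exactly once on failure.
wait_for_stable_service() {
  local cluster_name="$1"
  local service_name="$2"

  echo "Wait for service ${service_name} to become stable"
  if ! aws ecs wait services-stable --cluster "${cluster_name}" --services "${service_name}"; then
    echo "First attempt to wait for service stability failed, retrying..."
    # Second and final attempt; its exit status becomes the function's status.
    aws ecs wait services-stable --cluster "${cluster_name}" --services "${service_name}"
  fi
}
```

If the first wait succeeds, the if condition is false and the function returns 0. Otherwise the caller sees the exit status of the second wait, so a second timeout still fails the deploy.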

Context for reviewers

  • For two applications built on template-infra (the Nava Labs Decision Support Tool project and an internal Nava tool), the ECS service takes slightly more than 10 minutes to become stable, typically about 11 or 13 minutes.
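  • The AWS CLI command aws ecs wait services-stable polls every 15 seconds and gives up (exit code 255) after 40 failed checks, so a single wait is capped at 10 minutes, and this limit can't be configured.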
  • Other approaches considered:
    • Sleeping. This is probably the simplest solution but doesn't seem as robust as simply trying the command twice.
    • Retrying a configurable number of times in a loop. This seems like premature complexity.
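For comparison, the rejected loop-based alternative might look roughly like the sketch below. MAX_WAIT_ATTEMPTS and wait_with_retries are hypothetical names, not part of template-infra; the extra setting and the loop bookkeeping are the added complexity this PR chose to avoid.

```shell
# Hypothetical sketch of the rejected approach: run the services-stable
# waiter up to MAX_WAIT_ATTEMPTS times (default 2) before giving up.
wait_with_retries() {
  local cluster_name="$1"
  local service_name="$2"
  local max_attempts="${MAX_WAIT_ATTEMPTS:-2}"
  local attempt=1

  while [ "$attempt" -le "$max_attempts" ]; do
    echo "Wait for service ${service_name} to become stable (attempt ${attempt}/${max_attempts})"
    if aws ecs wait services-stable --cluster "${cluster_name}" --services "${service_name}"; then
      return 0
    fi
    attempt=$((attempt + 1))
  done

  echo "Service ${service_name} did not become stable after ${max_attempts} attempts" >&2
  return 1
}
```

With the default of two attempts this behaves like the merged change: given the documented 15-second poll interval and 40-check limit, the total wait is capped at roughly 2 × 40 × 15 s = 1200 s, or 20 minutes, which covers the 11 to 13 minutes observed.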

Testing

Tested on the internal Nava tool (results posted in Slack).


lorenyu left a comment

LGTM thanks for the improvement!

bin/deploy-release
@@ -25,6 +25,9 @@ echo "::endgroup::"
 cluster_name=$(terraform -chdir="infra/${app_name}/service" output -raw service_cluster_name)
 service_name=$(terraform -chdir="infra/${app_name}/service" output -raw service_name)
 echo "Wait for service ${service_name} to become stable"
-aws ecs wait services-stable --cluster "${cluster_name}" --services "${service_name}"
+if ! aws ecs wait services-stable --cluster "${cluster_name}" --services "${service_name}"; then
+echo "First attempt to wait for service stability failed, retrying..."

Nit: the grammar is a bit awkward. A couple of suggestions to keep it simple:
Option A: Repeat "Wait for service app-dev to become stable" (aws ecs wait services-stable already prints an error message)
Option B: Just say "Retrying"

KevinJBoyer merged commit b7a4677 into main on Oct 1, 2024
9 checks passed
KevinJBoyer deleted the kb/retry branch on October 1, 2024 at 18:28