This repository has been archived by the owner on Apr 22, 2020. It is now read-only.

Corner case with initialization in an ASG without ELB health check #441

Open
a1exsh opened this issue Aug 25, 2017 · 0 comments

Contributor

a1exsh commented Aug 25, 2017

In certain scenarios, when an Auto Scaling group (ASG) tries to balance instances across Availability Zones, it launches a new instance in an AZ where another instance is already running, before terminating the old instance.

This becomes problematic if the application is stateful (https://github.com/zalando/spilo) and relies on the Taupage feature of attaching persistent EBS volumes, identified by the Name tag given in the user data: the newly launched EC2 instance cannot find an available volume, because the volume is still attached to the old, still-running instance.
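To make the failure mode concrete, here is a minimal sketch of a lookup by Name tag that only accepts unattached volumes. It is not Taupage's actual code: the function name `find_available_volume` is hypothetical, and the `ec2_client` argument is assumed to behave like a boto3 EC2 client (only its `describe_volumes` call is used).

```python
def find_available_volume(ec2_client, name):
    """Return the ID of an unattached EBS volume tagged Name=<name>, or None.

    Hypothetical helper for illustration; ec2_client is assumed to expose
    describe_volumes() the way a boto3 EC2 client does.
    """
    response = ec2_client.describe_volumes(
        Filters=[
            {"Name": "tag:Name", "Values": [name]},
            # "available" means the volume is not attached to any instance;
            # a volume still attached to the old instance is "in-use".
            {"Name": "status", "Values": ["available"]},
        ]
    )
    volumes = response.get("Volumes", [])
    return volumes[0]["VolumeId"] if volumes else None
```

With a real client this would be called as `find_available_volume(boto3.client("ec2"), "my_ebs_volume_name")`. While the old instance still holds the volume, the status filter excludes it and the lookup returns `None`, which corresponds to the "No matching EBS volume" situation described in this report.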

What happens in practice is that Taupage startup fails with a message like the following:

Aug 25 09:56:08 ip-172-31-7-104 taupage-init: ERROR: No matching EBS volume with name my_ebs_volume_name found.
Aug 25 09:56:08 ip-172-31-7-104 taupage-init: ERROR: Failed to start ./init.d/10-prepare-disks.py

After hitting this problem, Taupage just sits there. Normally the ASG would terminate the instance because it is unhealthy, so startup would eventually succeed on a replacement instance. Unfortunately, in our case there is no load balancer to tell the ASG that the instance is unhealthy (the ASG's health check type is "EC2"), so the instance stays in this state forever and needs manual intervention to continue normally.

It would make sense for Taupage to reboot the machine if initialization fails. That would prevent the described problem and potentially other similar issues.
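One possible shape for the proposed behaviour is sketched below. It is an illustration only, not Taupage's init code: `run_init_scripts` and its `run` and `reboot` parameters are hypothetical names, and the default reboot action assumes a Linux host where `shutdown -r +1` is available and the process runs as root.

```python
import subprocess
import sys


def _default_reboot():
    # Assumption: schedule the reboot one minute out rather than immediately,
    # so a persistently failing host does not spin in a tight reboot loop.
    subprocess.call(["shutdown", "-r", "+1"])


def run_init_scripts(scripts, run=subprocess.call, reboot=_default_reboot):
    """Run each init script in order; reboot on the first failure.

    Returns True if every script exited with 0, False otherwise.
    """
    for script in scripts:
        exit_code = run([script])
        if exit_code != 0:
            sys.stderr.write("ERROR: Failed to start %s\n" % script)
            reboot()
            return False
    return True
```

For example, `run_init_scripts(["./init.d/10-prepare-disks.py"])` would schedule a reboot if the disk preparation step fails. The idea is that a later boot retries the volume lookup, by which time the old instance may have been terminated and the volume released.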
