Bottlerocket ECS Updater MVP
Service Binary/Process Name: bottlerocket-ecs-updater
Git Repository: bottlerocket-ecs-updater
Background
We want to provide a solution for automating Bottlerocket updates in ECS clusters.
This functionality will be similar to that provided by brupop.
A system will cause Bottlerocket nodes to apply OS updates as they become available through the waves system.
Throughout this document, the term node means an ECS container instance.
Requirements
User does not have to manually initiate updates per-host.
Updates obey wave structure as normal.
Hosts are drained of ECS tasks (that are created by a service) before updating.
System should not interrupt tasks that are not part of a service.
Safe update velocity; one host at a time (for initial release).
Check health before moving on (perhaps with ECS healthcheck)
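The requirement not to interrupt non-service tasks could be checked along the following lines. This is a minimal sketch, not a settled design: `TaskInfo`, `is_service_task`, and `safe_to_drain` are hypothetical names, and the sketch assumes that a task created by a service can be recognized by a `startedBy` value beginning with `ecs-svc/` or a `group` value beginning with `service:`, which we should confirm against the ECS API before relying on it.

```rust
/// Minimal, hypothetical view of an ECS task. The two fields mirror the
/// `startedBy` and `group` fields of a task description.
#[derive(Debug, Clone)]
pub struct TaskInfo {
    pub started_by: Option<String>,
    pub group: Option<String>,
}

/// Assumption: service-created tasks carry a `startedBy` value that begins
/// with "ecs-svc/" or a `group` value that begins with "service:".
pub fn is_service_task(task: &TaskInfo) -> bool {
    let started_by_service = task
        .started_by
        .as_deref()
        .map_or(false, |s| s.starts_with("ecs-svc/"));
    let in_service_group = task
        .group
        .as_deref()
        .map_or(false, |g| g.starts_with("service:"));
    started_by_service || in_service_group
}

/// A node is safe to drain only if every task on it belongs to a service.
/// A node with no tasks is trivially safe.
pub fn safe_to_drain(tasks: &[TaskInfo]) -> bool {
    tasks.iter().all(is_service_task)
}
```

A node for which `safe_to_drain` returns false would simply be skipped for this run, leaving its tasks untouched.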
Service
The program will run externally to the nodes-under-management as a Fargate task.
Running the updater in the existing cluster capacity is possible (and could be a future feature), but it would be more complex since the program might want to update its own node.
What it Does
The service periodically communicates with the ECS control plane and Bottlerocket nodes via their respective APIs. When a Bottlerocket node indicates that it has an update available, the system will cause the node to be drained of services, apply the update, reboot the node, and undrain the node.
State Storage
We may need to store some state information beyond the lifetime of the program, but we have not yet figured out where to store it.
One idea is to use the ECS API putAttributes, but we need to research this further to make sure it is an appropriate use of the API.
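If attributes do turn out to be an appropriate place for state, the per-node state could be a small enum serialized to a short string value. The sketch below is only an illustration under that assumption: the attribute name `bottlerocket-ecs-updater.state`, the `UpdateState` variants, and the method names are all hypothetical, and the actual call to the ECS API is left out.

```rust
/// Hypothetical attribute name; nothing in this design has chosen one yet.
pub const STATE_ATTRIBUTE: &str = "bottlerocket-ecs-updater.state";

/// Hypothetical per-node states, following the drain, update, reboot,
/// undrain sequence described under "What it Does".
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UpdateState {
    Idle,
    Draining,
    Updating,
    Rebooting,
    Undraining,
}

impl UpdateState {
    pub const ALL: [UpdateState; 5] = [
        UpdateState::Idle,
        UpdateState::Draining,
        UpdateState::Updating,
        UpdateState::Rebooting,
        UpdateState::Undraining,
    ];

    /// String form to store as the attribute value.
    pub fn as_attribute_value(self) -> &'static str {
        match self {
            UpdateState::Idle => "idle",
            UpdateState::Draining => "draining",
            UpdateState::Updating => "updating",
            UpdateState::Rebooting => "rebooting",
            UpdateState::Undraining => "undraining",
        }
    }

    /// Parses a stored value; unknown strings yield `None` so that a later
    /// run can decide how to treat a node with unexpected state.
    pub fn from_attribute_value(value: &str) -> Option<UpdateState> {
        UpdateState::ALL
            .iter()
            .copied()
            .find(|s| s.as_attribute_value() == value)
    }

    /// The state that follows this one in the update sequence.
    pub fn next(self) -> UpdateState {
        match self {
            UpdateState::Idle => UpdateState::Draining,
            UpdateState::Draining => UpdateState::Updating,
            UpdateState::Updating => UpdateState::Rebooting,
            UpdateState::Rebooting => UpdateState::Undraining,
            UpdateState::Undraining => UpdateState::Idle,
        }
    }
}
```

With something like this, a run that finds a node already in `Draining` could resume the sequence instead of starting over.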
Design
At regular intervals, a scheduled task will launch the program in a Fargate task.
This task will need an IAM role that allows it to interact with ECS to describe the cluster, drain nodes, undrain nodes and perform healthchecks.
The task might need an EC2 permission to determine whether an instance is running Bottlerocket or not (TBD).
The task will need SSM permissions to communicate with the Bottlerocket nodes it wants to manage (and the nodes will need SSM enabled in order to be managed).
SSM documents, the Fargate task, cron, etc. will be defined in a CloudFormation file.
Program Flow
Describe the cluster.
Build a list of instances ignoring those not running Bottlerocket (hopefully EC2 call not required?)
Query each instance to see if it has an update available (embarrassingly parallel to save time)
Ignore/discard those that don’t need an update.
List and inspect the tasks on each node that needs an update; discard nodes that have non-service tasks.
Prioritize the list (probably not too important), e.g.
    Current version descending
    Seed ascending
    Whatever
Proceed through the list of nodes to update, one at a time.
    Drain the node.
    List tasks for the node until all are stopped. (TODO - timeout? Then what?)
    Apply the update and reboot.
    Wait for the node to reappear (maybe check EC2). (TODO - timeout? Then what?)
    Wait for the node to become healthy in ECS.
    Undrain the node.
    Proceed to the next node.
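The filter and prioritize steps above might look roughly like the following. This is a sketch under assumptions: `Candidate`, `parse_version`, and `prioritize` are hypothetical names, versions are assumed to be plain `major.minor.patch` strings, and the ordering shown (current version descending, then seed ascending) is just the example ordering listed above, not a decision.

```rust
use std::cmp::Reverse;

/// Hypothetical summary of one node after it has been queried.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Candidate {
    pub instance_id: String,
    pub version: (u64, u64, u64),
    pub seed: u32,
    pub update_available: bool,
}

/// Parses "1.0.5" or "v1.0.5" into (1, 0, 5). Anything that is not exactly
/// three numeric components yields `None`.
pub fn parse_version(s: &str) -> Option<(u64, u64, u64)> {
    let mut parts = s.trim_start_matches('v').split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    let patch = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}

/// Drops nodes that do not need an update, then orders the rest by current
/// version descending and seed ascending.
pub fn prioritize(mut nodes: Vec<Candidate>) -> Vec<Candidate> {
    nodes.retain(|n| n.update_available);
    nodes.sort_by_key(|n| (Reverse(n.version), n.seed));
    nodes
}
```

The updater would then walk the returned list from front to back, one node at a time.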
Testing
Unit
We should use mediator traits as injected dynamic dependencies to decouple the business logic from rusoto.
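As a rough illustration of the idea, the per-node flow could depend only on a trait, with an in-memory fake standing in for the AWS-backed implementation in unit tests. Everything named here (`NodeMediator`, `update_node`, `poll`, `FakeNode`) is hypothetical. The bounded polling and the choice to undrain after a drain timeout are assumptions made for the sketch, since the timeout behavior is still a TODO in the program flow.

```rust
/// Hypothetical mediator trait: the only AWS-facing surface the business
/// logic sees. A production implementation would wrap the SDK client.
pub trait NodeMediator {
    fn drain(&mut self, node: &str) -> Result<(), String>;
    fn running_task_count(&mut self, node: &str) -> Result<usize, String>;
    fn apply_update_and_reboot(&mut self, node: &str) -> Result<(), String>;
    fn is_healthy(&mut self, node: &str) -> Result<bool, String>;
    fn undrain(&mut self, node: &str) -> Result<(), String>;
}

/// Calls `check` up to `max_attempts` times and reports whether it ever
/// returned true. A real implementation would sleep between attempts.
fn poll<F>(max_attempts: u32, mut check: F) -> Result<bool, String>
where
    F: FnMut() -> Result<bool, String>,
{
    for _ in 0..max_attempts {
        if check()? {
            return Ok(true);
        }
    }
    Ok(false)
}

/// Drains, updates, and undrains one node through the injected mediator.
pub fn update_node(m: &mut dyn NodeMediator, node: &str, max_attempts: u32) -> Result<(), String> {
    m.drain(node)?;
    let drained = poll(max_attempts, || Ok(m.running_task_count(node)? == 0))?;
    if !drained {
        // Assumption: on a drain timeout, give the capacity back and report.
        m.undrain(node)?;
        return Err(format!("{}: tasks did not stop in time", node));
    }
    m.apply_update_and_reboot(node)?;
    let healthy = poll(max_attempts, || m.is_healthy(node))?;
    if !healthy {
        // Assumption: an unhealthy node stays drained for an operator to inspect.
        return Err(format!("{}: not healthy after reboot", node));
    }
    m.undrain(node)
}

/// In-memory fake for unit tests; records the mutating calls it receives.
pub struct FakeNode {
    pub calls: Vec<&'static str>,
    pub tasks_left: usize,
    pub healthy: bool,
}

impl NodeMediator for FakeNode {
    fn drain(&mut self, _node: &str) -> Result<(), String> {
        self.calls.push("drain");
        Ok(())
    }
    fn running_task_count(&mut self, _node: &str) -> Result<usize, String> {
        let count = self.tasks_left;
        self.tasks_left = count.saturating_sub(1); // one task stops per poll
        Ok(count)
    }
    fn apply_update_and_reboot(&mut self, _node: &str) -> Result<(), String> {
        self.calls.push("update");
        Ok(())
    }
    fn is_healthy(&mut self, _node: &str) -> Result<bool, String> {
        Ok(self.healthy)
    }
    fn undrain(&mut self, _node: &str) -> Result<(), String> {
        self.calls.push("undrain");
        Ok(())
    }
}
```

A unit test can then construct a `FakeNode`, call `update_node(&mut fake, "node-1", 5)`, and assert on `fake.calls` without touching AWS.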
Integ
We should create a binary that we can run that will do an integration test.
Approximate requirements:
Use a pre-existing ECS cluster.
Detect pre-existing instances in the cluster and abort with error informing the user that the cluster should be empty.
Create multiple nodes, either taking a Bottlerocket AMI (version < latest) as input or getting it from SSM.
Create a workload service and run it on the nodes.
Assert health of the workload throughout test?
Containerize and deploy the local changeset version of bottlerocket-update-operator to an ECR repo.
Run the updater in a Fargate task.
Assert the nodes do update.
Cleanup.
Issues
This issue serves as a rollup of the issues that we need to close to get to the MVP. Waypoints along the way: