AWS Batch compute environment needs recreating after launch template change #15535

Closed
microbioticajon opened this issue Oct 7, 2020 · 9 comments · Fixed by #30438
Labels
service/batch Issues and PRs that pertain to the batch service.

Comments

microbioticajon commented Oct 7, 2020

Hi Guys,

I'm having trouble applying changes to my AWS Batch configuration. As part of my Batch cluster I use a custom Launch Template for the instances in the compute environment. However, when I make a change to the Launch Template, the Batch compute environment remains unmodified.

Terraform version

v0.13.3

  • provider registry.terraform.io/-/aws v3.8.0
  • provider registry.terraform.io/hashicorp/aws v3.8.0
  • provider registry.terraform.io/hashicorp/null v2.1.2

Affected Resource(s)

  • aws_batch_compute_environment
  • aws_launch_template

Expected Behaviour

According to the AWS Batch docs, if the Launch Template is updated with a new version, the entire compute environment needs to be destroyed and rebuilt:

https://docs.aws.amazon.com/batch/latest/userguide/launch-templates.html

Launch Template Support - AWS Batch
AWS Batch does not support updating a compute environment with a new launch template version. If you update your launch template, you must create a new compute environment with the new template for the changes to take effect.

Actual Behaviour

aws_batch_compute_environment remains unchanged

As a result, the only way to apply Launch Template changes is to manually destroy the compute environment before applying the plan, or to taint the resources through the command line.

I performed a quick search and I cannot find a way to trigger a forced re-creation of a resource from within the plan itself.
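For reference, the taint workaround looks something like this; the resource address is illustrative and should match whatever terraform state list reports for the compute environment:

terraform taint module.test_cluster.aws_batch_compute_environment.main
terraform apply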

Any fixes, help or work-arounds would be greatly appreciated.

Note:

My current launch template has resulted in an invalid compute environment which cannot be deleted even when tainted, which is why I need to update the launch template. See: #8549

github-actions bot added the needs-triage label Oct 7, 2020
ghost added the service/batch and service/ec2 labels Oct 7, 2020
ewbankkit removed the service/ec2 and needs-triage labels Oct 13, 2020
@ewbankkit
Contributor

@microbioticajon Thanks for raising this issue.
Could you please include a snippet of your Terraform configuration that includes the setting of launch_template.version?

@microbioticajon
Author

Hi @ewbankkit,

That was it! The compute environment was relying on the default launch template version, but Terraform was unable to detect the change because launch_template.version was not set. It looks like this is not directly a provider problem after all; apologies.

While obvious now that I think about it, a hint in the docs might help others who get stuck on the same issue.
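For anyone who hits the same thing, a minimal sketch of the fix; the instance type, IAM roles, and network references are placeholders:

resource "aws_batch_compute_environment" "main" {
  compute_environment_name = "dev-jon-tf-cluster"
  type                     = "MANAGED"
  service_role             = aws_iam_role.batch_service.arn   # assumed to exist elsewhere

  compute_resources {
    type               = "EC2"
    min_vcpus          = 0
    max_vcpus          = 16
    instance_type      = ["m5.large"]
    instance_role      = aws_iam_instance_profile.batch.arn   # assumed to exist elsewhere
    subnets            = var.subnet_ids
    security_group_ids = var.security_group_ids

    launch_template {
      launch_template_id = aws_launch_template.worker.id
      # Without an explicit version, Batch pins the default template version
      # and Terraform has nothing to diff when the template changes.
      version            = aws_launch_template.worker.latest_version
    }
  }
}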

I have reapplied the plan with a modified launch template, but unfortunately I'm now getting the following related error:

# module.test_cluster.data.aws_ebs_snapshot.static_refs will be read during apply
# module.test_cluster.aws_batch_compute_environment.main must be replaced
...
            launch_template {
                launch_template_id = "lt-0929..."
                version            = "1" -> (known after apply) # forces replacement
            }
...
# module.test_cluster.aws_batch_job_queue.general_purpose_queue will be updated in-place
# module.test_cluster.aws_launch_template.worker will be updated in-place
# module.test_cluster.aws_launch_template.worker_ebs_working_vol will be updated in-place

Plan: 1 to add, 3 to change, 1 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

module.test_cluster.data.aws_ebs_snapshot.static_refs: Refreshing state...
module.test_cluster.aws_batch_compute_environment.main: Destroying... [id=dev-jon-tf-cluster]
module.test_cluster.aws_launch_template.worker_ebs_working_vol: Modifying... [id=lt-0d61...]
module.test_cluster.aws_launch_template.worker_ebs_working_vol: Modifications complete after 0s [id=lt-0d61...]

Error: error deleting Batch Compute Environment (dev-jon-tf-cluster): : Cannot delete, found existing JobQueue relationship
	status code: 400, request id: 77a9x962-0dc8-4edc-88d3-effec3071d0d

I'm not sure how to get around this: it looks like Terraform now recognises that the compute environment needs to be replaced, but AWS won't allow the deletion while there are still job queues associated with it.

Many thanks,
Jon

vspinu commented Feb 19, 2021

I am seeing the same error even after manually destroying the Batch compute environments in the console. Any ideas on how to reset the (remote) state without re-initializing the project from scratch?

Error: error disabling Batch Compute Environment (dev-batch-cpu4-20210217103010): : arn:aws:batch:eu-central-1:1111111111111:compute-environment/xyz does not exist
	status code: 400, request id: aa1e66ad-e358-43c0-8ebb-8a3cfefc92e7

EDIT: fixed it with an explicit terraform state rm x y z
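For example, something along these lines, where the addresses are illustrative and should come from terraform state list:

terraform state list | grep aws_batch
terraform state rm module.test_cluster.aws_batch_compute_environment.main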

bhayden53 commented Mar 19, 2021

It looks like using launch_template.version = "$Latest" does not force compute environment re-creation, even when Terraform knows the launch template is being updated. Shouldn't it?

Otherwise I have to look up the current launch template version and increment it manually every time I deploy, just to get a new compute environment created correctly.

The way I understand it, any time Terraform makes a change to a launch template, it should just recreate any associated compute environments. Even if you use $Default or $Latest, Batch only takes a snapshot of them at the time of compute environment creation; it won't dynamically recognize changes to $Latest or $Default over time.

https://docs.aws.amazon.com/batch/latest/userguide/create-compute-environment.html

After the compute environment is created, the launch template version used will not be changed, even if the $Default or $Latest version for the launch template is updated. To use a new launch template version, create a new compute environment, add the new compute environment to the existing job queue, remove the old compute environment from the job queue, and delete the old compute environment.

@bhayden53

I think the only reliable solution in my situation is for my deployment to mark the compute environment as tainted on every run in order to force re-creation.
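On more recent Terraform releases, the -replace option can do the same thing in a single step instead of a separate taint; the address below is illustrative:

terraform apply -replace="module.test_cluster.aws_batch_compute_environment.main"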

@AaronNHart

@bhayden53 I have been struggling with this for about a (painful) year, but just noticed a small improvement over using $Latest. You can instead use aws_launch_template.this.latest_version, which simply replaces $Latest with the latest version number. This allows Terraform to recognize that the CE needs to be replaced. I honestly don't understand what AWS thinks $Latest (and $Default) actually do in compute environments currently. It seems completely broken to me.
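Concretely, something like this inside compute_resources (the template reference name is illustrative):

launch_template {
  launch_template_id = aws_launch_template.this.id
  # Batch resolves "$Latest" once when the CE is created and never re-reads it,
  # so reference the numeric latest_version instead; every template change then
  # shows up as a version diff that forces the CE to be replaced.
  version            = aws_launch_template.this.latest_version
}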

However, the issue that @vspinu raises I see often and do not understand the root cause of. It seems like a bug in the provider, specifically that it doesn't know the job queue must be deleted before the compute environment can be replaced. I suspect this is ultimately a limitation of the AWS API that the provider has to work around.

@ewbankkit if a small reproducible example would help I can provide one. It would be fantastic if we can find a solution.

@bhayden53

I honestly don't understand what AWS thinks $Latest (and $Default) actually do in compute environments currently. It seems completely broken to me.

AWS Support has told me that it intentionally takes a snapshot of the $Latest or $Default version at the time of CE creation. Definitely not what any reasonable user would expect it to do. I think I also got the "there is an issue in our internal tracker and I have added your voice to it" response as well.

Thanks for the other workaround.

@frosforever
Contributor

This looks like it might be related to #30438.

Since this issue was first opened, Batch behavior has changed and now allows updates to compute environment launch templates if "the service role is set to AWSServiceRoleForBatch (the default) and that the allocation strategy is BEST_FIT_PROGRESSIVE or SPOT_CAPACITY_OPTIMIZED. BEST_FIT isn't supported."

See https://docs.aws.amazon.com/batch/latest/userguide/updating-compute-environments.html
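A sketch of what that looks like on the Terraform side, trimmed to the two prerequisites; everything else in the compute environment stays as before:

resource "aws_batch_compute_environment" "main" {
  # service_role is omitted, so Batch uses the AWSServiceRoleForBatch
  # service-linked role, which is required for in-place updates.
  compute_environment_name = "dev-jon-tf-cluster"
  type                     = "MANAGED"

  compute_resources {
    # BEST_FIT does not support in-place updates.
    allocation_strategy = "BEST_FIT_PROGRESSIVE"
    # ... other compute_resources arguments unchanged ...
  }
}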

@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Aug 25, 2023