Describe the bug
When using Terragrunt with S3 Remote State in a Docker container, Terragrunt needs to authenticate to AWS S3 directly (not via underlying terraform). When you are on an EC2 instance that has an IAM Role attached (not access keys), Terragrunt uses the EC2 Metadata API via the underlying AWS Go SDK. This results in very poor performance during the remote state initialization process. On AWS GovCloud us-gov-west-1, the remote state initialization takes >10 seconds in a Docker container, whereas it takes <1 second natively.
Steps To Reproduce
FROM alpine:3.20.1 AS builder
# Install curl (to download the binaries), the AWS CLI, and bash (used by the entrypoint below)
RUN apk add --no-cache curl aws-cli bash
# Define the OpenTofu and Terragrunt versions to download
ARG TOFU_VERSION=1.8.3
ARG TERRAGRUNT_VERSION=0.67.16
# Download and install OpenTofu
RUN curl -LO https://github.com/opentofu/opentofu/releases/download/v${TOFU_VERSION}/tofu_${TOFU_VERSION}_amd64.apk && \
    mv tofu_${TOFU_VERSION}_amd64.apk /usr/local/bin/tofu.apk && \
    apk add --allow-untrusted /usr/local/bin/tofu.apk
# Download Terragrunt
RUN curl -LO https://github.com/gruntwork-io/terragrunt/releases/download/v${TERRAGRUNT_VERSION}/terragrunt_linux_amd64 && \
    mv terragrunt_linux_amd64 /usr/local/bin/terragrunt
# Make tofu and terragrunt executable
RUN chmod +x /usr/bin/tofu && chmod +x /usr/local/bin/terragrunt
# Environment variables
ENV TERRAGRUNT_TFPATH="tofu"
ENV TERRAGRUNT_NON_INTERACTIVE="false"
ENV TERRAGRUNT_PROVIDER_CACHE=0
ENV TERRAGRUNT_PARALLELISM=1
# Set default entrypoint to bash
ENTRYPOINT ["/bin/bash"]
Create an EC2 instance that has an IAM role with the necessary permissions attached.
Start a container from the Docker image on the EC2 instance with docker run -it ... bash
Inside the Docker container, run terragrunt init (or terragrunt plan, terragrunt apply, or any other command that uses remote state).
Notice that it takes significant time before the underlying terraform init runs.
This "significant time" is at least 10x as long as it would be outside the Docker container. In fact, in a certain environment I operate in, it is 4-6 minutes, which is unbearably long for each terragrunt operation. I can provide more details about this environment privately.
Expected behavior
The command takes up to a few seconds before actually running the underlying terraform command.
Logs
Here is an example of debug logs (sanitized for privacy).
$ terragrunt init --terragrunt-log-level debug --terragrunt-debug
21:22:45.713 DEBUG Terragrunt Version: 0.67.1
21:22:45.725 DEBUG Did not find any locals block: skipping evaluation.
21:22:45.731 DEBUG Found locals block: evaluating the expressions.
21:22:45.741 DEBUG Evaluated 2 locals (remaining 0): env, terraform_cache_dir
... env logs ...
21:22:49.344 DEBUG Running command: tofu --version
21:22:49.420 DEBUG tofu version: 1.8.1
21:22:49.420 DEBUG Reading Terragrunt config file at terragrunt.hcl
21:22:49.421 DEBUG Did not find any locals block: skipping evaluation.
21:22:49.424 DEBUG Found locals block: evaluating the expressions.
21:22:49.431 DEBUG Evaluated 2 locals (remaining 0): env, terraform_cache_dir
... env logs ...
21:22:49.464 DEBUG Getting output of dependency .. for config terragrunt.hcl
... dependency logs ...
21:23:06.924 DEBUG Found locals block: evaluating the expressions.
21:23:06.931 DEBUG Evaluated 2 locals (remaining 0): env, terraform_cache_dir
21:23:06.936 DEBUG Found locals block: evaluating the expressions.
21:23:06.937 DEBUG Evaluated 2 locals (remaining 0): env, terraform_cache_dir
21:23:06.940 DEBUG Included config ../../../terragrunt.hcl has strategy shallow merge: merging config in (shallow).
21:23:06.947 DEBUG Found locals block: evaluating the expressions.
21:23:06.949 DEBUG Evaluated 1 locals (remaining 0): env
21:23:06.953 DEBUG Found locals block: evaluating the expressions.
21:23:06.961 DEBUG Evaluated 1 locals (remaining 0): env
21:23:06.970 DEBUG Included config ../../../_env/emr.hcl has strategy shallow merge: merging config in (shallow).
21:23:06.970 DEBUG Detected 1 Hooks
21:23:06.970 INFO Downloading Terraform configurations from ...
21:23:07.022 DEBUG Detected 1 Hooks
21:23:07.024 DEBUG Copying files from...
21:23:07.027 DEBUG Setting working directory to ...
21:23:07.028 DEBUG Generated file .terragrunt-cache/w_zPDJwXr8fxnrUd-w10tIHl8HM/Xz4P-Jhavj4obcO3eEDRzJIDlyI/providers.tf.
21:23:07.028 DEBUG Generated file .terragrunt-cache/w_zPDJwXr8fxnrUd-w10tIHl8HM/Xz4P-Jhavj4obcO3eEDRzJIDlyI/backend.tf.
21:23:07.028 INFO Debug mode requested: generating debug file terragrunt-debug.tfvars.json in working dir ...
21:23:07.071 DEBUG The following variables were detected in the terraform module:
21:23:07.071 DEBUG [...]
21:23:07.071 DEBUG WARN: The variable ssl_certificate was omitted because it is not defined in the terraform module.
21:23:07.071 DEBUG WARN: The variable immtua_endpoint was omitted because it is not defined in the terraform module.
21:23:07.071 DEBUG WARN: The variable custom_logging_filename was omitted because it is not defined in the terraform module.
21:23:07.071 DEBUG WARN: The variable cert_private_key was omitted because it is not defined in the terraform module.
21:23:07.071 DEBUG Variables passed to terraform are located in "sanitized"
21:23:07.071 DEBUG Run this command to replicate how terraform was invoked:
21:23:07.071 DEBUG terraform -chdir="sanitized" init -var-file="sanitized"
21:23:07.072 DEBUG Initializing remote state for the s3 backend
21:23:13.330 DEBUG Verifying AWS S3 Bucket Versioning <bucket name>
21:23:13.337 DEBUG Checking if SSE is enabled for AWS S3 bucket <bucket name>
21:23:13.358 DEBUG Checking if bucket <bucket name> is have root access
21:23:13.366 DEBUG Policy for RootAccess already exists for bucket <bucket name>
21:23:13.366 DEBUG Checking if bucket <bucket name> is enforced with TLS
21:23:13.374 DEBUG Policy for EnforcedTLS already exists for bucket <bucket name>
21:23:13.374 DEBUG S3 bucket is already up to date
21:23:13.374 DEBUG Verifying AWS S3 Bucket Versioning <bucket name>
21:23:19.665 DEBUG Running command: tofu init
21:23:19.750 STDOUT tofu: Initializing the backend...
21:23:23.378 STDOUT tofu:
21:23:23.378 STDOUT tofu: Successfully configured the backend "s3"! OpenTofu will automatically
21:23:23.378 STDOUT tofu: use this backend unless the backend configuration changes.
21:23:23.467 STDOUT tofu: Initializing provider plugins...
21:23:23.468 STDOUT tofu: - Finding hashicorp/random versions matching "3.5.1"...
21:23:23.470 STDOUT tofu: - Finding hashicorp/null versions matching "3.2.1"...
... other providers ...
21:23:31.429 STDOUT tofu:
21:23:31.429 STDOUT tofu: OpenTofu has been successfully initialized!
21:23:31.429 STDOUT tofu:
21:23:31.429 STDOUT tofu: You may now begin working with OpenTofu. Try running "tofu plan" to see
21:23:31.429 STDOUT tofu: any changes that are required for your infrastructure. All OpenTofu commands
21:23:31.429 STDOUT tofu: should now work.
21:23:31.429 STDOUT tofu: If you ever set or change modules or backend configuration for OpenTofu,
21:23:31.429 STDOUT tofu: rerun this command to reinitialize your working directory. If you forget, other
21:23:31.429 STDOUT tofu: commands will detect it and remind you to do so if necessary.
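Notice the gap between "Initializing remote state for the s3 backend" (21:23:07.072) and the next line (21:23:13.330): about 6 seconds. A second gap of about 6 seconds follows the repeated "Verifying AWS S3 Bucket Versioning" line (21:23:13.374 to 21:23:19.665). That does not seem that bad in isolation, but it is far worse than outside of the Docker container.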
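For anyone comparing their own debug output, the pauses in a log like the one above can be located mechanically. The following is a minimal Python sketch, not part of Terragrunt: the function name largest_gaps and the SAMPLE excerpt (a few lines copied from the log above) are illustrative, and it assumes every relevant line starts with an HH:MM:SS.mmm timestamp and that the run does not cross midnight.

```python
from datetime import datetime

# Excerpt of the debug log above (intermediate lines omitted for brevity).
SAMPLE = """\
21:23:07.072 DEBUG Initializing remote state for the s3 backend
21:23:13.330 DEBUG Verifying AWS S3 Bucket Versioning <bucket name>
21:23:13.337 DEBUG Checking if SSE is enabled for AWS S3 bucket <bucket name>
21:23:13.374 DEBUG Verifying AWS S3 Bucket Versioning <bucket name>
21:23:19.665 DEBUG Running command: tofu init
"""


def largest_gaps(log_text, top=3):
    """Return the `top` largest pauses as (seconds, message_before_pause)."""
    entries = []
    for line in log_text.splitlines():
        stamp, _, message = line.strip().partition(" ")
        try:
            entries.append((datetime.strptime(stamp, "%H:%M:%S.%f"), message))
        except ValueError:
            continue  # skip lines without a leading timestamp, e.g. "... env logs ..."
    gaps = [
        ((later[0] - earlier[0]).total_seconds(), earlier[1])
        for earlier, later in zip(entries, entries[1:])
    ]
    return sorted(gaps, reverse=True)[:top]


if __name__ == "__main__":
    for seconds, message in largest_gaps(SAMPLE, top=2):
        print(f"{seconds:.3f}s pause after: {message}")
```

Running the same function over a log captured outside the container makes the comparison concrete: the message printed next to each pause shows which step was waiting.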
Versions
Terragrunt version: 0.67.16
OpenTofu version: 1.8.3
Environment details: AWS EC2 instance with IAM role attached, inside Docker container
Workaround
I found a workaround: run the Docker container with host networking (docker run --network host -it ... bash).
Additional context
I believe this is related to the AWS SDK calling the Instance Metadata service. When I run netstat, I see tons of calls to the .internal DNS name for the Instance Metadata service (169.254.169.254). My theory is that something is funny with the networking and it leads to slowness but not timeouts/errors.
Admittedly, this might be a problem with the underlying AWS Golang SDK, but I think that is unlikely.
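One way to test the theory about the Instance Metadata service is to time plain HTTP requests to it on the host and again inside the container. The sketch below is an assumption-laden illustration: time_requests is a hypothetical helper, it measures only raw round-trip time to the well-known 169.254.169.254 address, and it does not reproduce the credential lookup sequence the AWS Go SDK actually performs. An HTTP error status (for example when the instance requires session tokens) still counts as a completed round trip.

```python
import time
import urllib.error
import urllib.request

# Well-known link-local address of the EC2 Instance Metadata Service.
IMDS_URL = "http://169.254.169.254/latest/meta-data/"


def time_requests(url=IMDS_URL, attempts=3, timeout=1.0):
    """Time `attempts` GET requests; return a (seconds, outcome) tuple per attempt."""
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                outcome = f"HTTP {response.status}"
        except urllib.error.HTTPError as err:
            # The service answered, just not with a 2xx; the latency is still meaningful.
            outcome = f"HTTP {err.code}"
        except OSError as err:
            # URLError, timeouts, and refused connections all derive from OSError.
            outcome = f"failed: {err}"
        results.append((time.monotonic() - start, outcome))
    return results


if __name__ == "__main__":
    for seconds, outcome in time_requests():
        print(f"{seconds:.3f}s  {outcome}")
```

If the numbers inside the container (default bridge networking) are much larger than on the host or with --network host, that would point at the container network path rather than at Terragrunt itself.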
I believe that reaching out to instance metadata is one of the first steps in all AWS SDK implementations.
I think a more direct fix for your issue is to take advantage of the disable_bucket_update = true configuration, which will prevent all attempts to update your S3 + DynamoDB backend, avoiding the attempt to authenticate with AWS at all.
Long term, the CLI shouldn't attempt to automatically make any adjustments to backend resources without explicit opt-in. I've shared a proposal to address that here: #3445
Closing this issue, as it's not really something that can be addressed with a change to how Terragrunt works.
@yhakbar Understood. Thanks for walking me through that! I'm comfortable with closing this too, and happy to have this record so if anyone else runs into this, they know the workaround and context.
Have a great day!