
Recent updates possibly broke CLI execute-command #435

Closed
ssyberg opened this issue Apr 1, 2022 · 36 comments


ssyberg commented Apr 1, 2022

There are a number of GitHub issues floating around on related repos that might be tied to recent SSM Agent updates, though this is incredibly difficult to verify from our end. If someone could do a little investigating, that would be great.

The general issue that manifests is an inability to run execute-command via the CLI; a TargetNotConnectedException is thrown. Existing troubleshooting guides have thus far not yielded success.

Related tickets:

aws/aws-cli#6834
aws/aws-cli#6562
aws-containers/amazon-ecs-exec-checker#47

@GeorgeNagel

Example output from aws ecs execute-command ...:

The Session Manager plugin was installed successfully. Use the AWS CLI to start a session.


An error occurred (TargetNotConnectedException) when calling the ExecuteCommand operation: The execute command failed due to an internal error. Try again later.


ssyberg commented Apr 1, 2022

Exact output for everyone with this problem as far as I can tell ☝🏼


tim-finnigan commented Apr 1, 2022

This looks related: aws-containers/amazon-ecs-exec-checker#49

Do you also have AWS_ACCESS_KEY / AWS_SECRET_ACCESS_KEY set? That may be causing the issue.
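One quick way to check whether a task definition sets those variables (a sketch; "my-task-def" is a placeholder and the jq filter is just illustrative):

# List any AWS credential-style variables set in a task definition
# ("my-task-def" stands in for your task definition family or ARN).
aws ecs describe-task-definition --task-definition my-task-def --output json \
  | jq '.taskDefinition.containerDefinitions[]
        | {container: .name, awsVars: [.environment[]? | select(.name | startswith("AWS_"))]}'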


ssyberg commented Apr 1, 2022

Do you also have AWS_SECRET_ACCESS_KEY set? That may be causing the issue.

If my parsing of the Terraform config can be trusted, we are not setting that in environment_variables, but it is available in the secrets.

I'll try removing this now and see if that makes a difference.

ssyberg commented Apr 1, 2022

Holy moly, that worked! That said, we actually actively use those credentials in our task, so we'll need a workaround for exposing them. Still seems like setting these env vars shouldn't have this effect, right?
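One workaround we're considering (just a sketch; whether the managed agent also sees variables exported this way is an assumption that needs verifying): keep the credentials in the task definition under non-standard names, e.g. APP_AWS_ACCESS_KEY_ID / APP_AWS_SECRET_ACCESS_KEY, and map them back to the standard names only for the application process in the entrypoint:

#!/bin/sh
# Hypothetical entrypoint.sh: the task definition sets APP_AWS_ACCESS_KEY_ID and
# APP_AWS_SECRET_ACCESS_KEY instead of the standard names; only the application
# process launched below gets the standard names back.
export AWS_ACCESS_KEY_ID="$APP_AWS_ACCESS_KEY_ID"
export AWS_SECRET_ACCESS_KEY="$APP_AWS_SECRET_ACCESS_KEY"
exec "$@"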

@tim-finnigan

Glad that worked! I'm waiting on more info regarding this and will post an update here.


farkmarnum commented Apr 1, 2022

Renaming AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY has also fixed the problem for me! Bizarre that this just started happening at ~5pm EST March 30 out of nowhere.


nathando commented Apr 3, 2022

Can we revert to a previous version of the AWS CLI to fix this? Changing the environment variables will break other things in our tasks.

@raptorcZn

Facing this issue as well. As @nathando mentioned, it would be great if this reverted to the previous behaviour so that we don't have to change the environment variables.

@nicolasbuch

Encountered this error out of nowhere 4 days ago: "An error occurred (TargetNotConnectedException) when calling the ExecuteCommand operation: The execute command failed due to an internal error. Try again later."

In my case I also had AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY set as ENV variables (in my task definition), since my application needs to interact with the AWS API. It was working fine until now, so something must have changed in recent updates.

There is no need to change the environment variables, though; all you need to do is give the user (the one behind AWS_ACCESS_KEY_ID) permissions to allow the ECS exec command:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ssmmessages:CreateControlChannel",
                "ssmmessages:CreateDataChannel",
                "ssmmessages:OpenControlChannel",
                "ssmmessages:OpenDataChannel"
            ],
            "Resource": "*"
        }
    ]
}
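For reference, a sketch of attaching that statement as an inline policy to the CLI user (the user and policy names are placeholders, and the JSON above is assumed to be saved as ecs-exec-ssm.json):

# Attach the statement above as an inline policy to the IAM user whose keys are in
# AWS_ACCESS_KEY_ID ("cli-user" and "ecs-exec-ssm-messages" are placeholder names).
aws iam put-user-policy \
  --user-name cli-user \
  --policy-name ecs-exec-ssm-messages \
  --policy-document file://ecs-exec-ssm.json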

@tim-finnigan

Thanks @nicolasbuch, those requirements are also documented here: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-exec.html#ecs-exec-prerequisites as well as this troubleshooting article for the TargetNotConnectedException error: https://aws.amazon.com/premiumsupport/knowledge-center/ecs-error-execute-command/

Those requirements aren’t new so I’m not sure why recent updates would be a factor here. Has anyone tried rolling back to a previous SSM Agent version to see if they still see this issue? It would help the team to have agent logs from a container that is experiencing the issue. You could provide those here or contact AWS Support.
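If anyone can still reach an affected container (for example after applying the env var workaround above), the agent logs are typically in the default Linux location; a sketch, assuming standard paths:

# Inside the container (default SSM Agent log locations on Linux; paths may differ per image):
cat /var/log/amazon/ssm/amazon-ssm-agent.log
cat /var/log/amazon/ssm/errors.log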


Thor-Bjorgvinsson commented Apr 4, 2022

The agent version in ECS Exec is controlled by ECS during AMI build and they say they haven't changed the version recently. Can anyone here that encountered the issue and has removed AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from their environment start a session and get the agent version?

# Assuming your session starts in the ECS Exec bin folder
./amazon-ssm-agent -version

Also, are you seeing this issue on ECS on EC2 or Fargate?
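(If you're not sure, the launch type shows up in describe-tasks; the cluster name and task ID below are placeholders.)

# Check whether an affected task is running on EC2 or Fargate.
aws ecs describe-tasks --cluster my-cluster --tasks 0123456789abcdef0 \
  --query 'tasks[].{launchType:launchType,platformVersion:platformVersion}'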


pauricthelodger commented Apr 4, 2022

@Thor-Bjorgvinsson after making the change and removing the envvars I can access the containers and see the following versions according to the log output on Fargate tasks

amazon-ssm-agent - v3.1.715.0
ssm-agent-worker - v3.1.715.0

@GeorgeNagel

@Thor-Bjorgvinsson Seeing the issue on Fargate.


yufio commented Apr 4, 2022

We have also been experiencing the same issue since last Friday (01 April 2022). We didn't change anything and command execution stopped working. We also have AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in the env. Funnily enough, on one of our environments it still works, but on two others it stopped.
We are now investigating the permission differences.
The user on that env has admin access rights (dev env).

@Thor-Bjorgvinsson

We've confirmed that this is an SSM Agent issue introduced in a recent Fargate deployment where the agent version was updated. Any new tasks started in Fargate will use an SSM Agent build with this issue. We are working with the Fargate team to deploy a fix. Mitigation, as mentioned above: remove AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from the task definition environment variables.

@akhiljalagam

The ssmmessages policy @nicolasbuch posted above worked for me.


Thor-Bjorgvinsson commented Apr 6, 2022

@akhiljalagam I can confirm this can be used as a mitigation today, but it is not recommended; it will no longer be possible in the near future, sometime after the fix has been released. The agent will only be able to connect using ECS task metadata service credentials.

The recommended mitigation is to unset the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.
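For anyone updating the task definition by hand rather than through Terraform/CDK, a rough sketch of stripping the variables and registering a new revision (the family name is a placeholder; the jq filter drops read-only fields that register-task-definition rejects):

# Fetch the current task definition, drop the credential variables, and register a
# new revision ("my-task-def" is a placeholder family name).
aws ecs describe-task-definition --task-definition my-task-def \
  --query 'taskDefinition' --output json \
  | jq 'del(.taskDefinitionArn, .revision, .status, .requiresAttributes,
            .compatibilities, .registeredAt, .registeredBy)
        | .containerDefinitions[].environment |= ((. // [])
            | map(select(.name != "AWS_ACCESS_KEY_ID" and .name != "AWS_SECRET_ACCESS_KEY")))' \
  > no-keys-task-def.json
aws ecs register-task-definition --cli-input-json file://no-keys-task-def.json
# Then point the service at the new revision, e.g.:
# aws ecs update-service --cluster my-cluster --service my-service --task-definition my-task-def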


Sohett commented Apr 11, 2022

@Thor-Bjorgvinsson, how can we follow the status of this? I don't want to be too pushy, but we're really blocked 😬. Is there some kind of prioritisation as this is a regression?

Anyway, thanks for the work 💪 .

@Thor-Bjorgvinsson

We've pushed out a fix in agent release 3.1.1260.0 for this issue. We're currently working with related AWS services to integrate this fix; we'll add further updates as those integrations are completed.


jmagoon commented Apr 19, 2022

For other people who come across this issue, this error happens for us when we have AWS_SHARED_CREDENTIALS_FILE set as an environment variable as well. When it is removed, ecs execute-command works correctly.

@ZacBridge

Hopefully this doesn't put a spanner in the works, but I've been having this issue across all of my services. Only one of the services actually had AWS env vars in it; after renaming those, that service was fine.

The others, however, still respond with the same internal error, with no AWS env vars to note on the tasks.

@GeorgeNagel

I'm seeing this again since the 3.1.1260.0 release. Is it possible other env variable names are now disallowed? In particular, I had changed my AWS_SECRET_ACCESS_KEY env variable to AWS_SECRET_ACCESS_KEY_ECS, which was working until the 3.1.1260.0 release. After changing that key to AWS_SECRET_ACCESS_KEY_<something>_ECS, I am able to connect again.

I'm wondering if the fix in 3.1.1260.0 was to switch from using AWS_SECRET_ACCESS_KEY to AWS_SECRET_ACCESS_KEY_ECS in some internal API. If so, perhaps more of a root-cause fix is needed, or documentation which specifies which env variable names cause these conflicts.

@djGrill

djGrill commented Apr 25, 2022

maybe it's partially matching AWS_SECRET_ACCESS_KEY* instead of just AWS_SECRET_ACCESS_KEY? 🤔

@justinko

No error for me with AWS_SECRET_ACCESS_KEY_2

@bigbluechicken

Is there any update on when the fix will be rolled out?

@serhiibeznisko

Renaming AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID variables did the job!

@Thor-Bjorgvinsson

ECS has released a new AMI with the updated SSM Agent (ECS-optimized AMI version 20220421); the Fargate release is still pending.

SSM Agent commit to resolve this issue
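(To confirm whether a cluster's instances are on at least the 20220421 ECS-optimized AMI, a sketch using the public SSM parameter for the recommended image:)

# Look up the currently recommended ECS-optimized Amazon Linux 2 AMI and its version string.
aws ssm get-parameter \
  --name /aws/service/ecs/optimized-ami/amazon-linux-2/recommended \
  --query 'Parameter.Value' --output text | jq -r '.image_name, .image_id'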


Sohett commented May 5, 2022

Any news concerning the Fargate release?


benoawfu commented May 5, 2022

Without changing anything regarding the env variables, I redeployed my ECS Fargate instances, and with the latest AWS CLI this works fine now.

@Thor-Bjorgvinsson

Fargate has completed the release of the new agent.


andarocks commented Apr 11, 2023

Hi guys,

I have used the ECS exec checker and below is the result:

-------------------------------------------------------------
Prerequisites for check-ecs-exec.sh v0.7
-------------------------------------------------------------
  jq      | OK (/opt/homebrew/bin/jq)
  AWS CLI | OK (/opt/homebrew/bin/aws)

-------------------------------------------------------------
Prerequisites for the AWS CLI to use ECS Exec
-------------------------------------------------------------
  AWS CLI Version        | OK (aws-cli/2.11.9 Python/3.11.2 Darwin/22.4.0 source/arm64 prompt/off)
  Session Manager Plugin | OK (1.2.463.0)

-------------------------------------------------------------
Checks on ECS task and other resources
-------------------------------------------------------------
Region : eu-west-1
Cluster: app-service-cluster-test
Task   : b460c8c1bb334429a39ff7a4b1bad180
-------------------------------------------------------------
  Cluster Configuration  |
     KMS Key       : Not Configured
     Audit Logging : DEFAULT
     S3 Bucket Name: Not Configured
     CW Log Group  : Not Configured
  Can I ExecuteCommand?  | arn:aws:iam::117038214493:user/cli-admin
     ecs:ExecuteCommand: allowed
     ssm:StartSession denied?: allowed
  Task Status            | RUNNING
  Launch Type            | Fargate
  Platform Version       | 1.4.0
  Exec Enabled for Task  | OK
  Container-Level Checks |
    ----------
      Managed Agent Status
    ----------
         1. RUNNING for "app-service-test-container"
    ----------
      Init Process Enabled (app-service-task-definition-test:18)
    ----------
         1. Disabled - "app-service-test-container"
    ----------
      Read-Only Root Filesystem (app-service-task-definition-test:18)
    ----------
         1. Disabled - "app-service-test-container"
  Task Role Permissions  | arn:aws:iam::117038214493:role/TuskProdECSTaskRole
     ssmmessages:CreateControlChannel: allowed
     ssmmessages:CreateDataChannel: allowed
     ssmmessages:OpenControlChannel: allowed
     ssmmessages:OpenDataChannel: allowed
  VPC Endpoints          |
    Found existing endpoints for vpc-00bfcd992d7f50681:
      - com.amazonaws.eu-west-1.ssmmessages
      - com.amazonaws.eu-west-1.s3
      - com.amazonaws.vpce.eu-west-1.vpce-svc-0e7975f61ffb9d0f7
  Environment Variables  | (app-service-task-definition-test:18)
       1. container "app-service-test-container"
       - AWS_ACCESS_KEY: not defined
       - AWS_ACCESS_KEY_ID: not defined
       - AWS_SECRET_ACCESS_KEY: not defined

All the configuration seems to be okay... AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are not defined, but I am still getting TargetNotConnectedException. Am I missing something?

AWS CLI version: 2.11.9

@andarocks

But the AWS_ACCESS_KEY and AWS_SECRET_ACCESS_KEY variables are defined in the .env file inside the container. I hope that's not the issue.

@istvanfedak-nbcu

I'm experiencing this exact same issue. aws ecs execute-command was working for me last week and has since stopped working.


obaqueiro commented Dec 27, 2023

Did anyone else stumble into this problem again?

We started getting this issue again. There are no AWS_ACCESS_KEY / SECRET variables defined, and check-ecs-exec.sh shows everything OK (green and yellow):

-------------------------------------------------------------
Prerequisites for check-ecs-exec.sh v0.7
-------------------------------------------------------------
  jq      | OK (/usr/bin/jq)
  AWS CLI | OK (/usr/local/bin/aws)

-------------------------------------------------------------
Prerequisites for the AWS CLI to use ECS Exec
-------------------------------------------------------------
  AWS CLI Version        | OK (…t.21 prompt/off)
  Session Manager Plugin | OK (1.2.497.0)

-------------------------------------------------------------
Checks on ECS task and other resources
-------------------------------------------------------------
Region : us-east-1
Cluster: cluster-name
Task   : 949fd5e48ebf4ba4b895176cb0c36d50
  Cluster Configuration  |
     KMS Key       : Not Configured
     Audit Logging : DEFAULT
     S3 Bucket Name: Not Configured
     CW Log Group  : Not Configured
  Can I ExecuteCommand?  | arn:aws:iam::xxx:user/deployment
     ecs:ExecuteCommand: allowed
     ssm:StartSession denied?: allowed
  Task Status            | RUNNING
  Platform Version       | 1.4.0
  Exec Enabled for Task  | OK
  Container-Level Checks | 
    ----------
      Managed Agent Status
    ----------
         1. RUNNING for "metabase_app_dev"
    ----------
      Init Process Enabled (metabase_dev:3)
    ----------
         1. Disabled - "metabase_app_dev"
    ----------
      Read-Only Root Filesystem (metabase_dev:3)
    ----------
         1. Disabled - "metabase_app_dev"
  Task Role Permissions  | arn:aws:iam::xxx:role/metabase_ecsTaskExecutionRole_dev
     ssmmessages:CreateControlChannel: allowed
     ssmmessages:CreateDataChannel: allowed
     ssmmessages:OpenControlChannel: allowed
     ssmmessages:OpenDataChannel: allowed
  VPC Endpoints          | 
    Found existing endpoints for vpc-081adc23fcb697c58:
      - com.amazonaws.us-east-1.execute-api
      - com.amazonaws.us-east-1.secretsmanager
      - com.amazonaws.vpce.us-east-1.vpce-svc-0256367e65088edb5
      - com.amazonaws.us-east-1.ssmmessages
  Environment Variables  | (metabase_dev:3)
       1. container "metabase_app_dev"
       - AWS_ACCESS_KEY: not defined
       - AWS_ACCESS_KEY_ID: not defined
       - AWS_SECRET_ACCESS_KEY: not defined

 
$ aws ecs execute-command  --cluster cluster-name --task 949fd5e48ebf4ba4b895176cb0c36d50 --container  metabase_app_dev --command 'sh' --interactive

The Session Manager plugin was installed successfully. Use the AWS CLI to start a session.


An error occurred (TargetNotConnectedException) when calling the ExecuteCommand operation: The execute command failed due to an internal error. Try again later.

@stickperson

I'm experiencing issues as well. I'm using Fargate and can start two tasks in the same subnet, one will work and the other will not.
