
Recent updates possibly broke CLI execute-command #435

Closed
ssyberg opened this issue Apr 1, 2022 · 36 comments


ssyberg commented Apr 1, 2022

There are a number of GitHub issues floating around on related repos that might be tied to recent SSM Agent updates, though this is incredibly difficult to verify from our end. If someone could do a little investigating, that would be great.

The general issue that manifests is an inability to run execute-command via the CLI; a TargetNotConnectedException is thrown. Existing troubleshooting guides have thus far not yielded success.

Related tickets:

aws/aws-cli#6834
aws/aws-cli#6562
aws-containers/amazon-ecs-exec-checker#47

@GeorgeNagel

Example output from aws ecs execute-command ...:

The Session Manager plugin was installed successfully. Use the AWS CLI to start a session.


An error occurred (TargetNotConnectedException) when calling the ExecuteCommand operation: The execute command failed due to an internal error. Try again later.


ssyberg commented Apr 1, 2022

Exact output for everyone with this problem as far as I can tell ☝🏼


tim-finnigan commented Apr 1, 2022

This looks related: aws-containers/amazon-ecs-exec-checker#49

Do you also have AWS_ACCESS_KEY / AWS_SECRET_ACCESS_KEY set? That may be causing the issue.
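One quick way to check whether a task definition sets those variables (a sketch; "my-task-def" is a placeholder and the jq filter is just illustrative):

# List any AWS credential-style variables set in a task definition
# ("my-task-def" stands in for your task definition family or ARN).
aws ecs describe-task-definition --task-definition my-task-def --output json \
  | jq '.taskDefinition.containerDefinitions[]
        | {container: .name, awsVars: [.environment[]? | select(.name | startswith("AWS_"))]}'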


ssyberg commented Apr 1, 2022

Do you also have AWS_SECRET_ACCESS_KEY set? That may be causing the issue.

If my parsing of the Terraform config can be trusted, we are not setting that in environment_variables, but it is available in the secrets.

I'll try removing this now and see if that makes a difference.

ssyberg commented Apr 1, 2022

Holy moly, that worked! That said, we actually actively use those credentials in our task, so we'll need a workaround for exposing them. Still seems like setting these env vars shouldn't have this effect, right?
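One workaround we're considering (just a sketch; whether the managed agent also sees variables exported this way is an assumption that needs verifying): keep the credentials in the task definition under non-standard names, e.g. APP_AWS_ACCESS_KEY_ID / APP_AWS_SECRET_ACCESS_KEY, and map them back to the standard names only for the application process in the entrypoint:

#!/bin/sh
# Hypothetical entrypoint.sh: the task definition sets APP_AWS_ACCESS_KEY_ID and
# APP_AWS_SECRET_ACCESS_KEY instead of the standard names; only the application
# process launched below gets the standard names back.
export AWS_ACCESS_KEY_ID="$APP_AWS_ACCESS_KEY_ID"
export AWS_SECRET_ACCESS_KEY="$APP_AWS_SECRET_ACCESS_KEY"
exec "$@"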

@tim-finnigan

Glad that worked! I'm waiting on more info regarding this and will post an update here.


farkmarnum commented Apr 1, 2022

Renaming AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY has also fixed the problem for me! Bizarre that this just started happening at ~5pm EST March 30 out of nowhere.


nathando commented Apr 3, 2022

Can we revert to a previous version of the AWS CLI to fix this? Changing the environment variables will break other things in our tasks.

@raptorcZn

Facing this issue as well. As @nathando mentioned, it would be great if this reverted to the previous behaviour so that we don't have to change the environment variables.

@nicolasbuch

Encountered this error out of nowhere 4 days ago: "An error occurred (TargetNotConnectedException) when calling the ExecuteCommand operation: The execute command failed due to an internal error. Try again later."

In my case I also had AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY set as ENV variables (in my task definition), since my application needs to interact with the AWS API. It was working fine until now, so something must have changed in recent updates.

There is no need to change the environment variables, though; all you need to do is give the user (the one behind AWS_ACCESS_KEY_ID) permissions to allow the ECS exec command:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ssmmessages:CreateControlChannel",
                "ssmmessages:CreateDataChannel",
                "ssmmessages:OpenControlChannel",
                "ssmmessages:OpenDataChannel"
            ],
            "Resource": "*"
        }
    ]
}
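For reference, a sketch of attaching that statement as an inline policy to the CLI user (the user and policy names are placeholders, and the JSON above is assumed to be saved as ecs-exec-ssm.json):

# Attach the statement above as an inline policy to the IAM user whose keys are in
# AWS_ACCESS_KEY_ID ("cli-user" and "ecs-exec-ssm-messages" are placeholder names).
aws iam put-user-policy \
  --user-name cli-user \
  --policy-name ecs-exec-ssm-messages \
  --policy-document file://ecs-exec-ssm.json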

@tim-finnigan

Thanks @nicolasbuch, those requirements are also documented here: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-exec.html#ecs-exec-prerequisites as well as this troubleshooting article for the TargetNotConnectedException error: https://aws.amazon.com/premiumsupport/knowledge-center/ecs-error-execute-command/

Those requirements aren’t new so I’m not sure why recent updates would be a factor here. Has anyone tried rolling back to a previous SSM Agent version to see if they still see this issue? It would help the team to have agent logs from a container that is experiencing the issue. You could provide those here or contact AWS Support.
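If anyone can still reach an affected container (for example after applying the env var workaround above), the agent logs are typically in the default Linux location; a sketch, assuming standard paths:

# Inside the container (default SSM Agent log locations on Linux; paths may differ per image):
cat /var/log/amazon/ssm/amazon-ssm-agent.log
cat /var/log/amazon/ssm/errors.log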


Thor-Bjorgvinsson commented Apr 4, 2022

The agent version in ECS Exec is controlled by ECS during AMI build and they say they haven't changed the version recently. Can anyone here that encountered the issue and has removed AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from their environment start a session and get the agent version?

# Assuming your session starts in the ECS Exec bin folder
./amazon-ssm-agent -version

Also, are you seeing this issue on ECS on EC2 or Fargate?
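(If you're not sure, the launch type shows up in describe-tasks; the cluster name and task ID below are placeholders.)

# Check whether an affected task is running on EC2 or Fargate.
aws ecs describe-tasks --cluster my-cluster --tasks 0123456789abcdef0 \
  --query 'tasks[].{launchType:launchType,platformVersion:platformVersion}'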


pauricthelodger commented Apr 4, 2022

@Thor-Bjorgvinsson after making the change and removing the envvars I can access the containers and see the following versions according to the log output on Fargate tasks

amazon-ssm-agent - v3.1.715.0
ssm-agent-worker - v3.1.715.0

@GeorgeNagel

@Thor-Bjorgvinsson Seeing the issue on Fargate.


yufio commented Apr 4, 2022

We have also been experiencing the same issue since last Friday (01 April 2022). We didn't change anything and command execution stopped working. We also have AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in the env. Funnily enough, on one of our environments it still works, but on two others it stopped.
We are now investigating the permission differences.
The user on that env has admin access rights (dev env).

@Thor-Bjorgvinsson

We've confirmed that this is an SSM Agent issue introduced in a recent Fargate deployment where the agent version was updated. Any new tasks started in Fargate will use an SSM Agent build with this issue. We are working with the Fargate team to deploy a fix. Mitigation, as mentioned above: remove AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from the task definition environment variables.

@akhiljalagam

The ssmmessages policy @nicolasbuch posted above worked for me.


Thor-Bjorgvinsson commented Apr 6, 2022

@akhiljalagam I can confirm this can be used as a mitigation today, but it is not recommended; it will no longer be possible in the near future, sometime after the fix has been released. The agent will only be able to connect using ECS task metadata service credentials.

The recommended mitigation is to unset the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.
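For anyone updating the task definition by hand rather than through Terraform/CDK, a rough sketch of stripping the variables and registering a new revision (the family name is a placeholder; the jq filter drops read-only fields that register-task-definition rejects):

# Fetch the current task definition, drop the credential variables, and register a
# new revision ("my-task-def" is a placeholder family name).
aws ecs describe-task-definition --task-definition my-task-def \
  --query 'taskDefinition' --output json \
  | jq 'del(.taskDefinitionArn, .revision, .status, .requiresAttributes,
            .compatibilities, .registeredAt, .registeredBy)
        | .containerDefinitions[].environment |= ((. // [])
            | map(select(.name != "AWS_ACCESS_KEY_ID" and .name != "AWS_SECRET_ACCESS_KEY")))' \
  > no-keys-task-def.json
aws ecs register-task-definition --cli-input-json file://no-keys-task-def.json
# Then point the service at the new revision, e.g.:
# aws ecs update-service --cluster my-cluster --service my-service --task-definition my-task-def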


Sohett commented Apr 11, 2022

@Thor-Bjorgvinsson, how can we follow the status of this? I don't want to be too pushy, but we're really blocked 😬. Is there some kind of prioritisation as this is a regression?

Anyway, thanks for the work 💪 .

@Thor-Bjorgvinsson

We've pushed out a fix in agent release 3.1.1260.0 for this issue. We're currently working with related AWS services to integrate this fix; we'll add further updates as those integrations are completed.


jmagoon commented Apr 19, 2022

For other people who come across this issue, this error happens for us when we have AWS_SHARED_CREDENTIALS_FILE set as an environment variable as well. When it is removed, ecs execute-command works correctly.

@ZacBridge

Hopefully this doesn't put a spanner in the works, but I've been having this issue across all of my services. Only one of the services actually had AWS env vars in it; after renaming those, that service was fine.

The others, however, still respond with the same internal error, with no AWS env vars to note on the tasks.

@GeorgeNagel

I'm seeing this again since the 3.1.1260.0 release. Is it possible other env variable names are now disallowed? In particular, I had changed my AWS_SECRET_ACCESS_KEY env variable to AWS_SECRET_ACCESS_KEY_ECS, which was working until the 3.1.1260.0 release. After changing that key to AWS_SECRET_ACCESS_KEY_<something>_ECS, I am able to connect again.

I'm wondering if the fix in 3.1.1260.0 was to switch from using AWS_SECRET_ACCESS_KEY to AWS_SECRET_ACCESS_KEY_ECS in some internal API. If so, perhaps more of a root-cause fix is needed, or documentation which specifies which env variable names cause these conflicts.

@djGrill

djGrill commented Apr 25, 2022

maybe it's partially matching AWS_SECRET_ACCESS_KEY* instead of just AWS_SECRET_ACCESS_KEY? 🤔

@justinko

No error for me with AWS_SECRET_ACCESS_KEY_2

@bigbluechicken

Is there any update on when the fix will be rolled out?

@serhiibeznisko

Renaming AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID variables did the job!

@Thor-Bjorgvinsson

ECS has released a new AMI with the updated SSM Agent (ECS-optimized AMI version 20220421); the Fargate release is still pending.

SSM Agent commit to resolve this issue
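(To confirm whether a cluster's instances are on at least the 20220421 ECS-optimized AMI, a sketch using the public SSM parameter for the recommended image:)

# Look up the currently recommended ECS-optimized Amazon Linux 2 AMI and its version string.
aws ssm get-parameter \
  --name /aws/service/ecs/optimized-ami/amazon-linux-2/recommended \
  --query 'Parameter.Value' --output text | jq -r '.image_name, .image_id'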


Sohett commented May 5, 2022

Any news concerning the Fargate release?


benoawfu commented May 5, 2022

Without changing anything regarding the env variables, I redeployed my ECS Fargate instances, and with the latest AWS CLI this works fine now.

@Thor-Bjorgvinsson

Fargate has completed the release of the new agent.


andarocks commented Apr 11, 2023

Hi guys,

I have used the ECS exec checker and below is the result:

-------------------------------------------------------------
Prerequisites for check-ecs-exec.sh v0.7
-------------------------------------------------------------
  jq      | OK (/opt/homebrew/bin/jq)
  AWS CLI | OK (/opt/homebrew/bin/aws)

-------------------------------------------------------------
Prerequisites for the AWS CLI to use ECS Exec
-------------------------------------------------------------
  AWS CLI Version        | OK (aws-cli/2.11.9 Python/3.11.2 Darwin/22.4.0 source/arm64 prompt/off)
  Session Manager Plugin | OK (1.2.463.0)

-------------------------------------------------------------
Checks on ECS task and other resources
-------------------------------------------------------------
Region : eu-west-1
Cluster: app-service-cluster-test
Task   : b460c8c1bb334429a39ff7a4b1bad180
-------------------------------------------------------------
  Cluster Configuration  |
     KMS Key       : Not Configured
     Audit Logging : DEFAULT
     S3 Bucket Name: Not Configured
     CW Log Group  : Not Configured
  Can I ExecuteCommand?  | arn:aws:iam::117038214493:user/cli-admin
     ecs:ExecuteCommand: allowed
     ssm:StartSession denied?: allowed
  Task Status            | RUNNING
  Launch Type            | Fargate
  Platform Version       | 1.4.0
  Exec Enabled for Task  | OK
  Container-Level Checks |
    ----------
      Managed Agent Status
    ----------
         1. RUNNING for "app-service-test-container"
    ----------
      Init Process Enabled (app-service-task-definition-test:18)
    ----------
         1. Disabled - "app-service-test-container"
    ----------
      Read-Only Root Filesystem (app-service-task-definition-test:18)
    ----------
         1. Disabled - "app-service-test-container"
  Task Role Permissions  | arn:aws:iam::117038214493:role/TuskProdECSTaskRole
     ssmmessages:CreateControlChannel: allowed
     ssmmessages:CreateDataChannel: allowed
     ssmmessages:OpenControlChannel: allowed
     ssmmessages:OpenDataChannel: allowed
  VPC Endpoints          |
    Found existing endpoints for vpc-00bfcd992d7f50681:
      - com.amazonaws.eu-west-1.ssmmessages
      - com.amazonaws.eu-west-1.s3
      - com.amazonaws.vpce.eu-west-1.vpce-svc-0e7975f61ffb9d0f7
  Environment Variables  | (app-service-task-definition-test:18)
       1. container "app-service-test-container"
       - AWS_ACCESS_KEY: not defined
       - AWS_ACCESS_KEY_ID: not defined
       - AWS_SECRET_ACCESS_KEY: not defined

All the configuration seems to be okay... AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are not defined, but I am still getting TargetNotConnectedException. Am I missing something?

AWS CLI version: 2.11.9

@andarocks

But the AWS_ACCESS_KEY and AWS_SECRET_ACCESS_KEY variables are defined in the .env file inside the container. I hope that's not the issue.

@istvanfedak-nbcu

I'm experiencing this exact same issue. aws ecs execute-command was working for me last week and has since stopped working.


obaqueiro commented Dec 27, 2023

Did anyone else stumble into this problem again?

We started getting this issue again. There are no AWS_ACCESS_KEY / SECRET variables defined, and check-ecs-exec.sh shows everything OK (green and yellow):

-------------------------------------------------------------
Prerequisites for check-ecs-exec.sh v0.7
-------------------------------------------------------------
  jq      | OK (/usr/bin/jq)
  AWS CLI | OK (/usr/local/bin/aws)

-------------------------------------------------------------
Prerequisites for the AWS CLI to use ECS Exec
-------------------------------------------------------------
  AWS CLI Version        | OK (…t.21 prompt/off)
  Session Manager Plugin | OK (1.2.497.0)

-------------------------------------------------------------
Checks on ECS task and other resources
-------------------------------------------------------------
Region : us-east-1
Cluster: cluster-name
Task   : 949fd5e48ebf4ba4b895176cb0c36d50
  Cluster Configuration  |
     KMS Key       : Not Configured
     Audit Logging : DEFAULT
     S3 Bucket Name: Not Configured
     CW Log Group  : Not Configured
  Can I ExecuteCommand?  | arn:aws:iam::xxx:user/deployment
     ecs:ExecuteCommand: allowed
     ssm:StartSession denied?: allowed
  Task Status            | RUNNING
  Platform Version       | 1.4.0
  Exec Enabled for Task  | OK
  Container-Level Checks | 
    ----------
      Managed Agent Status
    ----------
         1. RUNNING for "metabase_app_dev"
    ----------
      Init Process Enabled (metabase_dev:3)
    ----------
         1. Disabled - "metabase_app_dev"
    ----------
      Read-Only Root Filesystem (metabase_dev:3)
    ----------
         1. Disabled - "metabase_app_dev"
  Task Role Permissions  | arn:aws:iam::xxx:role/metabase_ecsTaskExecutionRole_dev
     ssmmessages:CreateControlChannel: allowed
     ssmmessages:CreateDataChannel: allowed
     ssmmessages:OpenControlChannel: allowed
     ssmmessages:OpenDataChannel: allowed
  VPC Endpoints          | 
    Found existing endpoints for vpc-081adc23fcb697c58:
      - com.amazonaws.us-east-1.execute-api
      - com.amazonaws.us-east-1.secretsmanager
      - com.amazonaws.vpce.us-east-1.vpce-svc-0256367e65088edb5
      - com.amazonaws.us-east-1.ssmmessages
  Environment Variables  | (metabase_dev:3)
       1. container "metabase_app_dev"
       - AWS_ACCESS_KEY: not defined
       - AWS_ACCESS_KEY_ID: not defined
       - AWS_SECRET_ACCESS_KEY: not defined

 
$ aws ecs execute-command  --cluster cluster-name --task 949fd5e48ebf4ba4b895176cb0c36d50 --container  metabase_app_dev --command 'sh' --interactive

The Session Manager plugin was installed successfully. Use the AWS CLI to start a session.


An error occurred (TargetNotConnectedException) when calling the ExecuteCommand operation: The execute command failed due to an internal error. Try again later.

@stickperson

I'm experiencing issues as well. I'm using Fargate and can start two tasks in the same subnet, one will work and the other will not.
