ECS execute-command failed due to an internal error. #47

Open
nic-russo opened this issue Dec 1, 2021 · 11 comments

Comments

@nic-russo

Hi there, I'm trying to run execute-command to open an interactive shell against my ECS Fargate task. I'm using this checker to validate my configuration:

$ bash <( curl -Ls https://raw.githubusercontent.com/aws-containers/amazon-ecs-exec-checker/main/check-ecs-exec.sh ) clusterName cf41c924968e426c9be535f3f47545be
-------------------------------------------------------------
Prerequisites for check-ecs-exec.sh v0.7
-------------------------------------------------------------
  jq      | OK (/usr/bin/jq)
  AWS CLI | OK (/usr/local/bin/aws)

-------------------------------------------------------------
Prerequisites for the AWS CLI to use ECS Exec
-------------------------------------------------------------
  AWS CLI Version        | OK (aws-cli/2.4.0 Python/3.8.8 Linux/5.11.0-40-generic exe/x86_64.ubuntu.20 prompt/off)
  Session Manager Plugin | OK (1.2.205.0)

-------------------------------------------------------------
Checks on ECS task and other resources
-------------------------------------------------------------
Region : eu-south-1
Cluster: clusterName
Task   : cf41c924968e426c9be535f3f47545be
-------------------------------------------------------------
  Cluster Configuration  | Audit Logging Not Configured
  Can I ExecuteCommand?  | arn:aws:iam::ACCOUNT_ID:role/ADMIN_ROLE_NAME
     ecs:ExecuteCommand: allowed
     ssm:StartSession denied?: allowed
  Task Status            | RUNNING
  Launch Type            | Fargate
  Platform Version       | 1.4.0
  Exec Enabled for Task  | OK
  Container-Level Checks | 
    ----------
      Managed Agent Status
    ----------
         1. RUNNING for "taskName"
    ----------
      Init Process Enabled (taskName:1)
    ----------
         1. Enabled - "taskName"
    ----------
      Read-Only Root Filesystem (taskName:1)
    ----------
         1. Disabled - "taskName"
  Task Role Permissions  | arn:aws:iam::ACCOUNT_ID:role/taskName-ecs-task
     ssmmessages:CreateControlChannel: allowed
     ssmmessages:CreateDataChannel: allowed
     ssmmessages:OpenControlChannel: allowed
     ssmmessages:OpenDataChannel: allowed
  VPC Endpoints          | 
    Found existing endpoints for vpc-ID:
      - com.amazonaws.eu-south-1.ssm
      - com.amazonaws.eu-south-1.ec2messages
      - com.amazonaws.eu-south-1.ssmmessages

However, I'm getting TargetNotConnectedException. I've also opened an issue here.

Am I missing something..?

@toricls
Contributor

toricls commented Dec 1, 2021

Hi, @nic-russo! Thank you for reaching out to us here.

In general, TargetNotConnectedException indicates that the required connection between the managed agent running in your task container and SSM Session Manager has not been established.

Supposing there is no bug in the exec checker script itself, could you possibly check or try the following?

  1. Check the VPC endpoint policies, if the VPC endpoints above have any
  2. Wait a few minutes and try again (because managed agents regularly try to reconnect)
  3. Stop the task and try a new task (if the task is under an ECS service)
  4. Update the Session Manager plugin to the latest version (1.2.279.0 is the latest as of Dec. 1st 2021)
  5. Check the CloudTrail logs for any failed API calls related to ECS and SSM Session Manager
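
Steps 1 to 3 above could be scripted roughly as follows. This is only a sketch: the function names are hypothetical, the cluster, task, service, and VPC values are placeholders you pass in, and the functions print raw CLI output rather than judging it.

```shell
# Hypothetical helpers; none of these names come from the checker script.

# Step 1: print each VPC endpoint in the VPC together with its policy document.
show_endpoint_policies() {
  aws ec2 describe-vpc-endpoints \
    --filters "Name=vpc-id,Values=$1" \
    --query 'VpcEndpoints[].[ServiceName,PolicyDocument]' \
    --output text
}

# Step 2: print the name and last reported status of each managed agent,
# so you can re-run it after a few minutes and see whether it reconnected.
show_managed_agents() {
  aws ecs describe-tasks --cluster "$1" --tasks "$2" \
    --query 'tasks[].containers[].managedAgents[].[name,lastStatus]' \
    --output text
}

# Step 3: ask the service to replace its running tasks with fresh ones.
restart_service_tasks() {
  aws ecs update-service --cluster "$1" --service "$2" \
    --force-new-deployment \
    --query 'service.deployments[].[id,status]' \
    --output text
}
```

For example, `show_managed_agents clusterName cf41c924968e426c9be535f3f47545be` prints one line per managed agent. For step 4, `session-manager-plugin --version` prints the installed plugin version.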

It would also help with debugging to keep in mind a known limitation of the exec checker script:

  1. The script doesn't support IAM roles/policies with (1) Conditions or (2) IAM permission boundaries. In this case you need to check manually that (a) your IAM role ("role/ADMIN_ROLE_NAME" in the script result) is NOT restricted from calling the ExecuteCommand API, and (b) the task role ("role/taskName-ecs-task") is NOT restricted from calling the SSM Session Manager APIs.

(Comment updated, since it looks like your IAM user was at least already able to call the ExecuteCommand API.)
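
As a starting point for that manual check, the following sketch shows whether a role has a permissions boundary attached and how IAM's policy simulator evaluates specific actions for it. The function names are hypothetical. The simulator output is only a hint: when policies use Conditions, you still need to read the policy documents themselves.

```shell
# Hypothetical helpers for the manual IAM check; role names are placeholders.

# Print the ARN of the permissions boundary attached to a role.
# The AWS CLI prints "None" in text output when no boundary is set.
show_permissions_boundary() {
  aws iam get-role --role-name "$1" \
    --query 'Role.PermissionsBoundary.PermissionsBoundaryArn' \
    --output text
}

# Simulate one or more actions for a role ARN and print, per action,
# the action name and the simulator's decision.
simulate_actions() {
  role_arn=$1
  shift
  aws iam simulate-principal-policy \
    --policy-source-arn "$role_arn" \
    --action-names "$@" \
    --query 'EvaluationResults[].[EvalActionName,EvalDecision]' \
    --output text
}
```

For example, `simulate_actions arn:aws:iam::ACCOUNT_ID:role/taskName-ecs-task ssmmessages:CreateControlChannel ssmmessages:OpenDataChannel` covers part of check (b), and `show_permissions_boundary ADMIN_ROLE_NAME` helps with check (a).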

@nic-russo
Author

Hi @toricls, thanks for your support! I went through your list:

  1. The com.amazonaws.eu-south-1.ssmmessages VPC endpoint has the following policy:
{
    "Statement": [
        {
            "Action": "*",
            "Effect": "Allow",
            "Resource": "*",
            "Principal": "*"
        }
    ]
}
  2. (3., 4.) Done, no luck. (Yes, I'm running the task in an ECS service on Fargate.)

  5. I see two events in CloudTrail:

Event Name: ExecuteCommand
{
    "eventVersion": "1.08",
    "userIdentity": {
        "type": "AssumedRole",
        "principalId": "ID_HERE:botocore-session-1638379635",
        "arn": "arn:aws:sts::ACCOUNT_ID:assumed-role/ADMIN_ROLE/botocore-session-1638379635",
        "accountId": "ACCOUNT_ID",
        "accessKeyId": "ACCOUNT_KEY",
        "sessionContext": {
            "sessionIssuer": {
                "type": "Role",
                "principalId": "ID_HERE",
                "arn": "arn:aws:iam::ACCOUNT_ID:role/ADMIN_ROLE",
                "accountId": "ACCOUNT_ID",
                "userName": "ADMIN_ROLE"
            },
            "webIdFederationData": {},
            "attributes": {
                "creationDate": "2021-12-01T17:27:16Z",
                "mfaAuthenticated": "false"
            }
        }
    },
    "eventTime": "2021-12-01T17:41:56Z",
    "eventSource": "ecs.amazonaws.com",
    "eventName": "ExecuteCommand",
    "awsRegion": "eu-south-1",
    "sourceIPAddress": "IP_HERE",
    "userAgent": "aws-cli/2.4.0 Python/3.8.8 Linux/5.11.0-40-generic exe/x86_64.ubuntu.20 prompt/off command/ecs.execute-command",
    "errorCode": "ClientException",
    "errorMessage": "The execute command failed due to an internal error. Try again later.",
    "requestParameters": {
        "cluster": "clusterName",
        "container": "containerName",
        "command": "/bin/bash",
        "interactive": true,
        "task": "bc72af51a5d942519202ed10342ef307"
    },
    "responseElements": null,
    "requestID": "340b94d2-ac03-4c4f-a386-e8728b95adc0",
    "eventID": "c3503af0-3fa1-41e7-a447-e5ee70055641",
    "readOnly": false,
    "eventType": "AwsApiCall",
    "managementEvent": true,
    "recipientAccountId": "ACCOUNT_ID",
    "eventCategory": "Management"
}
Event Name: StartSession
{
    "eventVersion": "1.08",
    "userIdentity": {
        "type": "AssumedRole",
        "principalId": "ID_HERE:ecs-execute-command",
        "arn": "arn:aws:sts::ACCOUNT_ID:assumed-role/AWSServiceRoleForECS/ecs-execute-command",
        "accountId": "ACCOUNT_ID",
        "accessKeyId": "ACCOUNT_KEY",
        "sessionContext": {
            "sessionIssuer": {
                "type": "Role",
                "principalId": "ID_HERE",
                "arn": "arn:aws:iam::ACCOUNT_ID:role/aws-service-role/ecs.amazonaws.com/AWSServiceRoleForECS",
                "accountId": "ACCOUNT_ID",
                "userName": "AWSServiceRoleForECS"
            },
            "webIdFederationData": {},
            "attributes": {
                "creationDate": "2021-12-01T17:41:56Z",
                "mfaAuthenticated": "false"
            }
        },
        "invokedBy": "ecs.amazonaws.com"
    },
    "eventTime": "2021-12-01T17:41:56Z",
    "eventSource": "ssm.amazonaws.com",
    "eventName": "StartSession",
    "awsRegion": "eu-south-1",
    "sourceIPAddress": "ecs.amazonaws.com",
    "userAgent": "ecs.amazonaws.com",
    "errorCode": "TargetNotConnected",
    "errorMessage": "ecs:aws-monitor_bc72af51a5d942519202ed10342ef307_bc72af51a5d942519202ed10342ef307-1708420469 is not connected.",
    "requestParameters": {
        "target": "ecs:aws-monitor_bc72af51a5d942519202ed10342ef307_bc72af51a5d942519202ed10342ef307-1708420469",
        "documentName": "AmazonECS-ExecuteInteractiveCommand",
        "parameters": {
            "cloudWatchLogGroupName": [
                "ECS_aws-monitor"
            ],
            "command": [
                "/bin/bash"
            ]
        }
    },
    "responseElements": null,
    "requestID": "57d9cf3e-25d6-470a-9f0c-cad8a12d3778",
    "eventID": "a2da3ef3-2f00-49ee-940f-eccff1fb13c1",
    "readOnly": false,
    "eventType": "AwsApiCall",
    "managementEvent": true,
    "recipientAccountId": "ACCOUNT_ID",
    "eventCategory": "Management"
}
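
For anyone who wants to pull the same two records, here is a rough sketch. The function names are hypothetical, and the `ecs:<cluster>_<taskId>_<runtimeId>` layout of the SSM target is an assumption read off the StartSession event above, not a documented contract.

```shell
# Print the raw CloudTrail records for one event name, for example
# ExecuteCommand or StartSession. The second argument caps the result count.
lookup_exec_events() {
  aws cloudtrail lookup-events \
    --lookup-attributes "AttributeKey=EventName,AttributeValue=$1" \
    --max-results "${2:-5}" \
    --query 'Events[].CloudTrailEvent' \
    --output text
}

# Split an SSM target of the assumed form ecs:<cluster>_<taskId>_<runtimeId>.
# Splitting from the right keeps cluster names with underscores intact.
parse_exec_target() {
  rest=${1#ecs:}
  runtime_id=${rest##*_}
  rest=${rest%_*}
  task_id=${rest##*_}
  cluster=${rest%_*}
  printf 'cluster=%s task=%s runtime=%s\n' "$cluster" "$task_id" "$runtime_id"
}
```

Running `parse_exec_target` on the `target` value from the StartSession event makes it easy to confirm that the failing session was aimed at the task you expected.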

  6. The task role has the AmazonSSMManagedInstanceCore policy attached, plus some unrelated ones.

@justfathi

Any movement on this issue?

@ssyberg

ssyberg commented Mar 31, 2022

Following, having the same issue. A little additional info:

We manage our infra with Terraform and have ~8 identical deployments with literally no variation besides names, and we're seeing this issue intermittently on random deployments.

@GeorgeNagel

I'm also experiencing this error intermittently. I am managing my infra via Terraform as well. Despite rolling back to previous topologies, I'm still seeing this on tasks from some services and not others.

@j4keh

j4keh commented Apr 1, 2022

I am facing the same issue.
Tasks created prior to March 29 had no problems.
(This may be a coincidence, but it seems to coincide with the release date of v3.1.1188.0 of SSM Agent.)

@pauricthelodger

Same issue here as of yesterday

@tim-finnigan

This looks related to #49.

Please check if you have the environment variables AWS_ACCESS_KEY / AWS_SECRET_ACCESS_KEY set and if unsetting those solves this issue.

@BJClark

BJClark commented Apr 1, 2022

I had the same issue as well and, as mentioned by @tim-finnigan, renaming the ENV vars AWS_ACCESS_KEY / AWS_SECRET_ACCESS_KEY to different variable names ended up solving the issue for us as well.
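
A quick way to check for this without exec access is to look at the environment variable names in the task definition. The sketch below uses hypothetical function names. It also matches AWS_ACCESS_KEY_ID, which is my assumption and was not mentioned in this thread. It only inspects plain `environment` entries, not `secrets`.

```shell
# Read whitespace-separated variable names on stdin and print those that
# look like static AWS credentials. Exits non-zero when none are found.
# AWS_ACCESS_KEY_ID is included as an assumption, not from this thread.
filter_credential_names() {
  tr ' \t' '\n\n' |
    grep -E '^AWS_(ACCESS_KEY|ACCESS_KEY_ID|SECRET_ACCESS_KEY)$'
}

# List suspicious variable names across all containers of a task definition.
list_credential_env_vars() {
  aws ecs describe-task-definition --task-definition "$1" \
    --query 'taskDefinition.containerDefinitions[].environment[].name' \
    --output text | filter_credential_names
}
```

If `list_credential_env_vars my-task-def` (placeholder name) prints anything, renaming those variables as described above is worth trying.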

@Wavewash

Wavewash commented Jul 8, 2022

I was experiencing this same issue but found a fix.

Initially made sure:

  • All checks passed in amazon-ecs-exec-checker
  • Our ECS tasks did not have ENV vars set for AWS_ACCESS_KEY or AWS_SECRET_ACCESS_KEY
  • The ECS agent was updated on the instance to latest (Agent version 1.61.3, Docker version 20.10.13).

Solution:
After double-checking all settings (roles, permissions, etc.), I tried updating the AMI we were using on the instance. That fixed the issue, and I was able to successfully run execute-command on the task!

The AMI that did not allow for execute command:

  • AMI ID: ami-0a5e7c9183d1cea27
  • AMI name: amzn2-ami-ecs-hvm-2.0.20220209-x86_64-ebs

Updated to this AMI which does allow for execute command:

  • AMI ID: ami-040d909ea4e56f8f3
  • AMI name: amzn2-ami-ecs-hvm-2.0.20220630-x86_64-ebs

(Which is currently latest via https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-optimized_AMI.html)
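
To see whether an instance is behind the recommended image, something like the following may help. It is a sketch with hypothetical function names. The date parsing assumes the `amzn2-ami-ecs-hvm-2.0.YYYYMMDD-...` naming seen in the two AMI names above.

```shell
# Extract the YYYYMMDD build date from an ECS-optimized AMI name such as
# amzn2-ami-ecs-hvm-2.0.20220630-x86_64-ebs. Prints nothing on a mismatch.
ami_build_date() {
  printf '%s\n' "$1" |
    sed -n 's/^amzn2-ami-ecs-hvm-2\.0\.\([0-9]\{8\}\)-.*$/\1/p'
}

# Succeed when the first AMI name carries an older build date than the second.
# Fails (with a shell error message) if either name does not match the pattern.
is_older_ami() {
  [ "$(ami_build_date "$1")" -lt "$(ami_build_date "$2")" ]
}

# Look up the currently recommended ECS-optimized Amazon Linux 2 AMI ID
# from the public SSM parameter.
recommended_ecs_ami() {
  aws ssm get-parameters \
    --names /aws/service/ecs/optimized-ami/amazon-linux-2/recommended/image_id \
    --query 'Parameters[0].Value' \
    --output text
}
```

For example, `is_older_ami amzn2-ami-ecs-hvm-2.0.20220209-x86_64-ebs amzn2-ami-ecs-hvm-2.0.20220630-x86_64-ebs` succeeds, which matches the upgrade described above.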

@adrienbrault

Hey,

I was able to resolve this issue by following the "Generate logs for ECS Exec to identify issues" section of https://repost.aws/knowledge-center/fargate-ecs-exec-errors

I found this log:

ERROR [ssm-agent-worker] [MessageService] [MGSInteractor] Failed to get controlchannel token, error: CreateControlChannel failed with error: createControlChannel request failed: failed to make http client call: Post "https://ssmmessages.us-west-2.amazonaws.com/v1/control-channel/ecs:xxx": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

I removed the ssmmessages VPC endpoint, and aws ecs execute-command worked. (The VPC has an internet gateway.)
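
A timeout like the one in that log suggests a network-path problem rather than an IAM one, so before removing the endpoint it may be worth inspecting how it is wired up. The sketch below uses hypothetical function names. The probe must be run from a host inside the same VPC and subnets to be meaningful.

```shell
# Build the regional ssmmessages hostname seen in the agent log.
ssmmessages_host() {
  printf 'ssmmessages.%s.amazonaws.com\n' "$1"
}

# Show the state, private DNS setting, security groups, and subnets of the
# ssmmessages endpoint(s) in a region.
describe_ssmmessages_endpoint() {
  aws ec2 describe-vpc-endpoints \
    --filters "Name=service-name,Values=com.amazonaws.$1.ssmmessages" \
    --query 'VpcEndpoints[].{Id:VpcEndpointId,State:State,PrivateDns:PrivateDnsEnabled,SecurityGroups:Groups[].GroupId,Subnets:SubnetIds}' \
    --output json
}

# Probe HTTPS reachability. Any HTTP response counts as reachable;
# a timeout makes curl exit non-zero. Second argument: timeout in seconds.
probe_ssmmessages() {
  curl -sS -o /dev/null --max-time "${2:-5}" "https://$(ssmmessages_host "$1")/"
}
```

For example, `describe_ssmmessages_endpoint us-west-2` lists the security groups to review, and `probe_ssmmessages us-west-2` checks whether the hostname answers at all from where you run it.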
