Prometheus file discovery for AWS ECS

This adds ability to dynamically discover targets to scrape in AWS ECS cluster. Prometheus integration via static file_sd_config is used for that.

Why yet another implementation?

There are at least 2 other great implementations of same idea:

teralytics/prometheus-ecs-discovery in Go
signal-ai/prometheus-ecs-sd in Python

Unfortunately both of them do not support scraping the same single container on multiple ports (and /urls).

Usage

Set dockerLabels of Containers in your Tasks:

"dockerLabels": {
    "PROMETHEUS_SCRAPES": "8080/internal/metrics,9106",
    "PROMETHEUS_LABELS": "__scheme__=https,skip_15s=true"
},

PROMETHEUS_SCRAPES is a comma delimited list of port/metric_path to scrape. Note that metric_path=/metric by default, so usually you only need to provide port only. This could be single value, for one port to scrape on a container.
PROMETHEUS_LABELS (optional) comma delimited list of key=value labels to add to your container in Prometheus. This is the way to have some containers scraped 15s or 1m via multiple targets with relabel_configs and action: keep.

Start discoverer:

docker run -it --rm -v /tmp:/tmp -e AWS_ACCESS_KEY_ID=AKIAXXX -e AWS_SECRET_ACCESS_KEY=xxx -e AWS_DEFAULT_REGION=eu-west-1 sepa/ecs-sd -h
usage: prometheus-ecs-sd [-h] [-f FILE] [-i INTERVAL] [-l {debug,info,warn}]

Prometheus file discovery for AWS ECS

optional arguments:
  -h, --help            show this help message and exit
  -f FILE, --file FILE  File to write tasks (default: /tmp/ecs_file_sd.yml)
  -c CLUSTER, --cluster CLUSTER
                        Return metrics only for this Cluster name (default: all)
  -i INTERVAL, --interval INTERVAL
                        Interval to discover ECS tasks, seconds (default: 60)
  -l {debug,info,warn}, --log {debug,info,warn}
                        Logging level (default: info)
  -p PORT, --port PORT  Port to serve /metrics (default: 8080)

Verify that you get valid /tmp/ecs_file_sd.yml:

- labels:
    __task_group: service:backend
    __container_image: backend:1
    __container_runtime_id: 071cbe78cfe3fff4ea037cec21008b0cb710214a42489a24fee8e49c698a969b
    container_arn: arn:aws:ecs:eu-west-1:111:container/064c4ef7-6bb2-4ec3-b619-e0d6896f52c4
    container_name: backend-dev
    instance_id: i-08a6047039ca60159
    task_name: backend
    task_revision: 52
    __metrics_path__: /internal/metrics
  targets:
  - 10.0.0.3:32342
- labels:
    __task_group: service:backend
    __container_image: backend:1
    __container_runtime_id: 501d74cfe79a1073bd1370d9f6c58cf9007a3d897ffec6bac2acaba74cd85451
    container_arn: arn:aws:ecs:eu-west-1:111:container/064c4ef7-6bb2-4ec3-b619-e0d6896f52c4
    container_name: backend-dev
    instance_id: i-08a6047039ca60159
    task_name: backend
    task_revision: 52
  targets:
  - 10.0.0.3:32799
- labels:
    __task_group: service:node-exporter
    __container_image: prom/node-exporter:v1.0.1
    __container_runtime_id: ae854633add6f92d0681c6147822df44422a8db2cc5a116f289f07e3adc9db56
    container_arn: arn:aws:ecs:eu-west-1:111:container/978972c8-646d-49cc-9933-4bb3daa2eeea
    container_name: node-exporter
    instance_id: i-08a6047039ca60159
    task_name: node-exporter
    task_revision: 13
  targets:
  - 10.0.0.3:9100
- labels:
    __group: service:cadvisor
    __container_image: gcr.io/google-containers/cadvisor:v0.35.0
    __container_runtime_id: d89fdd2da5cdcbd50ddd68395c8cc6a70c77c84d81288c3669a984835803a39a
    container_arn: arn:aws:ecs:eu-west-1:111:container/7abd91d2-091d-4f12-80a7-14c279260aac
    container_name: cadvisor
    instance_id: i-08a6047039ca60159
    task_name: cadvisor
    task_revision: 8
  targets:
  - 10.0.0.3:32798

By default this file would be updated each 60s. Script caches API responses to not hit AWS API limits, but anyway it have to list all Tasks each time, so do not set --interval too low.

Please note that labels prefixed with __ are only available at relabel phase: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config

Labels starting with __ will be removed from the label set after target relabeling is completed.

You can use relabel_action: replace to store these additional labels if you need them.

Integrate to Prometheus:

scrape_configs:
# you only need this part
  - job_name: ecs
    file_sd_configs:
      - files:
        - /prometheus/sd/ecs_file_sd.yml
        refresh_interval: 1m

# optional relabel example 
    relabel_configs:
      # fix instance from host:mapped_port to containername-rev-hash
      - source_labels: [container_name, task_revision, container_arn]
        action: replace
        regex: (.*);(.*);.*-(.*)
        replacement: $1-$2-$3
        target_label: instance
      # use instance_id for node-exporter and cadvisor
      - source_labels: [container_name, instance_id]
        regex: (node-exporter|cadvisor);(.*)
        replacement: $2
        target_label: instance
      # not needed anymore
      - regex: container_arn
        action: labeldrop

Minimal IAM policy for the discoverer:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "ECS:ListClusters",
        "ECS:ListTasks",
        "ECS:DescribeClusters",
        "ECS:DescribeContainerInstances",
        "ECS:DescribeTasks",
        "ECS:DescribeTaskDefinition",
        "ECS:ListServices",
        "ECS:DescribeServices",
        "EC2:DescribeInstances"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}

ECS Metrics

This container also could expose ECS metrics (which are not available in CloudWatch) in Prometheus format on port 8080 and path /metrics:

ecs_service_desired_tasks{service="node-exporter"} 1
ecs_service_running_tasks{service="node-exporter"} 1
ecs_service_pending_tasks{service="node-exporter"} 0

These metrics are not cached, so each scrape would lead to describe_service API call. You can limit metrics to specific ECS Cluster via --cluster cli argument.

TODO

No FARGATE support yet, as I have only EC2 ECS clusters

Name		Name	Last commit message	Last commit date
Latest commit History 26 Commits
.github/workflows		.github/workflows
.dockerignore		.dockerignore
.gitignore		.gitignore
Dockerfile		Dockerfile
Readme.md		Readme.md
docker-compose.yml		docker-compose.yml
prometheus-ecs-sd.py		prometheus-ecs-sd.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Prometheus file discovery for AWS ECS

Why yet another implementation?

Usage

ECS Metrics

TODO

About

Releases

Packages

Contributors 2

Languages

sepich/prometheus-ecs-sd

Folders and files

Latest commit

History

Repository files navigation

Prometheus file discovery for AWS ECS

Why yet another implementation?

Usage

ECS Metrics

TODO

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages