From b70b5b47274a824bddddb0217782a535d3da33bd Mon Sep 17 00:00:00 2001 From: cecilia saixue watt Date: Fri, 8 Nov 2024 13:04:25 -0800 Subject: [PATCH] DOCS-9299 - Airflow readme updates (#18931) * starting work * adding last changes --- airflow/README.md | 39 ++++++++++++++++++++++++++++++++++++++- 1 file changed, 38 insertions(+), 1 deletion(-) diff --git a/airflow/README.md b/airflow/README.md index ee8ef7135420e..2407dfc7be1c0 100644 --- a/airflow/README.md +++ b/airflow/README.md @@ -20,7 +20,12 @@ In addition to metrics, the Datadog Agent also sends service checks related to A All steps below are needed for the Airflow integration to work properly. Before you begin, [install the Datadog Agent][3] version `>=6.17` or `>=7.17`, which includes the StatsD/DogStatsD mapping feature. ### Configuration -There are two forms of the Airflow integration. There is the Datadog Agent integration which makes requests to a provided endpoint for Airflow to report whether it can connect and is healthy. Then there is the Airflow StatsD portion where Airflow can be configured to send metrics to the Datadog Agent, which can remap the Airflow notation to a Datadog notation. +There are two parts of the Airflow integration: +- The Datadog Agent portion, which makes requests to a provided endpoint for Airflow to report whether it can connect and is healthy. The Agent integration also queries Airflow to produce some of its own metrics. +- The Airflow StatsD portion, where Airflow can be configured to send metrics to the Datadog Agent, which can remap the Airflow notation to a Datadog notation. + +The Airflow integration's [metrics](#metrics) come from both the Agent and StatsD portions. + @@ -303,6 +308,26 @@ For containerized environments, see the [Autodiscovery Integration Templates][8] Ensure that `url` matches your Airflow [webserver `base_url`][19], the URL used to connect to your Airflow instance. Replace `localhost` with the template variable `%%host%%`. +If you are using Airflow's Helm chart, this [exposes the webserver as a ClusterIP service][22] that you should use in the `url` parameter. + +For example, your Autodiscovery annotations may look like the following: + +``` +apiVersion: v1 +kind: Pod +# (...) +metadata: + name: '' + annotations: + ad.datadoghq.com/.checks: | + { + "airflow": { + "instances": ["url": "http://airflow-ui.%%kube_namespace%%.svc.cluster.local:8080"] + } + } + # (...) +``` + ##### Connect Airflow to DogStatsD Connect Airflow to DogStatsD (included in the Datadog Agent) by using the Airflow `statsd` feature to collect metrics. For more information about the metrics reported by the Airflow version used and the additional configuration options, see the Airflow documentation below: @@ -337,6 +362,9 @@ This should direct the StatsD traffic from the Airflow container to a Datadog Ag [{"name":"airflow","prefix":"airflow.","mappings":[{"match":"airflow.*_start","name":"airflow.job.start","tags":{"job_name":"$1"}},{"match":"airflow.*_end","name":"airflow.job.end","tags":{"job_name":"$1"}},{"match":"airflow.*_heartbeat_failure","name":"airflow.job.heartbeat.failure","tags":{"job_name":"$1"}},{"match":"airflow.operator_failures_*","name":"airflow.operator_failures","tags":{"operator_name":"$1"}},{"match":"airflow.operator_successes_*","name":"airflow.operator_successes","tags":{"operator_name":"$1"}},{"match":"airflow\\.dag_processing\\.last_runtime\\.(.*)","match_type":"regex","name":"airflow.dag_processing.last_runtime","tags":{"dag_file":"$1"}},{"match":"airflow\\.dag_processing\\.last_run\\.seconds_ago\\.(.*)","match_type":"regex","name":"airflow.dag_processing.last_run.seconds_ago","tags":{"dag_file":"$1"}},{"match":"airflow\\.dag\\.loading-duration\\.(.*)","match_type":"regex","name":"airflow.dag.loading_duration","tags":{"dag_file":"$1"}},{"match":"airflow.dagrun.*.first_task_scheduling_delay","name":"airflow.dagrun.first_task_scheduling_delay","tags":{"dag_id":"$1"}},{"match":"airflow.pool.open_slots.*","name":"airflow.pool.open_slots","tags":{"pool_name":"$1"}},{"match":"airflow.pool.queued_slots.*","name":"airflow.pool.queued_slots","tags":{"pool_name":"$1"}},{"match":"airflow.pool.running_slots.*","name":"airflow.pool.running_slots","tags":{"pool_name":"$1"}},{"match":"airflow.pool.used_slots.*","name":"airflow.pool.used_slots","tags":{"pool_name":"$1"}},{"match":"airflow.pool.starving_tasks.*","name":"airflow.pool.starving_tasks","tags":{"pool_name":"$1"}},{"match":"airflow\\.dagrun\\.dependency-check\\.(.*)","match_type":"regex","name":"airflow.dagrun.dependency_check","tags":{"dag_id":"$1"}},{"match":"airflow\\.dag\\.(.*)\\.([^.]*)\\.duration","match_type":"regex","name":"airflow.dag.task.duration","tags":{"dag_id":"$1","task_id":"$2"}},{"match":"airflow\\.dag_processing\\.last_duration\\.(.*)","match_type":"regex","name":"airflow.dag_processing.last_duration","tags":{"dag_file":"$1"}},{"match":"airflow\\.dagrun\\.duration\\.success\\.(.*)","match_type":"regex","name":"airflow.dagrun.duration.success","tags":{"dag_id":"$1"}},{"match":"airflow\\.dagrun\\.duration\\.failed\\.(.*)","match_type":"regex","name":"airflow.dagrun.duration.failed","tags":{"dag_id":"$1"}},{"match":"airflow\\.dagrun\\.schedule_delay\\.(.*)","match_type":"regex","name":"airflow.dagrun.schedule_delay","tags":{"dag_id":"$1"}},{"match":"airflow.scheduler.tasks.running","name":"airflow.scheduler.tasks.running"},{"match":"airflow.scheduler.tasks.starving","name":"airflow.scheduler.tasks.starving"},{"match":"airflow.sla_email_notification_failure","name":"airflow.sla_email_notification_failure"},{"match":"airflow\\.task_removed_from_dag\\.(.*)","match_type":"regex","name":"airflow.dag.task_removed","tags":{"dag_id":"$1"}},{"match":"airflow\\.task_restored_to_dag\\.(.*)","match_type":"regex","name":"airflow.dag.task_restored","tags":{"dag_id":"$1"}},{"match":"airflow.task_instance_created-*","name":"airflow.task.instance_created","tags":{"task_class":"$1"}},{"match":"airflow\\.ti\\.start\\.(.+)\\.(\\w+)","match_type":"regex","name":"airflow.ti.start","tags":{"dag_id":"$1","task_id":"$2"}},{"match":"airflow\\.ti\\.finish\\.(\\w+)\\.(.+)\\.(\\w+)","name":"airflow.ti.finish","match_type":"regex","tags":{"dag_id":"$1","task_id":"$2","state":"$3"}}]}] ``` +To add non-static tags to the StatsD metrics, you must use DogStatsD mapper profiles. [See an example mapper profile][21] that adds `service` and `env` tags. + + ##### Log collection _Available for Agent versions >6.0_ @@ -370,6 +398,8 @@ In addition, [Airflow DatadogHook][16] can be used to interact with Datadog: See [metadata.csv][17] for a list of metrics provided by this check. +**Note**: `airflow.healthy`, `airflow.can_connect`, `airflow.dag.task.total_running`, and `airflow.dag.task.ongoing_duration` metrics are collected from the Agent portion of the integration. All other metrics come from StatsD. + ### Events The Airflow check does not include any events. @@ -380,6 +410,10 @@ See [service_checks.json][18] for a list of service checks provided by this inte ## Troubleshooting +### HTTP 403 errors for Agent integration + +You may need to configure parameters for the Datadog Agent to make authenticated requests to Airflow's API. Use one of the available [configuration options][23]. + Need help? Contact [Datadog support][11]. @@ -403,3 +437,6 @@ Need help? Contact [Datadog support][11]. [18]: https://github.com/DataDog/integrations-core/blob/master/airflow/assets/service_checks.json [19]: https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#base-url [20]: https://airflow.apache.org/docs/apache-airflow/stable/executor/kubernetes.html +[21]: http://docs.datadoghq.com/resources/json/airflow_ust.json +[22]: https://github.com/apache/airflow/blob/main/chart/values.yaml#L1522-L1529 +[23]: https://github.com/DataDog/integrations-core/blob/master/airflow/datadog_checks/airflow/data/conf.yaml.example#L84-L118 \ No newline at end of file