[DJM] Add troubleshooting section after setup instructions #26333

Open. aboitreaud wants to merge 5 commits into master.

@aboitreaud aboitreaud commented Nov 18, 2024

What does this PR do? What is the motivation?

Add a troubleshooting section on the DJM documentation page, common to Databricks, Dataproc, and EMR.

We've seen customers struggle to understand what went wrong when installing the product; this section addresses that need.

Merge instructions

Merge queue is enabled in this repo. To have it automatically merged after it receives the required reviews, create the PR (from a branch that follows the <yourname>/description naming convention) and then add the following PR comment:

/merge

Additional notes

@aboitreaud aboitreaud changed the title [DJMM] Add troubleshooting section after setup instructions [DJM] Add troubleshooting section after setup instructions Nov 18, 2024
@github-actions github-actions bot added the Architecture Everything related to the Doc backend label Nov 18, 2024
@aboitreaud aboitreaud force-pushed the adrien.boitreaud/djm-troubleshooting-doc branch 2 times, most recently from a8a1dc1 to 67b2a3a Compare November 18, 2024 12:37
@aboitreaud aboitreaud marked this pull request as ready for review November 18, 2024 12:59
@aboitreaud aboitreaud requested a review from a team as a code owner November 18, 2024 12:59
@aboitreaud aboitreaud force-pushed the adrien.boitreaud/djm-troubleshooting-doc branch from 67b2a3a to 851f05f Compare November 18, 2024 14:36

@hestonhoffman hestonhoffman left a comment

Left you some edits

@@ -0,0 +1,10 @@
Data Jobs Monitoring requires to have the Datadog Agent running in the background. You can check that it is correctly installed and running on your cluster with this command:

Suggested change
- Data Jobs Monitoring requires to have the Datadog Agent running in the background. You can check that it is correctly installed and running on your cluster with this command:
+ Data Jobs Monitoring requires a Datadog Agent running in the background. You can check that it is correctly installed and running on your cluster with this command:

```shell
sudo datadog-agent status
```
If it is not the case, you may want to check the log file of the installation. On your cluster, these logs are located in `/tmp/datadog-djm-init.log`.

Suggested change
- If it is not the case, you may want to check the log file of the installation. On your cluster, these logs are located in `/tmp/datadog-djm-init.log`.
+ If there is no Agent running, check the log file of the installation. On your cluster, these logs are located in `/tmp/datadog-djm-init.log`.
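
As a rough illustration of the two checks discussed in this thread (Agent status first, then the install log), here is a minimal shell sketch. The function name `check_djm_agent` and the `AGENT_STATUS_CMD` / `INIT_LOG` override variables are hypothetical and not part of the PR; only the `sudo datadog-agent status` command and the `/tmp/datadog-djm-init.log` path come from the doc text. It also assumes that the status command exits with a non-zero code when the Agent is not running.

```shell
# Hypothetical helper, not part of the PR: check the Agent, then fall back to the install log.
# AGENT_STATUS_CMD and INIT_LOG are illustrative overrides; the defaults come from the doc text.
check_djm_agent() {
  status_cmd="${AGENT_STATUS_CMD:-sudo datadog-agent status}"
  init_log="${INIT_LOG:-/tmp/datadog-djm-init.log}"

  # Assumption: the status command exits non-zero when the Agent is not running.
  # $status_cmd is intentionally unquoted so the command and its arguments are split into words.
  if $status_cmd >/dev/null 2>&1; then
    echo "Datadog Agent is running"
    return 0
  fi

  echo "Datadog Agent is not running; checking $init_log"
  if [ -r "$init_log" ]; then
    # Show the last lines of the install log.
    tail -n 50 "$init_log"
  else
    echo "Install log not found or not readable: $init_log"
  fi
  return 1
}
```

Calling `check_djm_agent` on a cluster node (after SSH-ing in) would print either a confirmation or the tail of the install log, which may be enough to see why the init script did not install the Agent.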

Contributor Author

Hi Heston, sorry, I had unpushed changes suggested by my team. I just pushed them; could you take another look, please?

For further support, make sure the init script contains the following line so that the install logs are sent to the Datadog support team.

Suggested change
- For further support, make sure the init script contains the following line so that the install logs are sent to the Datadog support team.
+ For further support, make sure the init script contains the following line to ensure that the install logs are sent to the Datadog support team.

@hestonhoffman hestonhoffman left a comment

Left some suggestions

layouts/shortcodes/djm-install-troubleshooting.en.md (two outdated threads, resolved)
@@ -0,0 +1,10 @@
The init script installs the Datadog Agent. To make sure it is properly installed, run the Agent status command:

Can you add details of where and how you can run this? e.g. could just be links to how to ssh into a node on the cluster for the different platforms.

Contributor Author

Added a note saying that this should be run after SSH-ing into the cluster. Since this doc is shared by all DJM platforms, it's not easy to add platform-specific guidelines for now. We can do that later, in a broader doc revamp that turns this into a FAQ section.

@@ -0,0 +1,10 @@
The init script installs the Datadog Agent. To make sure it is properly installed, run the Agent status command:

Let's start this as more of a FAQ section. e.g. let's have this content under the question:

I installed Data Jobs Monitoring but don't see any data in the product.

  1. The init script installs...

Contributor Author

I see, thanks. For now, I believe "I installed Data Jobs Monitoring but don't see any data in the product." is the only question that this troubleshooting section aims to answer. I agree that checking the Agent status is only the first part of the answer; we intended to add a second point on checking the Tracer injection later.

I would be in favor of keeping the current structure for now, so that the customer we know is trying to set up DJM can find answers quickly, and then adding a backlog task to create a more exhaustive FAQ. Is that OK with you?

I've added a sentence at the top of the section saying that it specifically addresses the lack of data in DJM after installation.

Labels
Architecture Everything related to the Doc backend
3 participants