[8.12][Fleet] Upgrade details telemetry (#173356) #173502

Merged
merged 1 commit into from
Dec 18, 2023

Conversation

juliaElastic
Contributor

Backport #173356

I got a merge conflict when backporting to 8.12 because the presets telemetry (#172838) was not backported. Is that intentional?

Relates elastic#162448

Added upgrade details telemetry, published to the `fleet-agents` index in
the telemetry cluster, with each bucket sent as a separate document.
Implemented with a `multi_terms` aggregation that groups agents sharing the
same `target_version`, `state`, and `error_msg` values.
Do we also want to include the agent count of each bucket in the
telemetry event? @jlind23 @ycombinator
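
For readers unfamiliar with `multi_terms`, here is a minimal sketch of how such a request and its response mapping could look. It is not the code from this PR: the field paths are inferred from the example document in this description, and `buildUpgradeDetailsAggs`, `bucketsToEvents`, and the `size` default are hypothetical names and values chosen for illustration.

```typescript
// Sketch only: field paths are inferred from the example agent document in
// this description; the real query in the Fleet plugin may differ.
interface UpgradeDetailsTelemetry {
  target_version: string;
  state: string;
  error_msg: string;
}

// Subset of a multi_terms bucket: `key` holds one value per aggregated field.
interface MultiTermsBucket {
  key: string[];
  doc_count: number;
}

// Builds a search body with a multi_terms aggregation over the three fields.
function buildUpgradeDetailsAggs(size = 10) {
  return {
    size: 0, // only aggregation results are needed, not the hits
    aggs: {
      upgrade_details: {
        multi_terms: {
          size, // hypothetical bucket limit
          terms: [
            { field: 'upgrade_details.target_version' },
            { field: 'upgrade_details.state' },
            { field: 'upgrade_details.metadata.error_msg' },
          ],
        },
      },
    },
  };
}

// Maps each bucket key tuple to one flat telemetry event.
function bucketsToEvents(buckets: MultiTermsBucket[]): UpgradeDetailsTelemetry[] {
  return buckets.map(({ key: [target_version, state, error_msg] }) => ({
    target_version,
    state,
    error_msg,
  }));
}
```

Each bucket also carries a `doc_count`, which is what would supply the agent count if it were added to the event.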

Note: because this task runs every hour, it will mostly capture
`UPG_FAILED` states. The other (success) states are stored on the agent
docs only temporarily and are removed once the upgrade succeeds.

For example, two agent docs like the one below become one telemetry event:
```
// .fleet-agents
upgrade_details: {
  target_version: '8.12.0',
  state: 'UPG_FAILED',
  metadata: {
    error_msg: 'Download failed',
  },
},

// telemetry event
{
  target_version: '8.12.0',
  state: 'UPG_FAILED',
  error_msg: 'Download failed',
}
```
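
The grouping shown above can also be sketched in memory, which may help when reasoning about why two identical agent docs collapse into one event. This is an illustration under stated assumptions, not the PR's implementation (which uses an Elasticsearch aggregation): `AgentDoc`, `TelemetryEvent`, and `groupUpgradeDetails` are hypothetical names, and defaulting a missing `error_msg` to an empty string is a choice made only for this sketch.

```typescript
// Hypothetical, simplified shape of a .fleet-agents document.
interface AgentDoc {
  upgrade_details?: {
    target_version: string;
    state: string;
    metadata?: { error_msg?: string };
  };
}

interface TelemetryEvent {
  target_version: string;
  state: string;
  error_msg: string;
}

// Collapses docs that share the same (target_version, state, error_msg)
// tuple into a single telemetry event.
function groupUpgradeDetails(docs: AgentDoc[]): TelemetryEvent[] {
  const seen = new Map<string, TelemetryEvent>();
  for (const doc of docs) {
    const details = doc.upgrade_details;
    if (!details) continue; // agents without upgrade details are skipped
    const event: TelemetryEvent = {
      target_version: details.target_version,
      state: details.state,
      // Assumption for this sketch: a missing message becomes ''.
      error_msg: details.metadata?.error_msg ?? '',
    };
    // Serialize the tuple so it can be used as a Map key.
    const key = JSON.stringify([event.target_version, event.state, event.error_msg]);
    if (!seen.has(key)) seen.set(key, event);
  }
  return Array.from(seen.values());
}
```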

To verify:
- start Kibana 8.13-SNAPSHOT locally
- set an invalid agent download source in Fleet Settings
- enroll an agent on version 8.12-SNAPSHOT
- upgrade it to 8.13-SNAPSHOT with the API
```
POST kbn:/api/fleet/agents/<agent_id>/upgrade
  {
    "version": "8.13.0-SNAPSHOT",
    "force": true
  }
```
- wait 15 minutes for the upgrade to go into the failed state
- wait up to 1 hour for the telemetry task to run (speed this up locally by
setting a shorter interval in FleetUsageSender in Kibana)
- verify in the debug logs:
```
[2023-12-14T14:26:28.832+01:00][DEBUG][plugins.fleet] Agents upgrade details telemetry: [{"target_version":"8.13.0-SNAPSHOT","state":"UPG_FAILED","error_msg":"failed download of agent binary: unable to download package: 3 errors occurred:\n\t* package '/Library/Elastic/Agent/data/elastic-agent-f383c6/downloads/elastic-agent-8.13.0-SNAPSHOT-darwin-aarch64.tar.gz' not found: open /Library/Elastic/Agent/data/elastic-agent-f383c6/downloads/elastic-agent-8.13.0-SNAPSHOT-darwin-aarch64.tar.gz: no such file or directory\n\t* call to 'https://artifacts.elastic.co/downloads/dummy/beats/elastic-agent/elastic-agent-8.13.0-SNAPSHOT-darwin-aarch64.tar.gz' returned unsuccessful status code: 404\n\t* call to 'https://artifacts.elastic.co/downloads/dummy/beats/elastic-agent/elastic-agent-8.13.0-SNAPSHOT-darwin-aarch64.tar.gz' returned unsuccessful status code: 404\n\n"}]
```
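
When checking the debug log during verification, it can be handy to pull the JSON payload out of the log line instead of reading it by eye. The helper below is a small sketch for that purpose only. `parseUpgradeTelemetryLog` is a hypothetical name, and it assumes the log line keeps the `Agents upgrade details telemetry: ` prefix followed by a JSON array, as in the sample above.

```typescript
// Prefix taken from the sample debug log line in this description.
const TELEMETRY_MARKER = 'Agents upgrade details telemetry: ';

// Returns the parsed telemetry events from a single log line, or undefined
// if the line does not contain the marker. Throws if the payload after the
// marker is not valid JSON.
function parseUpgradeTelemetryLog(
  line: string
): Array<Record<string, string>> | undefined {
  const idx = line.indexOf(TELEMETRY_MARKER);
  if (idx === -1) return undefined;
  return JSON.parse(line.slice(idx + TELEMETRY_MARKER.length));
}
```

The parsed events can then be checked for the expected `target_version` and `state` values.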

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
@juliaElastic juliaElastic self-assigned this Dec 18, 2023
@botelastic botelastic bot added the Team:Fleet label Dec 18, 2023
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

@juliaElastic juliaElastic changed the title [Fleet] Upgrade details telemetry (#173356) [8.12][Fleet] Upgrade details telemetry (#173356) Dec 18, 2023
@apmmachine
Contributor

🤖 GitHub comments

Just comment with:

  • /oblt-deploy : Deploy a Kibana instance using the Observability test environments.
  • /oblt-deploy-serverless : Deploy a serverless Kibana instance using the Observability test environments.
  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@juliaElastic juliaElastic requested a review from a team December 18, 2023 11:31
@kibana-ci
Collaborator

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] Explore - Security Solution Cypress Tests #4 / url state sets and reads the url state for timeline by id sets and reads the url state for timeline by id

Metrics [docs]

✅ unchanged

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @juliaElastic

@juliaElastic juliaElastic merged commit 8904970 into elastic:8.12 Dec 18, 2023
44 checks passed
Labels
backport, Team:Fleet
5 participants