[Fleet] Fix inability to upgrade agents from 8.10.4 -> 8.11 #170974

Merged: kpollich merged 8 commits into elastic:main on Nov 10, 2023

Conversation

@kpollich kpollich commented Nov 9, 2023

Summary

Closes #169825

This PR adds logic to Fleet's /api/agents/available_versions endpoint so that it periodically tries to fetch from the live product versions API at https://www.elastic.co/api/product_versions, giving us eventual consistency in the list of available agent versions.

Currently, Kibana relies entirely on a static file generated at build time from the above API. If the API isn't up to date with the latest agent version at that point (e.g. Kibana completed its build before Agent), then that build of Kibana will never "see" the corresponding build of Agent.

The endpoint's result is cached for two hours to avoid overfetching from this external API and constantly going out to disk to read the agent versions file.
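
For readers who want a concrete picture of the caching behavior described above, here is a minimal sketch. It is not the code from this PR: every name in it (`VersionsCache`, `isCacheFresh`, `mergeVersions`, `getAvailableVersions`, the `fetchLiveVersions` callback) is hypothetical, and it assumes the build-time list and the live list are merged rather than one replacing the other.

```typescript
// Illustrative sketch only: all names are hypothetical, not Kibana's actual code.
const CACHE_TTL_MS = 2 * 60 * 60 * 1000; // two hours, per the PR description

interface VersionsCache {
  versions: string[];
  fetchedAt: number; // epoch milliseconds
}

// True when a cache entry exists and is younger than the TTL.
function isCacheFresh(
  cache: VersionsCache | undefined,
  now: number,
  ttlMs: number = CACHE_TTL_MS
): boolean {
  return cache !== undefined && now - cache.fetchedAt < ttlMs;
}

// Compare dotted versions numerically so that the newest sorts first.
function compareVersionsDesc(a: string, b: string): number {
  const pa = a.split('.').map((p) => Number.parseInt(p, 10) || 0);
  const pb = b.split('.').map((p) => Number.parseInt(p, 10) || 0);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pb[i] ?? 0) - (pa[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// Union of build-time and live versions, de-duplicated, newest first.
function mergeVersions(staticVersions: string[], liveVersions: string[]): string[] {
  return Array.from(new Set([...staticVersions, ...liveVersions])).sort(compareVersionsDesc);
}

let cache: VersionsCache | undefined;

// Serve from cache while fresh; otherwise try the live API and fall back
// to the build-time list if the request fails.
async function getAvailableVersions(
  staticVersions: string[],
  fetchLiveVersions: () => Promise<string[]>,
  now: number = Date.now()
): Promise<string[]> {
  const current = cache;
  if (current !== undefined && isCacheFresh(current, now)) return current.versions;

  let live: string[] = [];
  try {
    live = await fetchLiveVersions();
  } catch {
    // Live API unreachable: continue with the build-time list alone.
  }
  const versions = mergeVersions(staticVersions, live);
  cache = { versions, fetchedAt: now };
  return versions;
}
```

Note that this sketch also caches the fallback result, so a failing request is not retried on every call. Whether the actual change behaves that way is not stated in this description.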

To do

  • [x] Update unit tests
  • [x] Consider airgapped environments

On airgapped environments

In airgapped environments, we're going to try and fetch from the product_versions API and that request is going to fail. What we've seen happen in some environments is that these requests do not "fail fast" and instead wait until a network timeout is reached.

I'd love to avoid that timeout case and somehow detect airgapped environments and avoid calling this API at all. However, we don't have a great deterministic way to know if someone is in an airgapped environment. The best guess I think we can make is by checking whether xpack.fleet.registryUrl is set to something other than https://epr.elastic.co. Curious if anyone has thoughts on this.
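
To make the two ideas in this section concrete, here is a rough sketch of the `xpack.fleet.registryUrl` heuristic plus a bounded wait around the outgoing request. The helper names (`looksAirgapped`, `withTimeout`) are hypothetical and this is not what the PR implements. The heuristic is only a guess: a custom registry URL does not prove the environment is airgapped.

```typescript
// Illustrative sketch only: helper names are hypothetical.
const DEFAULT_REGISTRY_URL = 'https://epr.elastic.co';

// Heuristic from the PR description: a non-default registry URL hints at,
// but does not prove, an airgapped deployment.
function looksAirgapped(registryUrl: string | undefined): boolean {
  if (registryUrl === undefined || registryUrl.trim() === '') return false;
  const normalized = registryUrl.trim().replace(/\/+$/, '');
  return normalized !== DEFAULT_REGISTRY_URL;
}

// Reject if `work` has not settled within `timeoutMs`, bounding the wait.
// This does not cancel the underlying request; it only stops waiting for it.
function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Timed out after ${timeoutMs} ms`)),
      timeoutMs
    );
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      }
    );
  });
}
```

A caller could skip the live request when `looksAirgapped(registryUrl)` returns true, and otherwise wrap the request in `withTimeout` with a budget of its choosing, so that a slow failure does not hold up the endpoint until the network timeout.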

Screenshots

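image: https://github.com/elastic/kibana/assets/6766512/0906817c-0098-4b67-8791-d06730f450f6

image: https://github.com/elastic/kibana/assets/6766512/59e7c132-f568-470f-b48d-53761ddc2fde

image: https://github.com/elastic/kibana/assets/6766512/986372df-a90f-48c3-ae24-c3012e8f7730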

To test

  1. Set up Fleet Server + ES + Kibana
  2. Spin up a Fleet Server running Agent v8.11.0
  3. Enroll an agent running v8.10.4 (I used multipass)
  4. Verify the agent can be upgraded from the UI

@kpollich kpollich added the release_note:fix, Team:Fleet, backport:prev-minor, v8.12.0, and v8.11.1 labels Nov 9, 2023
@kpollich kpollich self-assigned this Nov 9, 2023
@kpollich kpollich marked this pull request as ready for review November 9, 2023 17:58
@kpollich kpollich requested a review from a team as a code owner November 9, 2023 17:58
@elasticmachine

Pinging @elastic/fleet (Team:Fleet)

kpollich commented Nov 9, 2023

@elasticmachine merge upstream

@nchaulet nchaulet self-requested a review November 9, 2023 18:36
nchaulet commented Nov 9, 2023

Even checking xpack.fleet.registryUrl does not tell us the environment is air-gapped. I would rather use an explicit setting for that (it could go under our internal.* config if we do not want to expose it to users). If this goes badly for any reason, it would also let us deactivate the feature from config.

@nchaulet nchaulet left a comment

Code LGTM 🚀

@kpollich

> Even checking xpack.fleet.registryUrl does not tell us the environment is air-gapped. I would rather use an explicit setting for that (it could go under our internal.* config if we do not want to expose it to users). If this goes badly for any reason, it would also let us deactivate the feature from config.

Yeah I think we might want to consider introducing an explicit airgapped setting to help us "fail fast" on various network calls around the app. I can look for an issue or create one to capture that particular idea, but it's not something we need to solve in this PR.
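
As a rough illustration of the explicit-setting idea discussed here, the sketch below gates a network call on a boolean flag and returns a fallback immediately when the flag is set. The `isAirGapped` field and the `callUnlessAirGapped` helper are hypothetical names chosen for this example. No such setting is defined in this PR.

```typescript
// Illustrative sketch only: `isAirGapped` is a hypothetical setting name,
// not an existing Kibana configuration key.
interface NetworkSettings {
  isAirGapped: boolean;
}

// Skip the external call entirely when the deployment is declared airgapped,
// so the caller gets the fallback without waiting on a network timeout.
async function callUnlessAirGapped<T>(
  settings: NetworkSettings,
  call: () => Promise<T>,
  fallback: T
): Promise<T> {
  if (settings.isAirGapped) return fallback; // fail fast: no request is made
  try {
    return await call();
  } catch {
    return fallback; // request failed: degrade to the fallback value
  }
}
```

A single helper like this could wrap each outbound call in the app, which is the "fail fast" behavior the comment above is asking for.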

@kpollich kpollich enabled auto-merge (squash) November 10, 2023 13:28
@kpollich

@elasticmachine merge upstream

@kibana-ci

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] FTR Configs #77 / transform basic license transform - creation - runtime mappings & saved search creation with runtime mappings batch transform with unique rt_airline_lower and sort by time and runtime mappings runs the transform and displays it correctly in Discover page

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @kpollich

@kpollich kpollich merged commit cd909f0 into elastic:main Nov 10, 2023
@kibanamachine

💔 All backports failed

Status Branch Result
8.11 Backport failed because of merge conflicts

Manual backport

To create the backport manually run:

node scripts/backport --pr 170974

Questions ?

Please refer to the Backport tool documentation

@kpollich

💚 All backports created successfully

Status Branch Result
8.11

Note: Successful backport PRs will be merged automatically after passing CI.


kpollich added a commit to kpollich/kibana that referenced this pull request Nov 10, 2023
…170974)

---------

Co-authored-by: Kibana Machine <[email protected]>
(cherry picked from commit cd909f0)

# Conflicts:
#	x-pack/plugins/fleet/server/services/agents/versions.ts
kpollich added a commit that referenced this pull request Nov 12, 2023
…170974) (#171039)

# Backport

This will backport the following commits from `main` to `8.11`:
- [[Fleet] Fix inability to upgrade agents from 8.10.4 -> 8.11
(#170974)](#170974)

Co-authored-by: Kibana Machine <[email protected]>
@mistic mistic added v8.11.2 and removed v8.11.1 labels Nov 14, 2023
mistic commented Nov 14, 2023

This PR hasn't made it into the latest BC of 8.11.1. Updating the labels.

kilfoyle pushed a commit that referenced this pull request Nov 14, 2023
## Summary

The 8.11.1 release notes included #170974 which didn't actually land in
8.11.1. We shipped BC2 of 8.11.1 which was built from this Kibana
commit:
https://github.com/elastic/kibana/commits/09feaf416f986b239b8e8ad95ecdda0f9d56ebec.
The PR was not merged until after this commit, so the bug is still
present (though [mitigated
slightly](https://github.com/elastic/kibana/issues/169825#issuecomment-1808453016))
in 8.11.1.

This PR removes the erroneous release note from the 8.11.1 release
notes. How can we make sure the fix _does_ get included in the eventual
8.11.2 release notes?
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Nov 14, 2023
…71200)

(cherry picked from commit 480fcef)
kibanamachine added a commit that referenced this pull request Nov 14, 2023
…71200) (#171249)

# Backport

This will backport the following commits from `main` to `8.11`:
- [[Fleet] Remove agent upgrade fix from 8.11.1 release notes
(#171200)](#171200)

Co-authored-by: Kyle Pollich <[email protected]>
Labels
backport:prev-minor, release_note:fix, Team:Fleet, v8.11.2, v8.12.0
Development

Successfully merging this pull request may close these issues.

[Fleet]: Unable to upgrade agents from 8.10.x to 8.11.0
7 participants