[Fleet] Cap setup attempts to 50 on Serverless (elastic#171550)
## Summary

If there is a bug in Fleet setup, we can retrigger rollovers on each
attempt, causing shard explosion in Elasticsearch. We need a sane limit
to prevent this from happening, which this PR introduces.
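
The cap described above can be sketched as a bounded loop. This is a minimal, synchronous illustration and not the code in `plugin.ts`: `runWithAttemptCap` and `attemptsFor` are hypothetical names, and backoff delays are left out to keep it short.

```typescript
// Hypothetical helpers for illustration only; not the code in plugin.ts.
type CappedResult<T> =
  | { ok: true; value: T; attempts: number }
  | { ok: false; error: unknown; attempts: number };

// Runs `task` until it succeeds or `maxAttempts` is reached, whichever comes first.
function runWithAttemptCap<T>(
  task: (attempt: number) => T,
  maxAttempts: number
): CappedResult<T> {
  let lastError: unknown = new Error('maxAttempts must be at least 1');
  let attempts = 0;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    attempts = attempt;
    try {
      return { ok: true, value: task(attempt), attempts };
    } catch (error) {
      // Remember the failure and fall through to the next attempt, if any.
      lastError = error;
    }
  }
  // The cap was hit: report the last failure instead of looping forever.
  return { ok: false, error: lastError, attempts };
}

// Mirrors the flag check in the diff: a finite cap when retries are enabled,
// otherwise a single attempt.
function attemptsFor(retrySetupOnBoot: boolean | undefined, cap: number): number {
  return retrySetupOnBoot ? cap : 1;
}
```

With a task that always throws, the loop gives up after exactly `maxAttempts` calls, so a side effect triggered on every attempt (such as a rollover) is bounded by the same number.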

The impact of Fleet setup failing to complete (Agents may not be able to
be enrolled) is much smaller than causing shard explosion, so this seems
like an acceptable tradeoff.

This is only a small part of the overall solution - in the current
incident, it's still unclear why we are failing to roll over the
index and getting into this loop.

### Checklist

Delete any items that are not applicable to this PR.

- [ ] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [ ]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [ ] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [ ] Any UI touched in this PR is usable by keyboard only (learn more
about [keyboard accessibility](https://webaim.org/techniques/keyboard/))
- [ ] Any UI touched in this PR does not create any new axe failures
(run axe in browser:
[FF](https://addons.mozilla.org/en-US/firefox/addon/axe-devtools/),
[Chrome](https://chrome.google.com/webstore/detail/axe-web-accessibility-tes/lhdoppojpmngadmnindnejefpokejbdd?hl=en-US))
- [ ] If a plugin configuration key changed, check if it needs to be
allowlisted in the cloud and added to the [docker
list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)
- [ ] This renders correctly on smaller devices using a responsive
layout. (You can test this [in your
browser](https://www.browserstack.com/guide/responsive-testing-on-local-server))
- [ ] This was checked for [cross-browser
compatibility](https://www.elastic.co/support/matrix#matrix_browsers)
joshdover authored Nov 20, 2023
1 parent d412e57 commit 8052f03
Showing 1 changed file with 10 additions and 6 deletions.
16 changes: 10 additions & 6 deletions x-pack/plugins/fleet/server/plugin.ts
@@ -524,6 +524,9 @@ export class FleetPlugin

this.policyWatcher.start(licenseService);

+ // We only retry when this feature flag is enabled (Serverless)
+ const setupAttempts = this.configInitialValue.internal?.retrySetupOnBoot ? 25 : 1;

const fleetSetupPromise = (async () => {
try {
// Fleet remains `available` during setup so as not to excessively delay Kibana's boot process.
@@ -555,18 +558,17 @@ export class FleetPlugin
);
},
{
- // We only retry when this feature flag is enabled
- numOfAttempts: this.configInitialValue.internal?.retrySetupOnBoot ? Infinity : 1,
- // 250ms initial backoff
- startingDelay: 250,
+ numOfAttempts: setupAttempts,
+ // 1s initial backoff
+ startingDelay: 1000,
// 5m max backoff
maxDelay: 60000 * 5,
timeMultiple: 2,
// avoid HA contention with other Kibana instances
jitter: 'full',
retry: (error: any, attemptCount: number) => {
const summary = `Fleet setup attempt ${attemptCount} failed, will retry after backoff`;
- logger.debug(summary, { error: { message: error } });
+ logger.warn(summary, { error: { message: error } });

this.fleetStatus$.next({
level: ServiceStatusLevels.available,
@@ -586,7 +588,9 @@ export class FleetPlugin
summary: 'Fleet is available',
});
} catch (error) {
- logger.warn('Fleet setup failed', { error: { message: error } });
+ logger.warn(`Fleet setup failed after ${setupAttempts} attempts`, {
+ error: { message: error },
+ });

this.fleetStatus$.next({
// As long as Fleet has a dependency on EPR, we can't reliably set Kibana status to `unavailable` here.
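
To get a feel for how long a capped setup could keep retrying with the backoff options in the diff above, the sketch below sums the delay ceilings between attempts. It assumes the ceiling before the n-th retry is `min(maxDelay, startingDelay * timeMultiple^(n-1))` and that full jitter picks a random wait between zero and that ceiling; the backoff library's exact formula may differ, and the function names here are hypothetical.

```typescript
// Illustrative arithmetic only; the option names match the diff, the functions are hypothetical.
interface BackoffOptions {
  startingDelay: number; // ms
  timeMultiple: number;
  maxDelay: number; // ms
}

// Upper bound on the wait before retry number `retryIndex` (0-based), under the assumed formula.
function delayCeiling(retryIndex: number, options: BackoffOptions): number {
  const uncapped = options.startingDelay * Math.pow(options.timeMultiple, retryIndex);
  return Math.min(options.maxDelay, uncapped);
}

// Sum of the ceilings across all waits; N attempts have N - 1 waits between them.
function worstCaseTotalWait(numOfAttempts: number, options: BackoffOptions): number {
  let total = 0;
  for (let i = 0; i < numOfAttempts - 1; i++) {
    total += delayCeiling(i, options);
  }
  return total;
}

// Full jitter: a uniformly random wait in [0, ceiling). `random` is injectable for testing.
function fullJitter(ceiling: number, random: () => number = Math.random): number {
  return Math.floor(random() * ceiling);
}

const diffOptions: BackoffOptions = {
  startingDelay: 1000,
  timeMultiple: 2,
  maxDelay: 60000 * 5,
};

const worstCaseMs = worstCaseTotalWait(25, diffOptions);
console.log(`worst-case total wait: ${(worstCaseMs / 60000).toFixed(1)} minutes`);
```

Under these assumptions the ceilings double from 1s until they hit the 5m cap on the tenth wait, so 25 attempts wait at most 5,011,000 ms in total (about 83.5 minutes), and roughly half of that on average with full jitter.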