
[Fleet] flag package policy SO to trigger agent policy bump #200536

Merged: 20 commits merged into elastic:main on Nov 25, 2024


@juliaElastic (Contributor) commented Nov 18, 2024

Summary

Closes #193352

Update:

A new SO field, `bump_agent_policy_revision`, is added to the package policy type to mark package policies for update; marked package policies trigger an agent policy revision bump.

The feature supports both the legacy and the new package policy SO types (`ingest-package-policies` and `fleet-package-policies`), and queries policies from all spaces.
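A minimal sketch of the selection step, for illustration only. It assumes a simplified, hypothetical `PackagePolicySO` shape (`id`, `policy_ids`, `bump_agent_policy_revision`) and leaves out the saved object client calls the real Fleet code makes: only flagged package policies are picked, and their agent policy IDs are deduplicated so each agent policy is bumped once.

```typescript
// Hypothetical, simplified package policy saved object; not the real Fleet type.
interface PackagePolicySO {
  id: string;
  policy_ids: string[]; // agent policies this package policy is assigned to
  bump_agent_policy_revision?: boolean;
}

// Picks the flagged package policies and the unique agent policies they belong to.
function selectPoliciesToBump(packagePolicies: PackagePolicySO[]): {
  packagePolicyIds: string[];
  agentPolicyIds: string[];
} {
  const flagged = packagePolicies.filter((p) => p.bump_agent_policy_revision === true);
  // A Set keeps each agent policy once, even if several flagged package policies share it.
  const agentPolicyIds = new Set<string>();
  for (const p of flagged) {
    for (const agentPolicyId of p.policy_ids) {
      agentPolicyIds.add(agentPolicyId);
    }
  }
  return {
    packagePolicyIds: flagged.map((p) => p.id),
    agentPolicyIds: Array.from(agentPolicyIds),
  };
}

const result = selectPoliciesToBump([
  { id: 'pp-1', policy_ids: ['ap-1'], bump_agent_policy_revision: true },
  { id: 'pp-2', policy_ids: ['ap-1', 'ap-2'], bump_agent_policy_revision: true },
  { id: 'pp-3', policy_ids: ['ap-3'] },
]);
```

Here `ap-1` is returned once even though two flagged package policies reference it, and `pp-3` is ignored because it is not flagged.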

To test, add a model version change that sets the flag on the package policy type, as in the first snippet below, and save. After Fleet setup runs, the agent policies using those package policies should be bumped and deployed.
The same effect can be achieved by setting the flag directly on the package policy SOs, as in the `_update_by_query` call below, and loading the Fleet UI to trigger setup.

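```
// Model version entry for the package policy SO type: backfills the flag on existing documents
'2': {
  changes: [
    {
      type: 'data_backfill',
      backfillFn: (doc) => {
        return { attributes: { ...doc.attributes, bump_agent_policy_revision: true } };
      },
    },
  ],
},
```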

```
# Sets the flag on every package policy SO of type fleet-package-policies
curl -sk -XPOST --user fleet_superuser:password \
  -H 'content-type: application/json' \
  -H 'x-elastic-product-origin: fleet' \
  http://localhost:9200/.kibana_ingest/_update_by_query -d '
{
  "query": {
    "match": { "type": "fleet-package-policies" }
  },
  "script": {
    "source": "ctx._source[\"fleet-package-policies\"].bump_agent_policy_revision = true",
    "lang": "painless"
  }
}'
```
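The backfill in the model version snippet is a pure function of the stored document, so it can be checked in isolation. A sketch, assuming a hypothetical minimal `BackfillDoc` type (the real Kibana model version types carry more fields):

```typescript
// Hypothetical minimal shape of the document passed to a `data_backfill` change;
// the real Kibana type also carries id, type, namespaces and more.
interface BackfillDoc {
  attributes: Record<string, unknown>;
}

// Same body as the model version snippet: copy every existing attribute and set the flag.
const backfillFn = (doc: BackfillDoc): { attributes: Record<string, unknown> } => {
  return { attributes: { ...doc.attributes, bump_agent_policy_revision: true } };
};

const original: BackfillDoc = { attributes: { name: 'system-1', revision: 3 } };
const backfilled = backfillFn(original);
```

The spread copies the existing attributes into a new object, so the input document is left unchanged and only the flag is added.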

```
[2024-11-20T14:40:30.064+01:00][INFO ][plugins.fleet] Found 1 package policies that need agent policy revision bump
[2024-11-20T14:40:31.933+01:00][DEBUG][plugins.fleet] Updated 1 package policies in space space1 in 1869ms, bump 1 agent policies
[2024-11-20T14:40:35.056+01:00][DEBUG][plugins.fleet] Deploying 1 policies
[2024-11-20T14:40:35.493+01:00][DEBUG][plugins.fleet] Deploying policies: 7f108cf2-4cf0-4a11-8df4-fc69d00a3484:10
```

TODO:

  • the same flag has to be added to the agent policy and output types, and the task extended to update them
    • I plan to do this in a separate PR, so that this one doesn't become too big
  • add integration test if possible

Scale testing

Tested with 500 agent policies split across 2 spaces, with 1 integration per policy, setting the flag in a new saved object model version: the bump task took about 6s.
The deploy policies step is asynchronous and took about 30s.

```
[2024-11-20T15:53:55.628+01:00][INFO ][plugins.fleet] Found 501 package policies that need agent policy revision bump
[2024-11-20T15:53:57.881+01:00][DEBUG][plugins.fleet] Updated 250 package policies in space space1 in 2253ms, bump 250 agent policies
[2024-11-20T15:53:59.926+01:00][DEBUG][plugins.fleet] Updated 251 package policies in space default in 4298ms, bump 251 agent policies
[2024-11-20T15:54:01.186+01:00][DEBUG][plugins.fleet] Deploying 250 policies

[2024-11-20T15:54:29.989+01:00][DEBUG][plugins.fleet] Deploying policies: test-policy-space1-1:4, ...
[2024-11-20T15:54:33.538+01:00][DEBUG][plugins.fleet] Deploying policies: policy-elastic-agent-on-cloud:4, test-policy-default-1:4, ...
```

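Checklist

  • [x] Unit or functional tests (https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios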

@juliaElastic self-assigned this Nov 18, 2024
@nchaulet (Member) commented Nov 18, 2024

@juliaElastic I am wondering if this could introduce a new bug: an infinite bump loop (we had similar issues with preconfiguration, where we sometimes missed things during comparison). Could we instead have a mechanism where we explicitly say that we want a policy bump during the migration, maybe a new property `schedule_bump_policy_next_task`? Your task would then retrieve all package policies with that field and bump the related agent policy, if there is one. What do you think of that approach?

@juliaElastic (Contributor, Author)

> I am wondering if this could introduce a new bug, with infinite bump loop [...] What do you think of that approach?

Thanks for the suggestion. I think it makes sense to add an explicit flag to avoid accidentally triggering updates in an infinite loop.
I updated the main logic; I will do more testing and then extend it to agent policies and outputs too.

```
}

export async function _updatePackagePoliciesThatNeedBump(logger: Logger) {
  // TODO spaces?
```
@juliaElastic (Contributor, Author)

We probably need to use the SO client for the correct space here.

```
  appContextService.getInternalUserESClient(),
  packagePoliciesToBump.items.map((item) => ({
    ...item,
    bump_agent_policy_revision: false,
```
@juliaElastic (Contributor, Author)

The flag has to be set back to false; otherwise an update would happen on every Fleet setup.
This update triggers the agent policy bump anyway, so there is no need to bump separately.
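The reset is what protects against the infinite bump loop raised earlier in the review: once a package policy has been written back with the flag set to `false`, the next Fleet setup finds nothing to do. A minimal in-memory sketch, where the `Map` stands in for the saved objects and `runBumpTask` is a hypothetical name, not Fleet's function:

```typescript
// Hypothetical in-memory stand-in for package policy saved objects; not Fleet's types or API.
interface StoredPackagePolicy {
  id: string;
  agentPolicyId: string;
  bump_agent_policy_revision: boolean;
}

// One simulated Fleet setup run: every flagged package policy is written back with the
// flag reset to false, and the agent policies touched by those updates are counted.
function runBumpTask(store: Map<string, StoredPackagePolicy>): number {
  const bumpedAgentPolicies = new Set<string>();
  store.forEach((policy, id) => {
    if (!policy.bump_agent_policy_revision) {
      return;
    }
    // Resetting the flag is what turns the next run into a no-op.
    store.set(id, { ...policy, bump_agent_policy_revision: false });
    bumpedAgentPolicies.add(policy.agentPolicyId);
  });
  return bumpedAgentPolicies.size;
}

const store = new Map<string, StoredPackagePolicy>([
  ['pp-1', { id: 'pp-1', agentPolicyId: 'ap-1', bump_agent_policy_revision: true }],
  ['pp-2', { id: 'pp-2', agentPolicyId: 'ap-1', bump_agent_policy_revision: true }],
]);
const firstRun = runBumpTask(store); // 1: both package policies belong to ap-1
const secondRun = runBumpTask(store); // 0: the flags were reset by the first run
```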

@juliaElastic changed the title from "[Fleet] bump policy if SO different from full policy" to "[Fleet] flag package policy SO to trigger agent policy bump" Nov 18, 2024
@juliaElastic (Contributor, Author) commented Nov 19, 2024

@nchaulet While testing with spaces, I want to confirm something: do we need to support both the `ingest-package-policies` and `fleet-package-policies` SO types? That is, do we have to add mapping changes to both and bump policies stored in both types?
I found this explanation, which suggests we have to support both: https://github.com/elastic/kibana/blob/3757e641278a5186919e35a0f980ac3cda7e8ccd/x-pack/plugins/fleet/dev_docs/space_awareness.md#space-aware-entities-in-fleet

I also noticed that this API seems to throw an error locally:

```
curl -u elastic:changeme -XPOST "http://localhost:5601/julia/internal/fleet/enable_space_awareness" -H "kbn-xsrf: reporting" -H 'elastic-api-version: 1'

{"statusCode":400,"error":"Bad Request","message":"uri [/internal/fleet/enable_space_awareness] with method [post] exists but is not available with the current configuration"}
```

@nchaulet (Member)

Yes, we need to support both saved object types, as the feature will be opt-in for users.

You should be able to trigger the migration with this call (internal calls now need an `x-elastic-internal-origin` header):

```
curl -u elastic:changeme -XPOST "http://localhost:5601/internal/fleet/enable_space_awareness" -H "kbn-xsrf: reporting" -H 'elastic-api-version: 1' -H 'x-elastic-internal-origin: 1'
```
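For scripted setups, the same request can be assembled in TypeScript. This is a sketch: `enableSpaceAwarenessRequest` is a hypothetical helper, the header names and values are copied from the curl call above, and basic auth (`-u elastic:changeme`) still has to be added by the caller, for example as an `Authorization` header.

```typescript
// Hypothetical helper mirroring the curl call; only the path and headers come from that call.
function enableSpaceAwarenessRequest(kibanaUrl: string): {
  url: string;
  method: string;
  headers: Record<string, string>;
} {
  return {
    // Strip trailing slashes so the path is joined cleanly.
    url: `${kibanaUrl.replace(/\/+$/, '')}/internal/fleet/enable_space_awareness`,
    method: 'POST',
    headers: {
      'kbn-xsrf': 'reporting',
      'elastic-api-version': '1',
      // Required for internal routes, per the review comment above.
      'x-elastic-internal-origin': '1',
    },
  };
}

const req = enableSpaceAwarenessRequest('http://localhost:5601/');
```

The returned `url`, `method` and `headers` can be passed to `fetch(req.url, { method: req.method, headers: req.headers })` once credentials are added.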

```
}

async function getPackagePoliciesToBump() {
  return await packagePolicyService.list(appContextService.getInternalUserSOClient(), {
```
@juliaElastic (Contributor, Author) commented Nov 19, 2024

To query package policies from all spaces, we need to query with an soClient for each space, so we need to list all spaces first.

Similarly, I think the deploy policies task doesn't work correctly, because its logic only queries agent policies from the default space: https://github.com/elastic/kibana/blob/main/x-pack/plugins/fleet/server/services/setup/fleet_server_policies_enrollment_keys.ts#L35

I'll query from all spaces like this:

```
  .getInternalUserSOClientWithoutSpaceExtension()
  .find<AgentPolicySOAttributes>({
    type: savedObjectType,
    fields: ['revision', 'data_output_id', 'monitoring_output_id'],
    searchFields: ['data_output_id', 'monitoring_output_id'],
    search: escapeSearchQueryPhrase(outputId),
    perPage: SO_SEARCH_LIMIT,
    namespaces: ['*'], // search across all spaces
```
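Once a `namespaces: ['*']` search returns objects from every space, they can be grouped per space so each group is updated with a client for that space; this is the shape the `packagePoliciesIndexedBySpace` loop later in this thread iterates over. A sketch under stated assumptions: `FoundSO` is a simplified, hypothetical type, and falling back to `'default'` when `namespaces` is missing is a choice of this example, not confirmed Fleet behavior.

```typescript
// Hypothetical, simplified item from a saved object search with `namespaces: ['*']`;
// real SavedObject results carry many more fields.
interface FoundSO {
  id: string;
  namespaces?: string[];
}

// Groups objects by their first namespace (space ID). Falling back to 'default' when
// `namespaces` is missing is an assumption of this sketch.
function indexBySpace(objects: FoundSO[]): Record<string, FoundSO[]> {
  const bySpace: Record<string, FoundSO[]> = {};
  for (const so of objects) {
    const spaceId = so.namespaces?.[0] ?? 'default';
    const bucket = bySpace[spaceId] ?? [];
    bucket.push(so);
    bySpace[spaceId] = bucket;
  }
  return bySpace;
}

const grouped = indexBySpace([
  { id: 'pp-1', namespaces: ['space1'] },
  { id: 'pp-2', namespaces: ['default'] },
  { id: 'pp-3', namespaces: ['space1'] },
  { id: 'pp-4' },
]);
// Each entry could then be processed with a client scoped to that space.
for (const [spaceId, objects] of Object.entries(grouped)) {
  console.log(`${spaceId}: ${objects.map((so) => so.id).join(', ')}`);
}
// prints "space1: pp-1, pp-3" then "default: pp-2, pp-4"
```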

```
  { id, version, attributes }: SavedObject<PackagePolicySOAttributes>,
  namespaces?: string[]
): PackagePolicy => {
  const { bump_agent_policy_revision: bumpAgentPolicyRevision, ...restAttributes } = attributes;
```
@juliaElastic (Contributor, Author)

`bump_agent_policy_revision` is only added to the SO type; it is removed from `PackagePolicy` so that it does not leak into API responses.
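A minimal sketch of that mapping, assuming trimmed-down hypothetical types (the real `PackagePolicySOAttributes` and `PackagePolicy` have many more fields): rest destructuring pulls the flag out, only the remaining attributes are returned, and the `Omit` in the `PackagePolicy` type documents that the flag is not part of the API shape.

```typescript
// Hypothetical, trimmed-down types; the real Fleet types have many more fields.
interface PackagePolicySOAttributes {
  name: string;
  revision: number;
  bump_agent_policy_revision?: boolean;
}

type PackagePolicy = { id: string; version?: string } & Omit<
  PackagePolicySOAttributes,
  'bump_agent_policy_revision'
>;

function mapSOToPackagePolicy(so: {
  id: string;
  version?: string;
  attributes: PackagePolicySOAttributes;
}): PackagePolicy {
  // Rest destructuring separates the internal flag from the attributes returned to callers.
  const { bump_agent_policy_revision: bumpAgentPolicyRevision, ...restAttributes } = so.attributes;
  return { id: so.id, version: so.version, ...restAttributes };
}

const apiPolicy = mapSOToPackagePolicy({
  id: 'pp-1',
  attributes: { name: 'system-1', revision: 2, bump_agent_policy_revision: true },
});
```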

@juliaElastic marked this pull request as ready for review November 20, 2024 15:07
@juliaElastic requested review from a team as code owners November 20, 2024 15:07
@juliaElastic added the release_note:skip and backport:prev-minor labels Nov 20, 2024
@botelastic (bot) added the Team:Fleet label Nov 21, 2024
@elasticmachine (Contributor)

Pinging @elastic/fleet (Team:Fleet)

@elena-shostak (Contributor) left a comment

Fleet SO update LGTM

@nchaulet (Member) left a comment

code LGTM 🚀

@gsoldevila (Contributor) left a comment

LGTM! Only additive (non-breaking) changes in the mappings.

@pmuellr (Member) left a comment

I'm requesting a change to the way the task cancellation is working, explained in a comment.

```
}

await runWithCache(async () => {
  await _updatePackagePoliciesThatNeedBump(appContextService.getLogger(), cancelled);
```
@pmuellr (Member) commented Nov 22, 2024

This isn't going to handle the task being cancelled while it's running: `cancelled` is passed by value, so it will always be `false` here.

Here's an example of handling this correctly: keep capturing the flag locally, but provide a function that tests it, and pass that function to the "inner" functions rather than the value:

```
({ taskInstance }: { taskInstance: ConcreteTaskInstance }) => {
  let cancelled = false;
  // The getter re-reads the captured flag each time it is called.
  const isCancelled = () => cancelled;
  return {
    run: async () =>
      runTask({
        getRiskScoreService,
        isCancelled,
        logger,
        taskInstance,
        telemetry,
        entityAnalyticsConfig,
      }),
    cancel: async () => {
      cancelled = true;
    },
  };
}
```
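The difference between passing the value and passing the function can be reproduced without Task Manager. In this synchronous sketch (all names hypothetical), `cancel()` is invoked from inside the second step to simulate a cancellation arriving mid-run; in the real task, `cancel` is called by Task Manager while `run` is still awaiting.

```typescript
// Hypothetical names throughout; this only models how the cancellation flag is read.
type Step = () => void;

// Receives a copy of the boolean: a later `cancelled = true` in the caller is never seen here.
function processWithValue(steps: Step[], cancelled: boolean): number {
  let completed = 0;
  for (const step of steps) {
    if (cancelled) break;
    step();
    completed++;
  }
  return completed;
}

// Receives a function that re-reads the captured variable on every check.
function processWithGetter(steps: Step[], isCancelled: () => boolean): number {
  let completed = 0;
  for (const step of steps) {
    if (isCancelled()) break;
    step();
    completed++;
  }
  return completed;
}

// Mirrors the shape of the task runner above: `cancel` flips a locally captured flag.
function createTaskRunner(makeSteps: (cancel: () => void) => Step[]) {
  let cancelled = false;
  const isCancelled = () => cancelled;
  const cancel = () => {
    cancelled = true;
  };
  const steps = makeSteps(cancel);
  return {
    runWithValue: () => processWithValue(steps, cancelled),
    runWithGetter: () => processWithGetter(steps, isCancelled),
  };
}

// The second step simulates a cancellation arriving while the task is running.
const makeSteps = (cancel: () => void): Step[] => [() => {}, () => cancel(), () => {}, () => {}];
const byValue = createTaskRunner(makeSteps).runWithValue(); // 4
const byGetter = createTaskRunner(makeSteps).runWithGetter(); // 2
```

The value version runs all four steps because it received a copy of `false` before the cancellation happened; the getter version stops before the third step.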

```
const start = Date.now();

for (const [spaceId, packagePolicies] of Object.entries(packagePoliciesIndexedBySpace)) {
  if (cancelled) {
```
@pmuellr (Member) commented Nov 22, 2024

Here is where you'd check `isCancelled()` instead of the boolean value. Note that this is a good place for the check, in case the array being processed ends up being extremely large.

@juliaElastic requested a review from pmuellr November 22, 2024 14:39
@pmuellr (Member) left a comment

ResponseOps changes LGTM; thx for making the change to the task cancellation!

@elasticmachine (Contributor)
💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] Jest Tests #1 / Category can submit without setting a category

Metrics

✅ unchanged

History

cc @juliaElastic

@juliaElastic merged commit 973c695 into elastic:main Nov 25, 2024
26 checks passed
@kibanamachine (Contributor)
Starting backport for target branches: 8.x

https://github.com/elastic/kibana/actions/runs/12007554238

@kibanamachine (Contributor)
💔 All backports failed

| Status | Branch | Result |
|--------|--------|--------|
| | 8.x | Backport failed because of merge conflicts |

You might need to backport the following PRs to 8.x:
- OpenAPI docs for APM UI APIs (#197946)
- [Core] [UA] Support API Deprecations (#196081)

Manual backport

To create the backport manually, run:

```
node scripts/backport --pr 200536
```

Questions?

Please refer to the Backport tool documentation

juliaElastic added a commit to juliaElastic/kibana that referenced this pull request Nov 25, 2024
…200536)

Co-authored-by: kibanamachine
@kibanamachine mentioned this pull request Nov 25, 2024
@gergoabraham (Contributor)
Just tested this locally: adding `bump_agent_policy_revision: true` to the new attributes in the backfill function successfully bumps the revision number and re-deploys the latest changes to agents with the Defend integration 👍 Thanks for the changes!

cc @dasansol92 @ferullo

juliaElastic added a commit that referenced this pull request Nov 26, 2024
paulinashakirova pushed a commit to paulinashakirova/kibana that referenced this pull request Nov 26, 2024
…200536)

CAWilson94 pushed a commit to CAWilson94/kibana that referenced this pull request Dec 12, 2024
…200536)

Labels
  • backport:prev-minor: Backport to (8.x) the previous minor version (i.e. one version back from main)
  • release_note:skip: Skip the PR/issue when compiling release notes
  • Team:Fleet: Team label for Observability Data Collection Fleet team
  • v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Fleet] Migration of saved objects do not trigger a policy update
9 participants