Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[EDR Workflows][API] Gate Agent Tamper Protection setting on Agent Policy Settings #174400

Merged
merged 24 commits into from
Feb 1, 2024

Conversation

szwarckonrad
Copy link
Contributor

@szwarckonrad szwarckonrad commented Jan 5, 2024

This PR is part of an effort to limit EDR Workflow features to the Endpoint Complete tier on serverless and focuses on server skde part of gating Agent Tamper Protection.

Related PRs:

#174278
#175129

We decided to stick with the existing Fleet privileges for this component, and no extra changes are needed RBAC wise (confirmed with @roxana-gheorghe).

Plugin/Policy Watcher Changes:
To monitor agent policies for a downgrade in tier (from complete to essentials) and disable agent protections if enabled, the following steps have been taken:

  1. A new app feature, endpoint_agent_tamper_protection, has been introduced and linked to the endpoint:complete tier.
  2. An additional method, bumpRevision, has been exposed in the fleet's agent policy service. This method utilizes the service's internal update function and includes a disable_protection flag, allowing it to be used without further modifications.
  3. The security solution side calls this method upon successful fleet plugin setup. If the endpoint_agent_tamper_protection app feature is not enabled, it retrieves all agent policies with is_protected: true and updates these policies with is_protected: false.

API Changes:
To respond to attempts to activate agent protection via the API by users on the Essentials tier, the following steps have been taken:

  1. External callback functionality has been added to the agentPolicy service, following the implementation in packagePolicy.
  2. Update and create agent policy callbacks have been registered in the security solution. These callbacks check for the enabled status of the endpoint_agent_tamper_protection app feature. If disabled, the callback throws an error.
  3. External callback execution has been added to the update and create methods in agent policy route handlers.

@szwarckonrad szwarckonrad added release_note:skip Skip the PR/issue when compiling release notes Team:Defend Workflows “EDR Workflows” sub-team of Security Solution labels Jan 5, 2024
@szwarckonrad szwarckonrad self-assigned this Jan 5, 2024
@szwarckonrad
Copy link
Contributor Author

/ci

@szwarckonrad
Copy link
Contributor Author

/ci

@szwarckonrad
Copy link
Contributor Author

/ci

@szwarckonrad
Copy link
Contributor Author

/ci

@szwarckonrad
Copy link
Contributor Author

/ci

@@ -254,7 +294,7 @@ class AgentPolicyService {
id: options.id,
savedObjectType: AGENT_POLICY_SAVED_OBJECT_TYPE,
});

await this.runExternalCallbacks('agentPolicyCreate', agentPolicy);
this.checkTamperProtectionLicense(agentPolicy);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be moved to sec sol side to an external callback

Comment on lines 564 to 565
this.checkTamperProtectionLicense(agentPolicy);
await this.checkForValidUninstallToken(agentPolicy, id);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be moved to sec sol side to an external callback

@szwarckonrad szwarckonrad marked this pull request as ready for review January 26, 2024 10:35
@szwarckonrad szwarckonrad requested review from a team as code owners January 26, 2024 10:35
Copy link
Contributor

@semd semd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AppFeature config LGTM

Copy link
Contributor

@paul-tavares paul-tavares left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did not run it locally, but went through the code and left feedback.

Also - could you add cypress tests on our end that can validate the API responds as appropriate for the different PLIs?

| ExternalCallbackAgentPolicyCreate
| ExternalCallbackAgentPolicyUpdate;

export type ExternalCallbackType = ExternalCallback | ExternalCallbackAgentPolicy;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why create this? why not just add The two new types directly to the ExternalCallback type, which is already a union of all available external callbacks that can be registered.

Ultimately, I'll defer to the Fleet team on this, but this (IMO 😄 ) feels unnecessary and will just cause more union to be used in other places of the code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent question! Removed as it was a leftover from poc phase I did ;). Thanks!

@@ -218,7 +220,7 @@ export interface FleetStartContract {
* Register callbacks for inclusion in fleet API processing
* @param args
*/
registerExternalCallback: (...args: ExternalCallback) => void;
registerExternalCallback: (...args: ExternalCallback | ExternalCallbackAgentPolicy) => void;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment further below. This union can be avoided by just adding the new types to the existing ExternalCallback type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

}

log.info(
`App feature [${AppFeatureSecurityKey.endpointAgentTamperProtection}] is disabled. Checking endpoint integration policies for compliance`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this message accurate? I think it should say "Checking fleet agent policies" not endpoint integrations

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch, changed.

Comment on lines +67 to +76
await pMap(updates, async (update) => {
try {
return await agentPolicyService.bumpRevision(internalSoClient, esClient, update.id, {
user: { username: 'elastic' } as AuthenticatedUser,
removeProtection: true,
});
} catch (error) {
policyUpdateErrors.push({ error, id: update.id });
}
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not very efficient for an env. having large amount of policies. Can we add a method to the agentPolicyService that can do bulk updates instead doing one-by-one?

Also - I will be merging a PR soon that will allow you to easily create an AsynIterator that could further improve the performance around the updates that need to be done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that this is not efficient. I saw it being done this way in couple places though, example. Gonna reach out to Fleet team and ask if there is a reason AgentPolicySrvc doesn't expose bulkUpdate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree about doing this in bulk so we don't have to update the SO one by one. That being said, there is no method on agentPolicySerice that supports bulk currently. If Fleet team agrees on, we can add a new one trying to mimic what the package policy service does for bulkUpdate here: https://github.com/elastic/kibana/blob/main/x-pack/plugins/fleet/server/services/package_policy.ts#L942

As this not only updates the SO and it does a lot of other stuff before doing the final update, it might be complex but worth if we can do it and speed this up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paul-tavares @kpollich
Do you think it would be better to add bulkUpdate in the current PR or create a new PR and issue just for that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with that @szwarckonrad . Thanks

Comment on lines 80 to 82
policyUpdateErrors.forEach((e) =>
logger.error(`Policy [${e.id}] failed to update due to error: ${e.error}`)
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you output only 1 error that includes all of the messages instead of individual message?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

Comment on lines 79 to 82
logger.error(`Done. ${policyUpdateErrors.length} out of ${updates.length}`);
policyUpdateErrors.forEach((e) =>
logger.error(`Policy [${e.id}] failed to update due to error: ${e.error}`)
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would clarify the message here and also, there is no need to output one log entry per errror encountered.

Here is my suggestion:

logger.error(
  `Done - ${policyUpdateErrors.length} out of ${updates.length} were successful. Errors encountered:\n${
    policyUpdateErrors.map(e => `Policy [${e.id}] failed to update due to error: ${e.error}`).join('\n')
  }`
);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Way better, thanks :)

@@ -382,6 +384,123 @@ describe('ingest_integration tests ', () => {
});
});

describe('agent policy update callback', () => {
it('AppFeature disabled - returns an error if higher tier features are turned on in the policy', async () => {
const logger = loggingSystemMock.create().get('ingest_integration.test');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is quite a bit of duplication of test setup between all of these tests... perhaps consolidate some of that into describe() level variables that are initialized via beforeEach()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleaned up.

agentPolicy.is_protected &&
!appFeatures.isEnabled(AppFeatureSecurityKey.endpointAgentTamperProtection)
) {
const error = new Error('Agent Tamper Protection requires Complete Endpoint Security tier');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: do not reference the tier required in serverless in order to use this functionality. Instead just return a more generic message like Agent Tamper Protection is not allowed in current environment. If you like, you could log a message to the kibana logs indicating that the deployment does not meet the needed tier, but I would not return that back in the API (this message is being returned in the API, correct?)

Also - what HTTP code is returned when this is triggered?

It should be a 403 - I think Fleet has a way for us to defined the return code on the Error that is returned by an external callback, so you can use that to set it as such.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agent Tamper Protection is not allowed in current environment sounds good. It returns 403 and Ive added cypress'

Copy link
Contributor

@juliaElastic juliaElastic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fleet changes LGTM

Copy link
Contributor

@tomsonpl tomsonpl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Contributor

@paul-tavares paul-tavares left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I left a comment, but its optional and should not hold this PR back. 👍

Comment on lines +67 to +76
await pMap(updates, async (update) => {
try {
return await agentPolicyService.bumpRevision(internalSoClient, esClient, update.id, {
user: { username: 'elastic' } as AuthenticatedUser,
removeProtection: true,
});
} catch (error) {
policyUpdateErrors.push({ error, id: update.id });
}
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with that @szwarckonrad . Thanks

// Agent Policy Service will check for apiPassThrough and rethrow. Route handler will check for statusCode and overwrite.
agentTamperProtectionUnavailableError.statusCode = 403;
agentTamperProtectionUnavailableError.apiPassThrough = true;
logger.error('Agent Tamper Protection requires Complete Endpoint Security tier');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This message could be improved for debugging purposes by including the agent policy ID (and maybe name as well) in the error log message. so that we get an idea which one triggered the issue.

szwarckonrad added a commit that referenced this pull request Feb 1, 2024
…icy Settings (#174278)

This PR is part of an effort to limit EDR Workflow features to the
Endpoint Complete tier on serverless and focuses on UI part of gating
Agent Tamper Protection.

Related PRs:
- [Agent Tamper Protection
API](#174400)
- [Protection updates](#175129)

Currently, the Agent tamper protection switch is in Fleet's agent policy
settings. Right now (for ESS and Serverless), we only control access to
this field with a license check (platinum). This PR adds an extra check
for AppFeature, which is included in the Complete tier PLI config. We
decided to stick with the existing Fleet privileges for this component,
and no extra changes are needed RBAC wise (confirmed with
@roxana-gheorghe).

Changes:

1. Added `endpointAgentTamperProtection` appFeature and linked it to the
endpoint:complete tier.
2. Made an upselling component and registered it with the Upselling
Service.
3. Passed the upselling component to Fleet using UIExtension.
4. Added Cypress end-to-end coverage for Essentials showing the
upselling component and Complete showing the form component.

![Screenshot 2024-01-24 at 15 52
17](https://github.com/elastic/kibana/assets/29123534/6cdc3197-7bd0-4607-9323-ae4318493653)
@szwarckonrad szwarckonrad enabled auto-merge (squash) February 1, 2024 20:19
@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
fleet 1104 1109 +5

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.

id before after diff
fleet 49 51 +2
Unknown metric groups

API count

id before after diff
fleet 1222 1227 +5

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @szwarckonrad

@szwarckonrad szwarckonrad merged commit 8166d18 into elastic:main Feb 1, 2024
37 checks passed
@kibanamachine kibanamachine added v8.13.0 backport:skip This commit does not require backporting labels Feb 1, 2024
fkanout pushed a commit to fkanout/kibana that referenced this pull request Feb 7, 2024
…icy Settings (elastic#174278)

This PR is part of an effort to limit EDR Workflow features to the
Endpoint Complete tier on serverless and focuses on UI part of gating
Agent Tamper Protection.

Related PRs:
- [Agent Tamper Protection
API](elastic#174400)
- [Protection updates](elastic#175129)

Currently, the Agent tamper protection switch is in Fleet's agent policy
settings. Right now (for ESS and Serverless), we only control access to
this field with a license check (platinum). This PR adds an extra check
for AppFeature, which is included in the Complete tier PLI config. We
decided to stick with the existing Fleet privileges for this component,
and no extra changes are needed RBAC wise (confirmed with
@roxana-gheorghe).

Changes:

1. Added `endpointAgentTamperProtection` appFeature and linked it to the
endpoint:complete tier.
2. Made an upselling component and registered it with the Upselling
Service.
3. Passed the upselling component to Fleet using UIExtension.
4. Added Cypress end-to-end coverage for Essentials showing the
upselling component and Complete showing the form component.

![Screenshot 2024-01-24 at 15 52
17](https://github.com/elastic/kibana/assets/29123534/6cdc3197-7bd0-4607-9323-ae4318493653)
fkanout pushed a commit to fkanout/kibana that referenced this pull request Feb 7, 2024
…licy Settings (elastic#174400)

This PR is part of an effort to limit EDR Workflow features to the
Endpoint Complete tier on serverless and focuses on server skde part of
gating Agent Tamper Protection.

Related PRs:

elastic#174278
elastic#175129

**We decided to stick with the existing Fleet privileges for this
component, and no extra changes are needed RBAC wise (confirmed with
@roxana-gheorghe).**

**Plugin/Policy Watcher Changes**:
To monitor agent policies for a downgrade in tier (from complete to
essentials) and disable agent protections if enabled, the following
steps have been taken:

1. A new app feature, `endpoint_agent_tamper_protection`, has been
introduced and linked to the `endpoint:complete` tier.
2. An additional method, `bumpRevision`, has been exposed in the fleet's
agent policy service. This method utilizes the service's internal update
function and includes a `disable_protection` flag, allowing it to be
used without further modifications.
3. The security solution side calls this method upon successful fleet
plugin setup. If the `endpoint_agent_tamper_protection` app feature is
not enabled, it retrieves all agent policies with `is_protected: true`
and updates these policies with `is_protected: false`.

**API Changes**:
To respond to attempts to activate agent protection via the API by users
on the Essentials tier, the following steps have been taken:

1. External callback functionality has been added to the agentPolicy
service, following the implementation in packagePolicy.
2. Update and create agent policy callbacks have been registered in the
security solution. These callbacks check for the enabled status of the
`endpoint_agent_tamper_protection` app feature. If disabled, the
callback throws an error.
3. External callback execution has been added to the update and create
methods in agent policy route handlers.

---------

Co-authored-by: kibanamachine <[email protected]>
CoenWarmer pushed a commit to CoenWarmer/kibana that referenced this pull request Feb 15, 2024
…icy Settings (elastic#174278)

This PR is part of an effort to limit EDR Workflow features to the
Endpoint Complete tier on serverless and focuses on UI part of gating
Agent Tamper Protection.

Related PRs:
- [Agent Tamper Protection
API](elastic#174400)
- [Protection updates](elastic#175129)

Currently, the Agent tamper protection switch is in Fleet's agent policy
settings. Right now (for ESS and Serverless), we only control access to
this field with a license check (platinum). This PR adds an extra check
for AppFeature, which is included in the Complete tier PLI config. We
decided to stick with the existing Fleet privileges for this component,
and no extra changes are needed RBAC wise (confirmed with
@roxana-gheorghe).

Changes:

1. Added `endpointAgentTamperProtection` appFeature and linked it to the
endpoint:complete tier.
2. Made an upselling component and registered it with the Upselling
Service.
3. Passed the upselling component to Fleet using UIExtension.
4. Added Cypress end-to-end coverage for Essentials showing the
upselling component and Complete showing the form component.

![Screenshot 2024-01-24 at 15 52
17](https://github.com/elastic/kibana/assets/29123534/6cdc3197-7bd0-4607-9323-ae4318493653)
CoenWarmer pushed a commit to CoenWarmer/kibana that referenced this pull request Feb 15, 2024
…licy Settings (elastic#174400)

This PR is part of an effort to limit EDR Workflow features to the
Endpoint Complete tier on serverless and focuses on server skde part of
gating Agent Tamper Protection.

Related PRs:

elastic#174278
elastic#175129

**We decided to stick with the existing Fleet privileges for this
component, and no extra changes are needed RBAC wise (confirmed with
@roxana-gheorghe).**

**Plugin/Policy Watcher Changes**:
To monitor agent policies for a downgrade in tier (from complete to
essentials) and disable agent protections if enabled, the following
steps have been taken:

1. A new app feature, `endpoint_agent_tamper_protection`, has been
introduced and linked to the `endpoint:complete` tier.
2. An additional method, `bumpRevision`, has been exposed in the fleet's
agent policy service. This method utilizes the service's internal update
function and includes a `disable_protection` flag, allowing it to be
used without further modifications.
3. The security solution side calls this method upon successful fleet
plugin setup. If the `endpoint_agent_tamper_protection` app feature is
not enabled, it retrieves all agent policies with `is_protected: true`
and updates these policies with `is_protected: false`.

**API Changes**:
To respond to attempts to activate agent protection via the API by users
on the Essentials tier, the following steps have been taken:

1. External callback functionality has been added to the agentPolicy
service, following the implementation in packagePolicy.
2. Update and create agent policy callbacks have been registered in the
security solution. These callbacks check for the enabled status of the
`endpoint_agent_tamper_protection` app feature. If disabled, the
callback throws an error.
3. External callback execution has been added to the update and create
methods in agent policy route handlers.

---------

Co-authored-by: kibanamachine <[email protected]>
fkanout pushed a commit to fkanout/kibana that referenced this pull request Mar 4, 2024
…icy Settings (elastic#174278)

This PR is part of an effort to limit EDR Workflow features to the
Endpoint Complete tier on serverless and focuses on UI part of gating
Agent Tamper Protection.

Related PRs:
- [Agent Tamper Protection
API](elastic#174400)
- [Protection updates](elastic#175129)

Currently, the Agent tamper protection switch is in Fleet's agent policy
settings. Right now (for ESS and Serverless), we only control access to
this field with a license check (platinum). This PR adds an extra check
for AppFeature, which is included in the Complete tier PLI config. We
decided to stick with the existing Fleet privileges for this component,
and no extra changes are needed RBAC wise (confirmed with
@roxana-gheorghe).

Changes:

1. Added `endpointAgentTamperProtection` appFeature and linked it to the
endpoint:complete tier.
2. Made an upselling component and registered it with the Upselling
Service.
3. Passed the upselling component to Fleet using UIExtension.
4. Added Cypress end-to-end coverage for Essentials showing the
upselling component and Complete showing the form component.

![Screenshot 2024-01-24 at 15 52
17](https://github.com/elastic/kibana/assets/29123534/6cdc3197-7bd0-4607-9323-ae4318493653)
fkanout pushed a commit to fkanout/kibana that referenced this pull request Mar 4, 2024
…licy Settings (elastic#174400)

This PR is part of an effort to limit EDR Workflow features to the
Endpoint Complete tier on serverless and focuses on server skde part of
gating Agent Tamper Protection.

Related PRs:

elastic#174278
elastic#175129

**We decided to stick with the existing Fleet privileges for this
component, and no extra changes are needed RBAC wise (confirmed with
@roxana-gheorghe).**

**Plugin/Policy Watcher Changes**:
To monitor agent policies for a downgrade in tier (from complete to
essentials) and disable agent protections if enabled, the following
steps have been taken:

1. A new app feature, `endpoint_agent_tamper_protection`, has been
introduced and linked to the `endpoint:complete` tier.
2. An additional method, `bumpRevision`, has been exposed in the fleet's
agent policy service. This method utilizes the service's internal update
function and includes a `disable_protection` flag, allowing it to be
used without further modifications.
3. The security solution side calls this method upon successful fleet
plugin setup. If the `endpoint_agent_tamper_protection` app feature is
not enabled, it retrieves all agent policies with `is_protected: true`
and updates these policies with `is_protected: false`.

**API Changes**:
To respond to attempts to activate agent protection via the API by users
on the Essentials tier, the following steps have been taken:

1. External callback functionality has been added to the agentPolicy
service, following the implementation in packagePolicy.
2. Update and create agent policy callbacks have been registered in the
security solution. These callbacks check for the enabled status of the
`endpoint_agent_tamper_protection` app feature. If disabled, the
callback throws an error.
3. External callback execution has been added to the update and create
methods in agent policy route handlers.

---------

Co-authored-by: kibanamachine <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting release_note:skip Skip the PR/issue when compiling release notes Team:Defend Workflows “EDR Workflows” sub-team of Security Solution Team:Fleet Team label for Observability Data Collection Fleet team v8.13.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants