Gracefully handle decryption errors during ESO migrations #105968

Merged
merged 27 commits into from
Jul 28, 2021

Conversation

ymao1
Contributor

@ymao1 ymao1 commented Jul 16, 2021

Resolves #101582

Summary

  • Updated the ESO migration function to take a shouldMigrateIfDecryptionFails flag. If this flag is false or undefined, decryption errors are thrown (this is the previous behavior). If the flag is true, decryption errors are caught, encrypted attributes are stripped, and the migration function is applied to the stripped document.
  • Note that if there are errors during the actual migration or errors during encryption, they will continue to be thrown.
  • Updated the rules and connectors migration functions to set shouldMigrateIfDecryptionFails to true and to log and throw any errors thrown during migration. While the original issue specifically mentioned decryption errors, the try/catch block in these migration functions previously swallowed any error during migration (decryption, migrating, encryption) and skipped migrating the saved object.
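
The flag's behavior described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the plugin's actual code: `Doc`, `CryptoLike`, `createMigrationSketch`, and the `warn` callback are hypothetical names, and only `shouldMigrateIfDecryptionFails` comes from the PR description.

```typescript
// Hypothetical, simplified shapes; the real ESO plugin types differ.
interface Doc {
  id: string;
  attributes: Record<string, unknown>;
}

type MigrationFn = (doc: Doc) => Doc;

// Stand-in for the encryption service (assumed interface).
interface CryptoLike {
  encryptedAttributes: string[];
  decrypt(doc: Doc): Doc; // throws if decryption fails
  encrypt(doc: Doc): Doc; // throws if encryption fails
}

function createMigrationSketch(
  service: CryptoLike,
  migration: MigrationFn,
  shouldMigrateIfDecryptionFails?: boolean,
  warn: (message: string) => void = () => {}
): MigrationFn {
  return (doc: Doc): Doc => {
    let input: Doc;
    try {
      input = service.decrypt(doc);
    } catch (err) {
      // Flag false or undefined: keep the previous behavior and rethrow.
      if (!shouldMigrateIfDecryptionFails) {
        throw err;
      }
      warn(`Decryption failed for "${doc.id}"; migrating with encrypted attributes stripped.`);
      // Copy the attributes and drop the ones that could not be decrypted.
      const attributes: Record<string, unknown> = { ...doc.attributes };
      for (const key of service.encryptedAttributes) {
        delete attributes[key];
      }
      input = { ...doc, attributes };
    }
    // Errors thrown by the migration itself or by encryption are not caught here.
    return service.encrypt(migration(input));
  };
}
```

In this sketch only the decryption step is guarded, so a failing migration or encryption step still surfaces as an error, matching the note in the summary.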

To verify

  • Make sure you use yarn es snapshot --ssl --license trial -E path.data=../data to run ES so that you can test the migrations.
  • Make sure you have a value for xpack.encryptedSavedObjects.encryptionKey set in your kibana.yml
  • To make use of existing rule/connector migrations, switch to a 7.10 version of Kibana and create some rules and email and/or webhook connectors. Alternatively, you can define a custom migration for 7.15.0 and just switch back to running 7.14.0.
  • Switch to this branch. Change the value for xpack.encryptedSavedObjects.encryptionKey in your kibana.yml. Start up Kibana.
  • Verify that the migrations for rules and connectors all ran by checking the saved object.
    • 7.11 rules should have new updatedAt and notifyWhen fields
    • 7.11 email/webhook connectors should have new hasAuth field
    • 7.14 all connectors should have new isMissingSecrets field
    • or if you defined your own migration for testing, that should have run.
  • See that your rules & connectors are erroring with decrypt failures. These should be fixable by disabling/re-enabling the rule and re-entering secrets for the connector.
  • See warning messages in the logs informing you that saved object decryption failed but migration proceeded anyway.
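
As a rough aid for the field checks listed above, a small helper could report which expected fields are absent from a saved object's attributes. This is only a sketch: `missingMigratedFields` and the `ObjectKind` labels are hypothetical, and the `actionTypeId` values `.email` and `.webhook` are assumptions about how connector types are identified.

```typescript
// Hypothetical labels for the two kinds of saved objects being checked.
type ObjectKind = 'rule' | 'connector';

// Returns the names of fields that the migrations listed above should have added
// but that are not present on the given attributes.
function missingMigratedFields(
  kind: ObjectKind,
  attributes: Record<string, unknown>
): string[] {
  const expected: string[] = [];
  if (kind === 'rule') {
    // 7.11 rule migration
    expected.push('updatedAt', 'notifyWhen');
  } else {
    // 7.14 migration applies to all connectors
    expected.push('isMissingSecrets');
    // 7.11 migration applies to email/webhook connectors only (assumed type ids)
    const typeId = attributes['actionTypeId'];
    if (typeId === '.email' || typeId === '.webhook') {
      expected.push('hasAuth');
    }
  }
  return expected.filter((field) => !(field in attributes));
}
```

An empty result means every expected field is present; it says nothing about whether the values are correct.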

Checklist

Delete any items that are not applicable to this PR.

@ymao1 ymao1 changed the title Alerting/decrypt error on migration eso Gracefully handle decryption errors during ESO migrations Jul 16, 2021
@@ -15,121 +15,185 @@ import { migrationMocks } from 'src/core/server/mocks';
const context = migrationMocks.createContext();
const encryptedSavedObjectsSetup = encryptedSavedObjectsMock.createSetup();

describe('7.10.0', () => {
describe('successful migrations', () => {
Contributor Author

Recommend reviewing this with Hide whitespace changes turned on for more meaningful diffs.

@ymao1 ymao1 self-assigned this Jul 16, 2021
@ymao1 ymao1 added Feature:Alerting release_note:skip Skip the PR/issue when compiling release notes Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) Team:Security Team focused on: Auth, Users, Roles, Spaces, Audit Logging, and more! v7.15.0 v8.0.0 labels Jul 16, 2021
@ymao1 ymao1 marked this pull request as ready for review July 16, 2021 18:12
@ymao1 ymao1 requested review from a team as code owners July 16, 2021 18:12
@elasticmachine
Contributor

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

@elasticmachine
Contributor

Pinging @elastic/kibana-security (Team:Security)

@ymao1
Contributor Author

ymao1 commented Jul 19, 2021

@elasticmachine merge upstream

Member

@pmuellr pmuellr left a comment

Haven't tried this yet, but code LGTM.

One thing - we have some existing FT that look like they are testing migrated objects, so it seems like there's a possibility for us to add one for this? Or I guess two - one for rules, one for connectors? Not quite sure how these are run, maybe it's not possible to create such a test case?

Feels like there's a follow-on for this - it would be nice to mark un-decryptable connectors the same way we mark imported connectors, with the "missing secrets" property. That way, we can be a little more pro-active to the user regarding the state of the objects, without having to wait for them to be executed to find out there's a problem. But I don't think we really have a property like that for alerting rules, and I suspect they are a bigger worry than connectors.

And to handle that, we'd need the ESO plugin to also return an indication that the migration couldn't decrypt, so we would know when to set the "missing secrets" property, and the equivalent of what that would be for a rule.

It's also probably the case that we'd like to be even MORE pro-active than I suggested, via something like a Notification Center message providing migration issues.

So, I think this is pretty good for what we can reasonably handle today, and it will be good to see this in practice to see if we even need to do more work here, and if there are better ways of informing the user what has happened.

@azasypkin
Member

ACK: going to review this PR today, or tomorrow CET morning at the latest

@azasypkin azasypkin self-requested a review July 20, 2021 09:58
Member

@azasypkin azasypkin left a comment

The general idea looks good to me; I left a few nits and questions. I also left one concern that I'd like to think about a bit more, but didn't want to hold the review completely.

@ymao1 ymao1 requested a review from a team as a code owner July 20, 2021 17:43
@botelastic botelastic bot added the Team:Fleet Team label for Observability Data Collection Fleet team label Jul 20, 2021
@ymao1 ymao1 requested a review from azasypkin July 22, 2021 11:52
Member

@azasypkin azasypkin left a comment

LGTM with green CI. Thanks a lot for the patience while making necessary adjustments!

Special 🏅 for the good test coverage 🙂

@ymao1
Contributor Author

ymao1 commented Jul 22, 2021

@pmuellr There have been some changes since your review, would you mind re-reviewing? Specifically, I updated x-pack/plugins/alerting/server/task_runner/task_runner.ts to handle rules where the apiKey field does not exist. I did not change the handling for the case where apiKey is null though.

} = await this.context.encryptedSavedObjectsClient.getDecryptedAsInternalUser<RawAlert>(
'alert',
alertId,
{ namespace }
);

return apiKey;
if (!attributes.hasOwnProperty('apiKey')) {
Member

Sounds like you're testing with no security right now (IRL), so I guess we'll find out if we have this property set for "no security" scenarios! I'm guessing there's a good chance we never set the field, and so we might not have that property, and so we wouldn't want to throw an error, but return null instead. Not sure.

Member

That was just a quick look in the task_runner - I'll do a full re-review though ...

Contributor Author

@pmuellr Finally was able to verify that with security turned off, the apiKey field exists and is set to null. I ran my test migration after changing the encryption key and verified that there were no decryption failures during the migration (because there's nothing to decrypt) and so the apiKey field remains null and the rule runs fine after migration.

That being said, I am going to wait for this PR to be merged and test again after that. It seems like that should fix the unhandled promise rejection and then it won't matter if the api key is null or undefined.
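
The distinction discussed in this thread, between a rule whose apiKey field is absent and one whose apiKey is null, might look something like the sketch below. `readApiKey` is a hypothetical helper rather than the task runner's real code, and a later comment in this conversation notes that the task runner changes were eventually reverted.

```typescript
// Hypothetical helper: distinguishes "field missing" from "field present but null".
function readApiKey(attributes: Record<string, unknown>): string | null {
  // A missing apiKey field suggests the encrypted attribute was stripped,
  // for example because decryption failed during migration.
  if (!Object.prototype.hasOwnProperty.call(attributes, 'apiKey')) {
    throw new Error('Rule has no apiKey field; it may need to be disabled and re-enabled.');
  }
  // With security turned off the field exists and is null, which is not an error.
  const apiKey = attributes['apiKey'];
  return typeof apiKey === 'string' ? apiKey : null;
}
```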

Member

@nchaulet nchaulet left a comment

Fleet changes 🚀

Member

@pmuellr pmuellr left a comment

re-review of the action/alerting parts here; still LGTM, left a nit comment regarding object destructuring

Contributor

@YulNaumenko YulNaumenko left a comment

LGTM

@ymao1
Contributor Author

ymao1 commented Jul 28, 2021

Was able to revert changes to alerting task runner (in this commit) after this PR got merged. Rules without an apiKey field throw an error when running (as expected), but I no longer see unhandled promise rejections.

@ymao1 ymao1 added the auto-backport Deprecated - use backport:version if exact versions are needed label Jul 28, 2021
@ymao1
Contributor Author

ymao1 commented Jul 28, 2021

@elasticmachine merge upstream

@kibanamachine
Contributor

💚 Build Succeeded

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

| id | before | after | diff |
| --- | --- | --- | --- |
| encryptedSavedObjects | 28 | 26 | -2 |

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.

| id | before | after | diff |
| --- | --- | --- | --- |
| encryptedSavedObjects | 3 | 4 | +1 |
Unknown metric groups

API count

| id | before | after | diff |
| --- | --- | --- | --- |
| encryptedSavedObjects | 30 | 28 | -2 |

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @ymao1

@kibanamachine
Contributor

💚 Backport successful

Status Branch Result
7.x

This backport PR will be merged automatically after passing CI.

kibanamachine added a commit to kibanamachine/kibana that referenced this pull request Jul 28, 2021
…5968)

* Updating unit tests

* Fixing types

* Updating readme and adding warning message

* Updating README

* PR fixes

* collapsing args to create migration fn

* Adding functional tests

* Adding comment to functional test

* Adding stripOrDecryptAttributesSync

* Using stripOrDecryptAttributesSync

* Fixing unit tests

* PR fixes

* PR fixes

* Moving validation of apikey existence in alerting task runner

* Cleanup

* Reverting changes to alerting task runner

* PR fixes

Co-authored-by: Kibana Machine <[email protected]>
ymao1 added a commit that referenced this pull request Jul 29, 2021
…107051)

Co-authored-by: Kibana Machine <[email protected]>

Co-authored-by: ymao1 <[email protected]>
streamich pushed a commit to vadimkibana/kibana that referenced this pull request Aug 8, 2021
…5968)

Co-authored-by: Kibana Machine <[email protected]>
@ymao1 ymao1 deleted the alerting/decrypt-error-on-migration-eso branch August 30, 2021 11:03
Labels
auto-backport Deprecated - use backport:version if exact versions are needed Feature:Actions Feature:Alerting release_note:skip Skip the PR/issue when compiling release notes Team:Fleet Team label for Observability Data Collection Fleet team Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) Team:Security Team focused on: Auth, Users, Roles, Spaces, Audit Logging, and more! v7.15.0 v8.0.0
Development

Successfully merging this pull request may close these issues.

[alerting] decrypt errors during migration yield unmigrated alert saved objects
7 participants