Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Response Ops][Alerting] Creating global service for fetching and caching rules settings #192404

Merged
merged 13 commits into from
Sep 16, 2024

Conversation

ymao1
Copy link
Contributor

@ymao1 ymao1 commented Sep 9, 2024

Resolves #184321 and #149884

Summary

Created a RulesSettingsService that caches flapping and query delay settings per space for 1 minute so rules are not accessing the SO index with every execution.

Also moved all classes related to the rules settings into the rules_settings folder so some of these changes are just renaming changes.

To Verify

  1. Add some logging to x-pack/plugins/alerting/server/rules_settings/rules_settings_service.ts to indicate when settings are being fetched and when they're returning from the cache.
  2. Create and run some rules in different spaces to see that flapping and query delay settings are being returned from the cache when the cache has not expired.

@ymao1 ymao1 changed the title Moving all existing rules setting files into rules settings folders [Response Ops][Alerting] Creating global service for fetching and caching rules settings Sep 10, 2024
@ymao1 ymao1 self-assigned this Sep 10, 2024
@ymao1 ymao1 added Feature:Alerting release_note:skip Skip the PR/issue when compiling release notes Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v8.16.0 labels Sep 10, 2024
@@ -588,33 +591,36 @@ export class AlertingPlugin {
};

taskRunnerFactory.initialize({
logger,
actionsConfigMap: getActionsConfigMap(this.config.rules.run.actions),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alphabetizing 🙈

@ymao1 ymao1 marked this pull request as ready for review September 10, 2024 17:01
@ymao1 ymao1 requested a review from a team as a code owner September 10, 2024 17:01
@elasticmachine
Copy link
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

@ymao1 ymao1 requested review from pmuellr and guskovaue September 10, 2024 17:01
rulesSettingsService: new RulesSettingsService({
getRulesSettingsClientWithRequest,
isServerless: !!plugins.serverless,
}),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actual changes are to pass this new global rules setting service to the task runner factory and to remove getRulesSettingsClientWithRequest

Copy link
Contributor

@guskovaue guskovaue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Tested locally, works as expected!

Copy link
Member

@pmuellr pmuellr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM, but feels like x-pack/plugins/alerting/server/rules_settings/rules_settings_service.test.ts could use some additional tests. Also noted a potential design change to use just one map instead of three, within the service.

@@ -86,6 +87,8 @@ export default function createAlertsAsDataInstallResourcesTest({ getService }: F

it(`should write alert docs during rule execution with flapping.enabled: ${enableFlapping}`, async () => {
await setFlappingSettings(enableFlapping);
// wait so cache expires
await setTimeoutAsync(60000);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a minute??? Oh noes ...

I wonder ... could we write a new route for FTR that we could call to modify the value? Doesn't need to be in this PR, but it is running this test twice so we just added 2m to this FTR group, it pains my heart :-)

And there are some more below :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in cf77b09


export class RulesSettingsService {
private defaultQueryDelaySettings: RulesSettingsQueryDelayProperties;
private settingsLastUpdated: Map<string, number> = new Map();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if it would be better to have one map, that had the lastUpdated, queryDelaySettings, and flappingSettings in it, vs 3 maps. Seems like we'd have a better chance of keeping everything in sync over time. Probably easier to add new settings as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in f2aa813

this.flappingSettings.has(spaceId) &&
this.queryDelaySettings.has(spaceId)
) {
const lastUpdated = new Date(this.settingsLastUpdated.get(spaceId)!).getTime();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical field! You can imagine this getting messed up and then we never refresh the cache, or we refresh it every time. Everything looks good here, more of a note for myself :-)

const currentFlappingSettings = this.flappingSettings.get(spaceId)!;
const currentQueryDelaySettings = this.queryDelaySettings.get(spaceId)!;

if (now - lastUpdated >= CACHE_INTERVAL_MS) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll want to parameterize this somehow, to let us do the FTRs without all the 1m timeouts ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in cf77b09

return await this.fetchSettings(request, spaceId, now);
} catch (err) {
// return cached settings on error
return {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we're logging errors on the inner calls, but seems kinda scary to not have anything here. Feels like maybe having a debug logs might good. The existing comments seem like good locations and text for those logs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in a45e6a3

expect(rulesSettingsService.queryDelaySettings.get('default')).toEqual({ delay: 0 });
// @ts-ignore - accessing private variable
expect(rulesSettingsService.flappingSettings.get('default')).toEqual({
enabled: true,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can make this a constant, save some room.

Also, we don't seem to be testing any other values that this. Seems like we should have a test like the one where we test the changed query delay.

Copy link
Contributor Author

@ymao1 ymao1 Sep 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in 60ce3e6

});

test('should use cached settings if cache has not expired', async () => {
const rulesSettingsClient = rulesSettingsClientMock.create();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we set the values to non-default values, to make sure we end up getting the cached ones, instead of default? Seems appropriate for all the "cached" tests below ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in 60ce3e6

@ymao1 ymao1 requested a review from pmuellr September 13, 2024 18:17
Copy link
Member

@pmuellr pmuellr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for the changes!

@ymao1
Copy link
Contributor Author

ymao1 commented Sep 16, 2024

@elasticmachine merge upstream

@ymao1 ymao1 added backport:prev-minor Backport to (8.x) the previous minor version (i.e. one version back from main) v9.0.0 labels Sep 16, 2024
@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @ymao1

@ymao1 ymao1 merged commit 7a0fe2e into elastic:main Sep 16, 2024
38 checks passed
@ymao1 ymao1 deleted the alerting/cache-rules-settings branch September 16, 2024 13:22
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Sep 16, 2024
…hing rules settings (elastic#192404)

Resolves elastic#184321 and
elastic#149884

## Summary

Created a `RulesSettingsService` that caches flapping and query delay
settings per space for 1 minute so rules are not accessing the SO index
with every execution.

Also moved all classes related to the rules settings into the
`rules_settings` folder so some of these changes are just renaming
changes.

## To Verify
1. Add some logging to
`x-pack/plugins/alerting/server/rules_settings/rules_settings_service.ts`
to indicate when settings are being fetched and when they're returning
from the cache.
2. Create and run some rules in different spaces to see that flapping
and query delay settings are being returned from the cache when the
cache has not expired.

---------

Co-authored-by: Elastic Machine <[email protected]>
(cherry picked from commit 7a0fe2e)
@kibanamachine
Copy link
Contributor

💚 All backports created successfully

Status Branch Result
8.x

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

kibanamachine added a commit that referenced this pull request Sep 16, 2024
…nd caching rules settings (#192404) (#193011)

# Backport

This will backport the following commits from `main` to `8.x`:
- [[Response Ops][Alerting] Creating global service for fetching and
caching rules settings
(#192404)](#192404)

<!--- Backport version: 9.4.3 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

<!--BACKPORT [{"author":{"name":"Ying
Mao","email":"[email protected]"},"sourceCommit":{"committedDate":"2024-09-16T13:22:44Z","message":"[Response
Ops][Alerting] Creating global service for fetching and caching rules
settings (#192404)\n\nResolves
#184321
and\r\nhttps://github.com//issues/149884\r\n\r\n##
Summary\r\n\r\nCreated a `RulesSettingsService` that caches flapping and
query delay\r\nsettings per space for 1 minute so rules are not
accessing the SO index\r\nwith every execution.\r\n\r\nAlso moved all
classes related to the rules settings into the\r\n`rules_settings`
folder so some of these changes are just renaming\r\nchanges.\r\n\r\n##
To Verify\r\n1. Add some logging
to\r\n`x-pack/plugins/alerting/server/rules_settings/rules_settings_service.ts`\r\nto
indicate when settings are being fetched and when they're
returning\r\nfrom the cache.\r\n2. Create and run some rules in
different spaces to see that flapping\r\nand query delay settings are
being returned from the cache when the\r\ncache has not
expired.\r\n\r\n---------\r\n\r\nCo-authored-by: Elastic Machine
<[email protected]>","sha":"7a0fe2e83683cf38c537694daa88a3a75bfe3e57","branchLabelMapping":{"^v9.0.0$":"main","^v8.16.0$":"8.x","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["Feature:Alerting","release_note:skip","Team:ResponseOps","v9.0.0","backport:prev-minor","v8.16.0"],"title":"[Response
Ops][Alerting] Creating global service for fetching and caching rules
settings","number":192404,"url":"https://github.com/elastic/kibana/pull/192404","mergeCommit":{"message":"[Response
Ops][Alerting] Creating global service for fetching and caching rules
settings (#192404)\n\nResolves
#184321
and\r\nhttps://github.com//issues/149884\r\n\r\n##
Summary\r\n\r\nCreated a `RulesSettingsService` that caches flapping and
query delay\r\nsettings per space for 1 minute so rules are not
accessing the SO index\r\nwith every execution.\r\n\r\nAlso moved all
classes related to the rules settings into the\r\n`rules_settings`
folder so some of these changes are just renaming\r\nchanges.\r\n\r\n##
To Verify\r\n1. Add some logging
to\r\n`x-pack/plugins/alerting/server/rules_settings/rules_settings_service.ts`\r\nto
indicate when settings are being fetched and when they're
returning\r\nfrom the cache.\r\n2. Create and run some rules in
different spaces to see that flapping\r\nand query delay settings are
being returned from the cache when the\r\ncache has not
expired.\r\n\r\n---------\r\n\r\nCo-authored-by: Elastic Machine
<[email protected]>","sha":"7a0fe2e83683cf38c537694daa88a3a75bfe3e57"}},"sourceBranch":"main","suggestedTargetBranches":["8.x"],"targetPullRequestStates":[{"branch":"main","label":"v9.0.0","branchLabelMappingKey":"^v9.0.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/192404","number":192404,"mergeCommit":{"message":"[Response
Ops][Alerting] Creating global service for fetching and caching rules
settings (#192404)\n\nResolves
#184321
and\r\nhttps://github.com//issues/149884\r\n\r\n##
Summary\r\n\r\nCreated a `RulesSettingsService` that caches flapping and
query delay\r\nsettings per space for 1 minute so rules are not
accessing the SO index\r\nwith every execution.\r\n\r\nAlso moved all
classes related to the rules settings into the\r\n`rules_settings`
folder so some of these changes are just renaming\r\nchanges.\r\n\r\n##
To Verify\r\n1. Add some logging
to\r\n`x-pack/plugins/alerting/server/rules_settings/rules_settings_service.ts`\r\nto
indicate when settings are being fetched and when they're
returning\r\nfrom the cache.\r\n2. Create and run some rules in
different spaces to see that flapping\r\nand query delay settings are
being returned from the cache when the\r\ncache has not
expired.\r\n\r\n---------\r\n\r\nCo-authored-by: Elastic Machine
<[email protected]>","sha":"7a0fe2e83683cf38c537694daa88a3a75bfe3e57"}},{"branch":"8.x","label":"v8.16.0","branchLabelMappingKey":"^v8.16.0$","isSourceBranch":false,"state":"NOT_CREATED"}]}]
BACKPORT-->

Co-authored-by: Ying Mao <[email protected]>
mikecote added a commit to mikecote/kibana that referenced this pull request Sep 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:prev-minor Backport to (8.x) the previous minor version (i.e. one version back from main) Feature:Alerting release_note:skip Skip the PR/issue when compiling release notes Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v8.16.0 v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cache query delay settings in the alerting task runner
6 participants