Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Reporting] Fix job notifications poller #177537

Merged

Conversation

tsullivan
Copy link
Member

@tsullivan tsullivan commented Feb 22, 2024

Summary

In previous code, Reporting fetched for a list of pending reports and compared the status of each result to a stored record of pending reports. The flakiness occurred because sometimes the results don't contain a record for a very "fresh" report since the refresh interval hadn't ticked and the stored change wasn't reflecting in search results.

With switching Reporting to use a data stream, there was new flakiness in serverless tests. The area of flakiness was in getting Reporting to show the "Job completed" toast notification. The main difference faced with data streams is a lower index refresh interval, which is lower even more for serverless.

This PR fixes the poller to loop through the list of stored pending reports and compare each to the result from the search. If a stored report isn't present in the search, we consider that to still be in the pending state.

Other changes:

  • Improvements the class responsible for managing updates to the stored list of pending reports, so that each operation performs atomically and go through a queue to mitigate the chance of race conditions.
  • Update function names and variable names for readability
  • Remove the unused removeJob function.
  • Move helper code into private methods of the StreamHandler class and simplify x-pack/plugins/reporting/public/plugin.ts

Release note

Fixed an issue in Reporting with consistently showing the toast message for completed report jobs.

@tsullivan tsullivan marked this pull request as ready for review February 22, 2024 14:57
@tsullivan tsullivan requested a review from a team as a code owner February 22, 2024 14:57
/*
* Use Kibana Toast API to show our messages
*
* Public for purposes of testing
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: should we make the method public for the purpose of testing, or make it private and use // @ts-expect-error in the test where we need to access it?

Copy link
Member Author

@tsullivan tsullivan Feb 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...or create a new class for testing that extends this in the test file?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can extract most of the logic into a separate function and unit test it there?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Dosant I refactored the methods to protected and created a test class to extend this one and expose testable methods

const newFailed: JobSummary[] = [];
const newPending: JobId[] = [];

for (const pendingJobId of previousPending) {
Copy link
Member Author

@tsullivan tsullivan Feb 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main change is here: loop through the set of stored job IDs rather than loop through the job IDs returned in the search

Comment on lines +170 to +172
// Keep job tracked in storage if is pending. It also
// may not be present in apiClient.findForJobIds
// response if index refresh is slow
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment explains why the bug happened.

@tsullivan tsullivan added the Team:SharedUX Team label for AppEx-SharedUX (formerly Global Experience) label Feb 22, 2024
@elasticmachine
Copy link
Contributor

Pinging @elastic/appex-sharedux (Team:SharedUX)

@tsullivan tsullivan self-assigned this Feb 22, 2024
// Operations on the localStorage key can happen from various
// parts of code. Using a queue to manage async operations allows
// operations to process one at a time
let operationQueue = Promise.resolve();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each plugin that imports the "kbn-reporting" package would receive its copy, hence different queues. Could this be a problem?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - it probably would.

Created d771862

// add the new job
jobs.push(jobId);
// write back to local storage
localStorage.setItem(JOB_COMPLETION_NOTIFICATIONS_SESSION_KEY, JSON.stringify(jobs));
Copy link
Contributor

@Dosant Dosant Feb 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if instead of managing the state as a single blob, we would write a job state for each job separately.
e.g.

localStorage.setItem(prefix-${jobId}, job),
would this keep things simpler?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would complicate the process of retrieving all of the pending jobs. We can use a common prefix, with a dynamic suffix for the job ID. But to get all of the pending job IDs, we need to get all of the keys from local storage and check each one to see if it has the matching prefix. So, I think this would not make the process simpler.

Copy link
Contributor

@Dosant Dosant Feb 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But to get all of the pending job IDs, we need to get all of the keys from local storage and check each one to see if it has the matching prefix.

yes, but this keeps the code synchronous, and there is no need to manage the queue. it seems like this option would be simpler even with the need to read all keys (which would be hidden in a small function)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there won't be millions of keys in local storage, this seems fair. I think this is good feedback since the change to async wasn't needed for the important part of this fix.

Pushed a471462

/*
* Use Kibana Toast API to show our messages
*
* Public for purposes of testing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can extract most of the logic into a separate function and unit test it there?

@Dosant
Copy link
Contributor

Dosant commented Feb 26, 2024

@elasticmachine merge upstream

Copy link
Contributor

@Dosant Dosant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!
I'd still consider the other option with keeping the sync read/write #177537 (comment) . It seems simpler and smaller bug surface

@tsullivan tsullivan force-pushed the reporting/public-job-poller-stabilization branch from 2a1cf2f to a471462 Compare February 27, 2024 22:45
@tsullivan tsullivan changed the title Fix job notifications poller [Reporting] Fix job notifications poller Feb 27, 2024
@tsullivan tsullivan enabled auto-merge (squash) February 27, 2024 22:51
@kibana-ci
Copy link
Collaborator

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] FTR Configs #62 / Alerting transform alert rule types transform_health rule runs correctly

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
@kbn/reporting-public 107 104 -3

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
reporting 74.1KB 74.1KB -1.0B

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
reporting 42.8KB 43.0KB +233.0B
Unknown metric groups

API count

id before after diff
@kbn/reporting-public 113 110 -3

ESLint disabled line counts

id before after diff
reporting 3 4 +1

Total ESLint disabled count

id before after diff
reporting 3 4 +1

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @tsullivan

@tsullivan tsullivan merged commit c1729c0 into elastic:main Feb 28, 2024
18 checks passed
@kibanamachine kibanamachine added v8.14.0 backport:skip This commit does not require backporting labels Feb 28, 2024
semd pushed a commit to semd/kibana that referenced this pull request Mar 4, 2024
## Summary

In previous code, Reporting fetched for a list of pending reports and
compared the status of each result to a stored record of pending
reports. The flakiness occurred because sometimes the results don't
contain a record for a very "fresh" report since the refresh interval
hadn't ticked and the stored change wasn't reflecting in search results.

With [switching Reporting to use a data
stream](elastic#176022), there was new
flakiness in serverless tests. The area of flakiness was in getting
Reporting to show the "Job completed" toast notification. The main
difference faced with data streams is a lower index refresh interval,
which is lower even more for serverless.

This PR fixes the poller to loop through the list of stored pending
reports and compare each to the result from the search. If a stored
report isn't present in the search, we consider that to still be in the
pending state.

Other changes:
- Improvements the class responsible for managing updates to the stored
list of pending reports, so that each operation performs atomically and
go through a queue to mitigate the chance of race conditions.
- Update function names and variable names for readability
- Remove the unused `removeJob` function.
- Move helper code into private methods of the `StreamHandler` class and
simplify `x-pack/plugins/reporting/public/plugin.ts`

## Release note
Fixed an issue in Reporting with consistently showing the toast message
for completed report jobs.

---------

Co-authored-by: Kibana Machine <[email protected]>
fkanout pushed a commit to fkanout/kibana that referenced this pull request Mar 4, 2024
## Summary

In previous code, Reporting fetched for a list of pending reports and
compared the status of each result to a stored record of pending
reports. The flakiness occurred because sometimes the results don't
contain a record for a very "fresh" report since the refresh interval
hadn't ticked and the stored change wasn't reflecting in search results.

With [switching Reporting to use a data
stream](elastic#176022), there was new
flakiness in serverless tests. The area of flakiness was in getting
Reporting to show the "Job completed" toast notification. The main
difference faced with data streams is a lower index refresh interval,
which is lower even more for serverless.

This PR fixes the poller to loop through the list of stored pending
reports and compare each to the result from the search. If a stored
report isn't present in the search, we consider that to still be in the
pending state.

Other changes:
- Improvements the class responsible for managing updates to the stored
list of pending reports, so that each operation performs atomically and
go through a queue to mitigate the chance of race conditions.
- Update function names and variable names for readability
- Remove the unused `removeJob` function.
- Move helper code into private methods of the `StreamHandler` class and
simplify `x-pack/plugins/reporting/public/plugin.ts`

## Release note
Fixed an issue in Reporting with consistently showing the toast message
for completed report jobs.

---------

Co-authored-by: Kibana Machine <[email protected]>
@tsullivan tsullivan deleted the reporting/public-job-poller-stabilization branch April 30, 2024 21:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting release_note:fix Team:SharedUX Team label for AppEx-SharedUX (formerly Global Experience) v8.14.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants