Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Data Stream for Reporting storage #176022

Merged
merged 17 commits into from
May 21, 2024

Conversation

tsullivan
Copy link
Member

@tsullivan tsullivan commented Jan 31, 2024

Summary

Closes #161608

Release Note

Reporting internal storage has been changed from using regular indices to a data stream configuration for a more efficient sharding strategy. This change is not expected to have any impact to users.

Screenshots

Upgrade test (manual process)

Using a report generated before this change, and a report generated after "upgrading":
image
Even though the two reports are in different types storage, they are still managed by the same policy:
image
Looking at the details of the policy shows how the different types of storage are used:
image

Log lines

Initial startup in clean environment

[2024-05-13T13:22:49.138-07:00][INFO ][plugins.reporting.store] Creating ILM policy for reporting data stream: kibana-reporting
[2024-05-13T13:22:53.337-07:00][INFO ][plugins.reporting.store] Linking ILM policy to reporting data stream: .kibana-reporting, component template: kibana-reporting@custom

Kibana restart with ES running continuously

[2024-05-13T13:24:32.733-07:00][DEBUG][plugins.reporting.store] Found ILM policy kibana-reporting; skipping creation.
[2024-05-13T13:24:32.733-07:00][INFO ][plugins.reporting.store] Linking ILM policy to reporting data stream: .kibana-reporting, component template: kibana-reporting@custom

Checklist

Copy link
Contributor

A documentation preview will be available soon.

Help us out by validating the Buildkite preview and reporting issues here.
Please also be sure to double check all images to ensure they are correct in the preview.

Request a new doc build by commenting
  • Rebuild this PR: run docs-build
  • Rebuild this PR and all Elastic docs: run docs-build rebuild

run docs-build is much faster than run docs-build rebuild. A rebuild should only be needed in rare situations.

If your PR continues to fail for an unknown reason, the doc build pipeline may be broken. Elastic employees can check the pipeline status here.

@tsullivan tsullivan force-pushed the reporting/data-stream-ii branch from 9893121 to 7345ea1 Compare January 31, 2024 22:12
@tsullivan tsullivan force-pushed the reporting/data-stream-ii branch 2 times, most recently from ad926b1 to 55d32f8 Compare February 1, 2024 00:46
@elastic elastic deleted a comment from kibana-ci Feb 1, 2024
@tsullivan tsullivan force-pushed the reporting/data-stream-ii branch 2 times, most recently from 0801774 to 2cc94a1 Compare February 13, 2024 19:38
@tsullivan tsullivan force-pushed the reporting/data-stream-ii branch 3 times, most recently from ae03a84 to 35c76e6 Compare February 22, 2024 19:22
@tsullivan tsullivan force-pushed the reporting/data-stream-ii branch 4 times, most recently from e0a657f to a491ebc Compare February 27, 2024 22:48
tsullivan added a commit that referenced this pull request Feb 28, 2024
## Summary

In previous code, Reporting fetched for a list of pending reports and
compared the status of each result to a stored record of pending
reports. The flakiness occurred because sometimes the results don't
contain a record for a very "fresh" report since the refresh interval
hadn't ticked and the stored change wasn't reflecting in search results.

With [switching Reporting to use a data
stream](#176022), there was new
flakiness in serverless tests. The area of flakiness was in getting
Reporting to show the "Job completed" toast notification. The main
difference faced with data streams is a lower index refresh interval,
which is lower even more for serverless.

This PR fixes the poller to loop through the list of stored pending
reports and compare each to the result from the search. If a stored
report isn't present in the search, we consider that to still be in the
pending state.

Other changes:
- Improvements the class responsible for managing updates to the stored
list of pending reports, so that each operation performs atomically and
go through a queue to mitigate the chance of race conditions.
- Update function names and variable names for readability
- Remove the unused `removeJob` function.
- Move helper code into private methods of the `StreamHandler` class and
simplify `x-pack/plugins/reporting/public/plugin.ts`

## Release note
Fixed an issue in Reporting with consistently showing the toast message
for completed report jobs.

---------

Co-authored-by: Kibana Machine <[email protected]>
@tsullivan tsullivan force-pushed the reporting/data-stream-ii branch from a491ebc to c40121d Compare February 28, 2024 06:27
@tsullivan tsullivan marked this pull request as ready for review February 28, 2024 06:27
@tsullivan tsullivan requested a review from a team as a code owner February 28, 2024 06:27
@tsullivan tsullivan added release_note:enhancement (Deprecated) Feature:Reporting Use Reporting:Screenshot, Reporting:CSV, or Reporting:Framework instead Team:SharedUX Team label for AppEx-SharedUX (formerly Global Experience) labels Feb 28, 2024
@elasticmachine
Copy link
Contributor

Pinging @elastic/appex-sharedux (Team:SharedUX)

Copy link
Contributor

@sebelga sebelga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review only LGTM 👍

Needs followup, the data stream configuration set in ES must a data stream lifecycle policy

Do we need to wait for that first before merging this PR?

packages/kbn-reporting/common/constants.ts Outdated Show resolved Hide resolved
expect(job.payload.title).equal('A Saved Search With a DATE FILTER');

// wait for index refresh
await sleep(3000);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder how risky this is to introduce flakiness. Wouldn't it be better to pass a param to postJobCSV({ refreshIndex: true }) and avoid the sleep call?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!

  • Passing many test runs with the flaky test runner is definitely a requirement for this PR.
  • Calling for a refresh on the index whenever a job is posted seems expensive enough to merit seconds thoughts.

The amount of time to sleep should be derived from the refresh interval for data streams, which I think is 3 seconds (this is integration test is for stateful). However, in serverless it is 15 seconds. I will confirm these things in a sync with @elastic/es-data-management

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling for a refresh on the index whenever a job is posted seems expensive enough to merit seconds thoughts

I meant to only set refreshIndex: true for api integration tests to ensure consistency and not rely on a time value that can change at any moment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(disclaimer about not really understanding the typescript code)

Could this use the wait_for option when indexing things so that you only wait for as long as is necessary for the refresh to happen naturally? I.e.: https://www.elastic.co/guide/en/elasticsearch/reference/8.12/docs-refresh.html#docs-refresh

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sebelga I think I understand now. But, I'm very hesitant to add a flag in the API that would only serve the purpose of tests.

@dakrone I will try your suggestion to use wait_for, and see if that allows for removing or lowering the sleep call.

If the sleep is needed, my goal will to make sure it is set to a calculated amount of time. It should be low enough to wait for ES and avoiding an understood race condition. It shouldn't be perceived as something to address flakiness.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But, I'm very hesitant to add a flag in the API that would only serve the purpose of tests

I see. My point on this is that we shouldn't have to test ES behaviour (in this case that it does refresh the index "at some point") and add some timeout for it. We should assume ES will do it. It's like mocking a debounce in jest, I know it will eventually execute the debounced function.

I know it's a trade off, I understand where you are coming from, but I think it is worth it if (1) it prevents flackiness (2) it reduce time to execute the tests. Just my 2 cents 😊

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used the suggestion to change to refresh=wait_for in the operations to index new documents and update documents, and it seems to resolved this.

semd pushed a commit to semd/kibana that referenced this pull request Mar 4, 2024
## Summary

In previous code, Reporting fetched for a list of pending reports and
compared the status of each result to a stored record of pending
reports. The flakiness occurred because sometimes the results don't
contain a record for a very "fresh" report since the refresh interval
hadn't ticked and the stored change wasn't reflecting in search results.

With [switching Reporting to use a data
stream](elastic#176022), there was new
flakiness in serverless tests. The area of flakiness was in getting
Reporting to show the "Job completed" toast notification. The main
difference faced with data streams is a lower index refresh interval,
which is lower even more for serverless.

This PR fixes the poller to loop through the list of stored pending
reports and compare each to the result from the search. If a stored
report isn't present in the search, we consider that to still be in the
pending state.

Other changes:
- Improvements the class responsible for managing updates to the stored
list of pending reports, so that each operation performs atomically and
go through a queue to mitigate the chance of race conditions.
- Update function names and variable names for readability
- Remove the unused `removeJob` function.
- Move helper code into private methods of the `StreamHandler` class and
simplify `x-pack/plugins/reporting/public/plugin.ts`

## Release note
Fixed an issue in Reporting with consistently showing the toast message
for completed report jobs.

---------

Co-authored-by: Kibana Machine <[email protected]>
fkanout pushed a commit to fkanout/kibana that referenced this pull request Mar 4, 2024
## Summary

In previous code, Reporting fetched for a list of pending reports and
compared the status of each result to a stored record of pending
reports. The flakiness occurred because sometimes the results don't
contain a record for a very "fresh" report since the refresh interval
hadn't ticked and the stored change wasn't reflecting in search results.

With [switching Reporting to use a data
stream](elastic#176022), there was new
flakiness in serverless tests. The area of flakiness was in getting
Reporting to show the "Job completed" toast notification. The main
difference faced with data streams is a lower index refresh interval,
which is lower even more for serverless.

This PR fixes the poller to loop through the list of stored pending
reports and compare each to the result from the search. If a stored
report isn't present in the search, we consider that to still be in the
pending state.

Other changes:
- Improvements the class responsible for managing updates to the stored
list of pending reports, so that each operation performs atomically and
go through a queue to mitigate the chance of race conditions.
- Update function names and variable names for readability
- Remove the unused `removeJob` function.
- Move helper code into private methods of the `StreamHandler` class and
simplify `x-pack/plugins/reporting/public/plugin.ts`

## Release note
Fixed an issue in Reporting with consistently showing the toast message
for completed report jobs.

---------

Co-authored-by: Kibana Machine <[email protected]>
@tsullivan tsullivan marked this pull request as draft March 14, 2024 22:46
@tsullivan tsullivan force-pushed the reporting/data-stream-ii branch from 54b2239 to 7204d46 Compare May 15, 2024 21:21
* This method is automatically called on the Stack Management > Reporting page, by the `` API for users with
* privilege to manage ILM, to notify them when attention is needed to update the policy for any reason.
*/
public async checkIlmMigrationStatus(): Promise<IlmPolicyMigrationStatus> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relocated this code from x-pack/plugins/reporting/server/lib/deprecations/check_ilm_migration_status.ts

return IlmPolicyManager.create({ client });
}

private async createIndex(indexName: string) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to create the index or manually apply index settings in Reporting! That is handled in ES now.

@tsullivan tsullivan marked this pull request as ready for review May 15, 2024 21:24
@tsullivan tsullivan requested a review from a team as a code owner May 15, 2024 21:24
@tsullivan
Copy link
Member Author

/ci

@kibanamachine
Copy link
Contributor

Flaky Test Runner Stats

🎉 All tests passed! - kibana-flaky-test-suite-runner#5999

[✅] x-pack/test/functional/apps/discover/config.ts: 33/33 tests passed.

see run history

Copy link
Contributor

@jughosta jughosta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Data Discovery changes LGTM 👍
Great work!

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
@kbn/reporting-server 87 91 +4
Unknown metric groups

API count

id before after diff
@kbn/reporting-server 88 92 +4

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Copy link
Contributor

@rshen91 rshen91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested locally in both on prem and serverless. LGTM!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting (Deprecated) Feature:Reporting Use Reporting:Screenshot, Reporting:CSV, or Reporting:Framework instead release_note:enhancement Team:SharedUX Team label for AppEx-SharedUX (formerly Global Experience) v8.15.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Reporting] Use index templates and a data stream
8 participants