Use Data Stream for Reporting storage #176022

tsullivan · 2024-01-31T21:48:06Z

Summary

Depends on Install data stream template for Kibana reporting elasticsearch#97765
Depends on Move kibana reporting data stream settings into component template elasticsearch#107581
Add a test that creates a new report job and checks the details of the templated data stream.
Run Discover tests in Flaky Test Runner: https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5999

Release Note

Reporting internal storage has been changed from using regular indices to a data stream configuration for a more efficient sharding strategy. This change is not expected to have any impact on users.

Screenshots

Upgrade test (manual process)

Using a report generated before this change, and a report generated after "upgrading":

Even though the two reports are in different types of storage, they are still managed by the same policy:

Looking at the details of the policy shows how the different types of storage are used:

Log lines

Initial startup in clean environment

[2024-05-13T13:22:49.138-07:00][INFO ][plugins.reporting.store] Creating ILM policy for reporting data stream: kibana-reporting
[2024-05-13T13:22:53.337-07:00][INFO ][plugins.reporting.store] Linking ILM policy to reporting data stream: .kibana-reporting, component template: kibana-reporting@custom

Kibana restart with ES running continuously

[2024-05-13T13:24:32.733-07:00][DEBUG][plugins.reporting.store] Found ILM policy kibana-reporting; skipping creation.
[2024-05-13T13:24:32.733-07:00][INFO ][plugins.reporting.store] Linking ILM policy to reporting data stream: .kibana-reporting, component template: kibana-reporting@custom
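
The two startup paths in these log lines can be sketched roughly as follows. This is a toy, synchronous, in-memory model and not the plugin's code: `ClusterState` and `ensurePolicyLinked` are hypothetical names, and the real calls to Elasticsearch are asynchronous. Only the policy, data stream, and component template names come from the logs.

```typescript
// Toy in-memory stand-in for cluster state (an assumption for illustration only).
interface ClusterState {
  ilmPolicies: string[];
  // component template name -> ILM policy it points at
  templatePolicy: Record<string, string>;
}

// Names taken from the log lines above.
const ILM_POLICY = 'kibana-reporting';
const DATA_STREAM = '.kibana-reporting';
const COMPONENT_TEMPLATE = 'kibana-reporting@custom';

// Hypothetical helper: create the policy only if it is missing, then always link it.
function ensurePolicyLinked(state: ClusterState, log: (msg: string) => void): void {
  if (state.ilmPolicies.indexOf(ILM_POLICY) >= 0) {
    log(`Found ILM policy ${ILM_POLICY}; skipping creation.`);
  } else {
    log(`Creating ILM policy for reporting data stream: ${ILM_POLICY}`);
    state.ilmPolicies.push(ILM_POLICY);
  }
  // Linking does not depend on creation, so a policy created by an
  // earlier startup still ends up referenced by the component template.
  log(`Linking ILM policy to reporting data stream: ${DATA_STREAM}, component template: ${COMPONENT_TEMPLATE}`);
  state.templatePolicy[COMPONENT_TEMPLATE] = ILM_POLICY;
}

// Two consecutive "startups" against the same state.
const state: ClusterState = { ilmPolicies: [], templatePolicy: {} };
const lines: string[] = [];
ensurePolicyLinked(state, (msg) => lines.push(msg));
ensurePolicyLinked(state, (msg) => lines.push(msg));
```

The first call takes the "create, then link" path and the second takes the "found, then link" path, which matches the order of messages in the two log excerpts.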

Checklist

Documentation was added for features that require explanation or tutorials
Unit or functional tests were updated or added to match the most common scenarios
Flaky Test Runner was used on any tests changed
~~See https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5302 (internal link)~~

## Summary

In previous code, Reporting fetched a list of pending reports and compared the status of each result to a stored record of pending reports. The flakiness occurred because sometimes the results didn't contain a record for a very "fresh" report: the refresh interval hadn't elapsed, so the stored change wasn't yet reflected in search results.

With [switching Reporting to use a data stream](#176022), there was new flakiness in serverless tests. The area of flakiness was in getting Reporting to show the "Job completed" toast notification. The main difference with data streams is a longer index refresh interval (less frequent refreshes), which is longer still for serverless.

This PR fixes the poller to loop through the list of stored pending reports and compare each to the result from the search. If a stored report isn't present in the search results, we consider it to still be in the pending state.

Other changes:
- Improve the class responsible for managing updates to the stored list of pending reports, so that each operation is performed atomically and goes through a queue to mitigate the chance of race conditions.
- Update function names and variable names for readability.
- Remove the unused `removeJob` function.
- Move helper code into private methods of the `StreamHandler` class and simplify `x-pack/plugins/reporting/public/plugin.ts`.

## Release note

Fixed an issue in Reporting with consistently showing the toast message for completed report jobs.

---------

Co-authored-by: Kibana Machine
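
The reconciliation described in the summary above can be sketched as a pure function. This is a minimal illustration under assumptions, not the code from the PR: `reconcilePendingJobs`, `SearchHit`, `PollResult`, and the status values are hypothetical names chosen for the example.

```typescript
// Hypothetical status values and shapes, for illustration only.
type JobStatus = 'pending' | 'processing' | 'completed' | 'failed';

interface SearchHit {
  id: string;
  status: JobStatus;
}

interface PollResult {
  stillPending: string[];
  completed: string[];
  failed: string[];
}

// Drive the loop from the stored list of pending job IDs, not from the search hits.
// A stored job with no matching hit is treated as still pending, because the
// index may simply not have refreshed yet.
function reconcilePendingJobs(storedPendingIds: string[], hits: SearchHit[]): PollResult {
  const statusById: Record<string, JobStatus | undefined> = {};
  for (const hit of hits) {
    statusById[hit.id] = hit.status;
  }

  const result: PollResult = { stillPending: [], completed: [], failed: [] };
  for (const id of storedPendingIds) {
    const status = statusById[id];
    if (status === 'completed') {
      result.completed.push(id);
    } else if (status === 'failed') {
      result.failed.push(id);
    } else {
      // Covers 'pending', 'processing', and "not visible in search yet".
      result.stillPending.push(id);
    }
  }
  return result;
}

// Job "c" was just created and is not searchable yet; it stays pending.
const outcome = reconcilePendingJobs(
  ['a', 'b', 'c'],
  [
    { id: 'a', status: 'completed' },
    { id: 'b', status: 'processing' },
  ]
);
```

Iterating over the stored list means a missing search hit can never silently drop a job from tracking, which is the failure mode the summary attributes to the old approach.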

elasticmachine · 2024-02-28T07:52:16Z

Pinging @elastic/appex-sharedux (Team:SharedUX)

sebelga

Code review only LGTM 👍

Needs follow-up: the data stream configuration set in ES must have a data stream lifecycle policy.

Do we need to wait for that first before merging this PR?

packages/kbn-reporting/common/constants.ts

sebelga · 2024-03-04T11:20:15Z

x-pack/test/reporting_api_integration/reporting_without_security/job_apis_csv.ts

+      expect(job.payload.title).equal('A Saved Search With a DATE FILTER');
+
+      // wait for index refresh
+      await sleep(3000);


I wonder how much risk of flakiness this introduces. Wouldn't it be better to pass a param to `postJobCSV({ refreshIndex: true })` and avoid the `sleep` call?

Good point!

Passing many test runs with the flaky test runner is definitely a requirement for this PR.

Calling for a refresh on the index whenever a job is posted seems expensive enough to merit second thoughts.

The amount of time to sleep should be derived from the refresh interval for data streams, which I think is 3 seconds (this integration test is for stateful). However, in serverless it is 15 seconds. I will confirm these things in a sync with @elastic/es-data-management.

> Calling for a refresh on the index whenever a job is posted seems expensive enough to merit second thoughts

I meant to only set `refreshIndex: true` for API integration tests, to ensure consistency and not rely on a time value that can change at any moment.

(disclaimer about not really understanding the typescript code)

Could this use the wait_for option when indexing things so that you only wait for as long as is necessary for the refresh to happen naturally? I.e.: https://www.elastic.co/guide/en/elasticsearch/reference/8.12/docs-refresh.html#docs-refresh

@sebelga I think I understand now. But, I'm very hesitant to add a flag in the API that would only serve the purpose of tests.

@dakrone I will try your suggestion to use wait_for, and see if that allows for removing or lowering the sleep call.

If the sleep is needed, my goal will be to make sure it is set to a calculated amount of time. It should be low, but long enough to wait for ES and avoid an understood race condition. It shouldn't be perceived as something to address flakiness.

> But, I'm very hesitant to add a flag in the API that would only serve the purpose of tests

I see. My point on this is that we shouldn't have to test ES behaviour (in this case that it does refresh the index "at some point") and add some timeout for it. We should assume ES will do it. It's like mocking a debounce in jest, I know it will eventually execute the debounced function.

I know it's a trade-off, and I understand where you are coming from, but I think it is worth it if (1) it prevents flakiness and (2) it reduces the time to execute the tests. Just my 2 cents 😊

I used the suggestion to change to refresh=wait_for in the operations to index new documents and update documents, and it seems to have resolved this.
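
As a rough illustration of that change (the actual one is in #178783), the sketch below builds index and update requests that carry `refresh: 'wait_for'`, so the call returns only once a refresh has made the write visible to search. The request shapes loosely mirror the Elasticsearch index and update APIs; the helper names, the `ReportDoc` fields, and the backing index name are assumptions made for this example.

```typescript
// Local types that only mirror the request fields used here (assumed shapes).
type Refresh = boolean | 'wait_for';

interface IndexRequest<T> {
  index: string;
  document: T;
  refresh: Refresh;
}

interface UpdateRequest<T> {
  index: string;
  id: string;
  doc: Partial<T>;
  refresh: Refresh;
}

// Hypothetical document shape for a report job.
interface ReportDoc {
  status: string;
  created_at: string;
}

const REPORTING_DATA_STREAM = '.kibana-reporting';

// New documents are written to the data stream and wait for the next refresh.
function buildIndexRequest(document: ReportDoc): IndexRequest<ReportDoc> {
  return { index: REPORTING_DATA_STREAM, document, refresh: 'wait_for' };
}

// Updates target the index that holds the existing document
// (for a data stream, one of its backing indices).
function buildUpdateRequest(
  backingIndex: string,
  id: string,
  doc: Partial<ReportDoc>
): UpdateRequest<ReportDoc> {
  return { index: backingIndex, id, doc, refresh: 'wait_for' };
}

const createReq = buildIndexRequest({ status: 'pending', created_at: '2024-03-20T00:00:00Z' });
const updateReq = buildUpdateRequest('example-backing-index', 'job-1', { status: 'completed' });
```

Compared with a fixed `sleep`, this waits only as long as the natural refresh takes, and it avoids forcing an immediate refresh on every write.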

tsullivan · 2024-05-15T21:23:19Z

x-pack/plugins/reporting/server/lib/store/ilm_policy_manager/ilm_policy_manager.ts

+   * This method is automatically called on the Stack Management > Reporting page, by the `` API for users with
+   * privilege to manage ILM, to notify them when attention is needed to update the policy for any reason.
+   */
+  public async checkIlmMigrationStatus(): Promise<IlmPolicyMigrationStatus> {


Relocated this code from x-pack/plugins/reporting/server/lib/deprecations/check_ilm_migration_status.ts

tsullivan · 2024-05-15T21:24:24Z

x-pack/plugins/reporting/server/lib/store/store.ts

-    return IlmPolicyManager.create({ client });
-  }
-
-  private async createIndex(indexName: string) {


No need to create the index or manually apply index settings in Reporting! That is handled in ES now.

tsullivan · 2024-05-15T21:29:22Z

/ci

kibanamachine · 2024-05-15T23:12:26Z

Flaky Test Runner Stats

🎉 All tests passed! - kibana-flaky-test-suite-runner#5999

[✅] x-pack/test/functional/apps/discover/config.ts: 33/33 tests passed.

see run history

jughosta

Data Discovery changes LGTM 👍
Great work!

kibana-ci · 2024-05-21T00:02:09Z

💚 Build Succeeded

Buildkite Build
Commit: 8f1d3d7

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run `node scripts/build_api_docs --plugin [yourplugin] --stats comments` for more detailed information.

| id | before | after | diff |
| --- | --- | --- | --- |
| `@kbn/reporting-server` | 87 | 91 | +4 |

Unknown metric groups

API count

| id | before | after | diff |
| --- | --- | --- | --- |
| `@kbn/reporting-server` | 88 | 92 | +4 |

History

💚 Build #210317 succeeded 7204d46
💚 Build #210289 succeeded 54b2239a567b44a6f98d854bf4cb7b7aa5c15457
💚 Build #210254 succeeded c331dc4cfac79d665ead09e7a1755090d72b8fe6
💔 Build #210016 failed cebb36f7d59796362c9fa579eebbee31c4ae3e35
💔 Build #210009 failed a160fe715a4490fc508b1c261b7182b829d73044
💔 Build #209891 failed 9302330eafed4dd546e815e1a681f64e628b3bea

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

rshen91

I tested locally in both on prem and serverless. LGTM!!

tsullivan force-pushed the reporting/data-stream-ii branch from 9893121 to 7345ea1 Compare January 31, 2024 22:12

tsullivan mentioned this pull request Jan 31, 2024

[Reporting] Use a data stream for storage #161606

Closed

3 tasks

tsullivan force-pushed the reporting/data-stream-ii branch 2 times, most recently from ad926b1 to 55d32f8 Compare February 1, 2024 00:46

elastic deleted a comment from kibana-ci Feb 1, 2024

tsullivan force-pushed the reporting/data-stream-ii branch 2 times, most recently from 0801774 to 2cc94a1 Compare February 13, 2024 19:38

tsullivan force-pushed the reporting/data-stream-ii branch 3 times, most recently from ae03a84 to 35c76e6 Compare February 22, 2024 19:22

tsullivan mentioned this pull request Feb 22, 2024

[Reporting] Fix job notifications poller #177537

Merged

tsullivan force-pushed the reporting/data-stream-ii branch 4 times, most recently from e0a657f to a491ebc Compare February 27, 2024 22:48

tsullivan force-pushed the reporting/data-stream-ii branch from a491ebc to c40121d Compare February 28, 2024 06:27

tsullivan marked this pull request as ready for review February 28, 2024 06:27

tsullivan requested a review from a team as a code owner February 28, 2024 06:27

tsullivan added the release_note:enhancement, (Deprecated) Feature:Reporting, and Team:SharedUX labels Feb 28, 2024

sebelga reviewed Mar 4, 2024

tsullivan marked this pull request as draft March 14, 2024 22:46

This was referenced Mar 14, 2024

[Reporting/storage] set refresh=wait_for in indexing and update calls #178783

Merged

Reporting: organize constant variables into common/server packages #178784

Merged

tsullivan added 11 commits May 15, 2024 14:19

Fix issue: policy would not be linked if was created in previous startup

d3e199d

Fix migration: link template and update settings of backing indices

d7848c7

Improve logging

7914807

update syntax

39a5a0c

add index_timestamp

161e45a

update tsconfig

70436bd

add index_timestamp

7735d61

update ContentStream write operation

9142cf6

Increase timeout for max size reached test

ca209af

fix contentstream jest tests

ad59e7f

add integration test for datastream

7204d46

tsullivan force-pushed the reporting/data-stream-ii branch from 54b2239 to 7204d46 Compare May 15, 2024 21:21

tsullivan commented May 15, 2024

tsullivan marked this pull request as ready for review May 15, 2024 21:24

tsullivan requested a review from a team as a code owner May 15, 2024 21:24

jughosta approved these changes May 16, 2024

Merge branch 'main' into reporting/data-stream-ii

8f1d3d7

rshen91 approved these changes May 21, 2024

tsullivan merged commit 56383cc into elastic:main May 21, 2024
18 checks passed

tsullivan deleted the reporting/data-stream-ii branch May 21, 2024 17:12

kibanamachine added the v8.15.0 and backport:skip labels May 21, 2024

tsullivan mentioned this pull request May 21, 2024

[Reporting] Define migration and testing strategy for reporting data stream #162198

Closed

tsullivan mentioned this pull request Aug 1, 2024

Reporting index managed by ILM policy got exception when rollover action is added in the default policy kibana-reporting #123384

Open

tsullivan mentioned this pull request Nov 21, 2024

Reporting feature does not work when setting stack.templates.enabled: false in Elasticsearch #201281

Open

Dosant mentioned this pull request Dec 19, 2024

[Reporting] don’t wait for refresh when uploading a chunk #204775

Open
