[ML] Use field caps option include_empty_fields to identify populated fields. #178606

Open
5 of 8 tasks
Tracked by #201131
walterra opened this issue Mar 13, 2024 · 1 comment
Labels: Meta, :ml, technical debt (Improvement of the software architecture and operational architecture), v8.18.0

Comments


walterra commented Mar 13, 2024

As of elastic/elasticsearch#103651 there is a new field caps option, `include_empty_fields`. Discover already makes use of it: #174063

In various places we use custom code to identify the populated fields of an index: we fetch a random sample of docs and then check which fields are populated. These queries use `random_score`, which can make them expensive. We should migrate to the new field caps option, which will be available as of 8.13.
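
The migration described above can be sketched roughly as follows. This is a minimal illustration, not the actual Kibana code: every function and type name is hypothetical, the sampling query only approximates what a `random_score` based lookup could look like, and the response type covers just the parts of the field caps response the sketch reads. It assumes that a field caps request with `include_empty_fields: false` omits fields that have no values.

```typescript
// Sketch only: all names below are hypothetical and not taken from Kibana.

// Minimal subset of the field caps response that this sketch relies on.
interface FieldCapability {
  type: string;
  metadata_field?: boolean;
}
interface FieldCapsResponse {
  indices: string[];
  fields: Record<string, Record<string, FieldCapability>>;
}

// A search hit as returned when requesting `fields: ['*']`.
interface SampledHit {
  fields?: Record<string, unknown[]>;
}

// Old approach (approximation): fetch a random sample of docs.
function buildRandomSampleSearchBody(sampleSize: number) {
  return {
    size: sampleSize,
    _source: false,
    fields: ['*'],
    query: {
      function_score: {
        query: { match_all: {} },
        random_score: {},
      },
    },
  };
}

// Old approach, continued: a field counts as populated if any sampled doc has a value for it.
function getPopulatedFieldsFromHits(hits: SampledHit[]): string[] {
  const populated = new Set<string>();
  for (const hit of hits) {
    const fields = hit.fields ?? {};
    for (const name of Object.keys(fields)) {
      if (fields[name].length > 0) {
        populated.add(name);
      }
    }
  }
  return Array.from(populated).sort();
}

// New approach: ask field caps to leave out empty fields.
function buildFieldCapsParams(index: string) {
  return { index, fields: '*', include_empty_fields: false };
}

// New approach, continued: every remaining non-metadata field is treated as populated.
function getPopulatedFieldsFromFieldCaps(response: FieldCapsResponse): string[] {
  return Object.keys(response.fields)
    .filter((name) => {
      const capsByType = response.fields[name];
      return !Object.keys(capsByType).some((type) => capsByType[type].metadata_field === true);
    })
    .sort();
}
```

With the field caps variant there is no sampling step, so the result does not depend on which documents a random sample happened to include.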

plugins/ml

  • x-pack/plugins/ml/public/application/data_frame_analytics/pages/analytics_creation/hooks/use_index_data.ts
    Code that identifies populated fields for data grid.
  • x-pack/plugins/ml/public/application/components/field_stats_flyout/populated_fields/get_merged_populated_fields_query.ts

plugins/aiops

plugins/transform

  • x-pack/plugins/transform/public/app/hooks/use_index_data.ts
    Code that identifies populated fields for data grid. [ML] Transforms: Improve data grid memoization. #195394 (8.16)
  • Transforms get populated fields via field stats, which still needs to be updated to use include_empty_fields.

plugins/data_visualizer

plugins/apm

walterra added the :ml and technical debt (Improvement of the software architecture and operational architecture) labels Mar 13, 2024
walterra self-assigned this Mar 13, 2024
@elasticmachine

Pinging @elastic/ml-ui (:ml)

walterra changed the title from "[ML] Use `https://github.com/elastic/elasticsearch/pull/103651" to "[ML] Use field caps options include_fields_with_no_value to identify populated fields." Mar 13, 2024
walterra changed the title from "[ML] Use field caps options include_fields_with_no_value to identify populated fields." to "[ML] Use field caps option include_fields_with_no_value to identify populated fields." Mar 14, 2024
walterra changed the title from "[ML] Use field caps option include_fields_with_no_value to identify populated fields." to "[ML] Use field caps option include_empty_fields to identify populated fields." Mar 14, 2024
@qn895 qn895 self-assigned this Mar 14, 2024
walterra added a commit that referenced this issue Mar 14, 2024
…d of custom query. (#178699)

## Summary

Part of #178606.

As of elastic/elasticsearch#103651 there is a
new field caps option `include_empty_fields`. This PR updates AIOps Log
Rate Analysis to make use of this option instead of a custom query and
code that identified populated fields.

### Checklist

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [x] [Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was used on any tests changed https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5482
- [x] This was checked for breaking API changes and was [labeled
appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
@qn895 qn895 removed their assignment May 9, 2024
@walterra walterra added v8.16.0 and removed v8.15.0 labels Jul 5, 2024
walterra added a commit that referenced this issue Oct 11, 2024
## Summary

Part of #178606 and #151664.

- Removes some unused code related to identifying populated index
fields.
- Changes `useIndexData()` to accept just one config options arg instead
of individual args.
- Improves data grid memoization.

Improvements tested locally:

#### `many_fields` dataset (no timestamp)

- `main`: `~22s` and 10 data grid rerenders until the `many_fields` data set
loaded. The transform config dropdowns are hardly usable and very slow;
each edit causes 3 data grid rerenders.
- This PR: `~4.5s` and 7 data grid rerenders until the `many_fields` data set
loaded. The transform config dropdowns are a bit slow but usable!

#### `kibana_sample_data_logs` dataset (whole dataset in the past to test rerenders on load without data)

- `main`: 5 rerenders.
- This PR: 3 rerenders

### Checklist

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [ ] [Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was
used on any tests changed
- [x] This was checked for breaking API changes and was [labeled
appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
kibanamachine pushed a commit to kibanamachine/kibana that referenced this issue Oct 11, 2024
(cherry picked from commit 869ceec)
kibanamachine added a commit that referenced this issue Oct 11, 2024
)

# Backport

This will backport the following commits from `main` to `8.x`:
- [[ML] Transforms: Improve data grid memoization.
(#195394)](#195394)

<!--- Backport version: 9.4.3 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)


Co-authored-by: Walter Rafelsberger