[ML] Field Statistics not honoring negated filters #170472

Closed
eriroley opened this issue Nov 2, 2023 · 13 comments
Labels: bug, Feature:File and Index Data Viz, feedback_needed, :ml, Team:DataDiscovery, v8.13.0


@eriroley

eriroley commented Nov 2, 2023

Kibana version:
8.10.4

Elasticsearch version:
8.10.4

Server OS version:
Ubuntu 22.04 LTS

Browser version:
Chrome 118

Browser OS version:
Windows 11

Original install method (e.g. download page, yum, from source, etc.):
Repo - https://artifacts.elastic.co/packages/8.x/apt stable main

Describe the bug:
Field Statistics doesn't honor negated filters

Steps to reproduce:

  1. Create a query in Discover using a negated filter
  2. Switch from Documents to "Field Statistics"
  3. The number of documents exceeds the number of hits - Field Statistics shows values that should have been excluded from the search

Expected behavior:
The number of documents equals the number of hits - excluded documents are not included in the statistics
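
The expected behavior can be illustrated with a small in-memory sketch. This is not Kibana code: the `Doc` type, the `applyNegatedFilter` and `topValues` helpers, and the sample user names are all hypothetical, chosen only to show that statistics computed over the filtered hits should never contain an excluded value.

```typescript
// Hypothetical in-memory model; field names and values are illustrative only.
type Doc = Record<string, string | undefined>;

interface NegatedFilter {
  field: string;
  excluded: string[];
}

// Keep documents whose field value is NOT in the excluded list.
// Documents that lack the field are kept as well.
function applyNegatedFilter(docs: Doc[], filter: NegatedFilter): Doc[] {
  return docs.filter((doc) => {
    const value = doc[filter.field];
    return value === undefined || filter.excluded.indexOf(value) === -1;
  });
}

// Count each value of `field`, most frequent first.
function topValues(docs: Doc[], field: string): Array<[string, number]> {
  const counts: Record<string, number> = {};
  for (const doc of docs) {
    const value = doc[field];
    if (value !== undefined) {
      counts[value] = (counts[value] ?? 0) + 1;
    }
  }
  return Object.keys(counts)
    .map((value): [string, number] => [value, counts[value] ?? 0])
    .sort((a, b) => b[1] - a[1]);
}

const docs: Doc[] = [
  { 'source.user.name': 'alice' },
  { 'source.user.name': 'administrator' },
  { 'source.user.name': 'administrator' },
  { 'source.user.name': 'bob' },
  { 'source.user.name': 'alice' },
];

const filter: NegatedFilter = { field: 'source.user.name', excluded: ['administrator'] };

// Statistics are computed over the hits, not over all documents.
const hits = applyNegatedFilter(docs, filter);
const stats = topValues(hits, 'source.user.name');
console.log(hits.length, stats);
```

In this sketch three documents survive the filter, and the top values contain only `alice` and `bob`. The bug reported here corresponds to computing `topValues` over `docs` instead of `hits`.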

Screenshots (if relevant):
Note the inclusion of "administrator" in both the negative filter and the top values, and the number of documents exceeding the number of hits
Screenshot 2023-10-31 103619

@eriroley eriroley added the bug Fixes for quality problems that affect the customer experience label Nov 2, 2023
@botelastic botelastic bot added the needs-team Issues missing a team label label Nov 2, 2023
@dmlemeshko dmlemeshko added the Team:DataDiscovery Discover, search (e.g. data plugin and KQL), data views, saved searches. For ES|QL, use Team:ES|QL. label Nov 3, 2023
@elasticmachine
Contributor

Pinging @elastic/kibana-data-discovery (Team:DataDiscovery)

@botelastic botelastic bot removed the needs-team Issues missing a team label label Nov 3, 2023
@kertal
Member

kertal commented Nov 7, 2023

So far I couldn't reproduce it, either in 8.10.4 or in main.

Image

I have a question: when you reload the page from your screenshot, so that Field Statistics is displayed with the same filter applied, is the result still wrong? How many values are you filtering out? Thx

@eriroley
Author

Sorry about the delay in responding - a reload of the page doesn't change things

I can replicate this with just one NOT, but the full query I am using is
NOT source.user.name: is one of shawn, walter, forti, administrator, fortiadmin, storage, super_admin, fortinet, jerry, Fortinet, superadmin, perry, user, jimmy, bruce, jaime, arthur, mark, christopher, anna, admin, angie, archive, ricoh, soporte, marketing, temp, warehouse, paul, margaret, edu, admin123, scanner, info, husson, kiosk, philip, craig, test123, ron, root, backups, shop, test, sysadmin, user1, backup, scan, james, user2, training, george, parent, sage, joe, sa, office, commercial, adm, fortik, charlie, victor, jeff, michael, jason, emma, conference, gloria, canon, dispatch, kathie, eddie, guest, thomas, FORTINET, alice, christine, juliana, scanning, security, class, rose, valentina, production, accounting, ecommerce, nicole, test1, andrew, karen, ashley, gabriela, engineer, tiffany, report, teresa, mila, katrina, lisa, vpn, frontdesk, janet, project, payroll, demo, caroline, leona, fortiuser, stephanie, miranda, alison, molly, christina, vicki, hr, sandra, jane, quality, cheryl, data, john, sharepoint, stephen, Admin, roger, accounts, chrysta, melody, sabrina, beatrice, tracey, loretta, lauren, lucy, angela, vera, beverly, debra, daisy, fax, deborah, isabel, 863, frank, stella, finance, luna, anca, david

Frank
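
For readers unfamiliar with how such a filter reaches Elasticsearch, the following is a simplified sketch of one possible query shape for a negated "is one of" filter: a `bool` query whose `must_not` clause holds a `terms` query. The `negatedOneOf` helper is hypothetical, and the exact DSL that Kibana serializes for this filter may differ. Only the first few names from the list above are used.

```typescript
// Simplified, hypothetical shape; Kibana's real filter serialization may differ.
interface TermsClause {
  terms: Record<string, string[]>;
}

interface NegatedBoolQuery {
  bool: { must_not: TermsClause[] };
}

// Build a query that excludes every document whose `field` equals one of `values`.
function negatedOneOf(field: string, values: string[]): NegatedBoolQuery {
  return { bool: { must_not: [{ terms: { [field]: values } }] } };
}

const negatedQuery = negatedOneOf('source.user.name', [
  'shawn',
  'walter',
  'forti',
  'administrator',
]);

console.log(JSON.stringify(negatedQuery, null, 2));
```

A `terms` query on a keyword field matches exact values, so differently cased names such as `fortinet`, `Fortinet`, and `FORTINET` would each need their own entry, as in the list above.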

@eriroley
Author

Indeed - if I then explicitly exclude one of the specific names that is showing up in source.user.name (in this case the top user) and then refresh the page, it still shows up in the top values. (I masked two valid usernames that I am negating in the screenshot.)
Screenshot 2023-11-13 103612

@jughosta
Contributor

What top values are visible in a popover when source.user.name is pressed in the left sidebar?

@eriroley
Author

The popover on the left shows what is expected, and shows that it is calculated from the correct (matching) number of records.
See the attached screenshot.
Screenshot 2023-11-13 115244

@qn895 qn895 self-assigned this Nov 15, 2023
@qn895
Member

qn895 commented Nov 15, 2023

Thanks Frank @eriroley for documenting this issue. I suspect multi-fields are what's causing your issue. To verify that I'm on the right track, do you mind doing the following?

In the popover for source.user.name, when you hover over source.user.name.text, does it show a + or a -? If it shows a +, can you click it to add the field as a column? Once clicked, you should see source.user.name.text in the list of Selected fields.
image

Now, open the popover for source.user.name.text again, and this time click the - on itsupport. Does that correctly update the Field stats table?

What happens if you first include or exclude the terms directly from the Field stats table? Can you click on one of these buttons and share a screenshot of the filter bar? If it works, this could be a potential workaround while we investigate the issue. This should also update the global filter for the rest of Discover.

image

@eriroley
Author

Interestingly, we just updated to 8.11.1 today, and this seems to have resolved it.

@kertal
Member

kertal commented Nov 22, 2023

@eriroley great to hear, so I guess we can close this? thx!

@kertal
Member

kertal commented Nov 28, 2023

Closing for now, feel free to reopen if it's encountered again. Thx for reporting

@kertal kertal closed this as completed Nov 28, 2023
@jughosta
Contributor

I was able to reproduce it on a cloud instance with the logs sample data. It seems to happen when no specific field is specified in the query input. A quick look at the component props shows that Discover passes the current query and filters to the Field Statistics embeddable.

Screenshot 2024-01-25 at 10 36 49 Screenshot 2024-01-25 at 10 48 32

cc @qn895
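
The reproduction condition (a free-text query with no field, plus a negated filter) can be sketched as follows. This is a simplified stand-in, not Kibana's query builder: `combineQueryAndFilters`, `SimpleFilter`, and the sample values are hypothetical, and the `multi_match` shape for a field-less query is an assumption. The point is only that negated filters must end up in `must_not` regardless of what the text query looks like.

```typescript
// Hypothetical, simplified stand-in for merging a text query with filters.
type Clause = Record<string, unknown>;

interface SimpleFilter {
  negate: boolean; // true for a "NOT" filter
  query: Clause;   // the clause the filter would match
}

interface CombinedQuery {
  bool: { must: Clause[]; filter: Clause[]; must_not: Clause[] };
}

function combineQueryAndFilters(queryText: string, filters: SimpleFilter[]): CombinedQuery {
  // Assumption: a query with no field name becomes a multi_match across fields.
  const must: Clause[] =
    queryText.trim() === ''
      ? [{ match_all: {} }]
      : [{ multi_match: { query: queryText, type: 'best_fields', lenient: true } }];

  // Positive filters narrow the result set; negated filters exclude documents.
  const positive = filters.filter((f) => !f.negate).map((f) => f.query);
  const negated = filters.filter((f) => f.negate).map((f) => f.query);

  return { bool: { must, filter: positive, must_not: negated } };
}

const combined = combineQueryAndFilters('error', [
  { negate: true, query: { match_phrase: { 'source.user.name': 'administrator' } } },
  { negate: false, query: { match_phrase: { 'event.outcome': 'failure' } } },
]);

console.log(JSON.stringify(combined, null, 2));
```

If a merge step dropped or mishandled the `must_not` clause for this kind of text query, the statistics would be computed over documents that the negated filter was supposed to exclude, which matches the symptom described in this issue.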

@jughosta jughosta reopened this Jan 25, 2024
@jughosta jughosta added the :ml label Jan 25, 2024
@elasticmachine
Contributor

Pinging @elastic/ml-ui (:ml)

@peteharverson peteharverson added the Feature:File and Index Data Viz ML file and index data visualizer label Jan 25, 2024
@peteharverson peteharverson changed the title Field Statistics not honoring negated filters [ML] Field Statistics not honoring negated filters Jan 25, 2024
qn895 added a commit that referenced this issue Feb 8, 2024
…Data Drift (#176347)

This PR removes usage of `createMergedEsQuery` in favor of `buildEsQuery`.
It also fixes an intermittent issue (#170472) with filters not being
honored when the query is partial or of multi-match type.

[Screenshot 2024-02-07 at 10 20 26]

It also improves adding and removing filters for empty values.


Co-authored-by: Kibana Machine <[email protected]>
@qn895
Member

qn895 commented Feb 14, 2024

Closing this issue via #176347

@qn895 qn895 closed this as completed Feb 14, 2024
CoenWarmer pushed a commit to CoenWarmer/kibana that referenced this issue Feb 15, 2024
…Data Drift (elastic#176347)

fkanout pushed a commit to fkanout/kibana that referenced this issue Mar 4, 2024
…Data Drift (elastic#176347)


7 participants