Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Synthetics] Fix issue where heatmap UI crashes on undefined histogram data #192508

Merged
merged 7 commits into from
Sep 17, 2024

Conversation

justinkambic
Copy link
Contributor

@justinkambic justinkambic commented Sep 10, 2024

Summary

Recently, we fixed an issue where the heatmap data on the detail and monitor history pages would not fill up.

A side effect of this fix was a new regression that caused certain rarer cases to see the page crash because of an unhandled case of calling a function on a potentially-null object; our histogram data used to populate this heatmap can be undefined in certain cases.

This patch introduces a change that will handle this case, and adds unit tests for the module in question.

Checklist

Delete any items that are not applicable to this PR.

Risk Matrix

Delete this section if it is not applicable to this PR.

Before closing this PR, invite QA, stakeholders, and other developers to identify risks that should be tested prior to the change/feature release.

When forming the risk matrix, consider some of the following examples and how they may potentially impact the change:

Risk Probability Severity Mitigation/Notes
Multiple Spaces—unexpected behavior in non-default Kibana Space. Low High Integration tests will verify that all features are still supported in non-default Kibana Space and when user switches between spaces.
Multiple nodes—Elasticsearch polling might have race conditions when multiple Kibana nodes are polling for the same tasks. High Low Tasks are idempotent, so executing them multiple times will not result in logical error, but will degrade performance. To test for this case we add plenty of unit tests around this logic and document manual testing procedure.
Code should gracefully handle cases when feature X or plugin Y are disabled. Medium High Unit tests will verify that any feature flag or plugin combination still results in our service operational.
See more potential risk examples

For maintainers

@justinkambic justinkambic added bug Fixes for quality problems that affect the customer experience release_note:fix Team:obs-ux-management Observability Management User Experience Team v8.16.0 labels Sep 10, 2024
@justinkambic justinkambic self-assigned this Sep 10, 2024
@justinkambic justinkambic requested a review from a team as a code owner September 10, 2024 18:21
@elasticmachine
Copy link
Contributor

Pinging @elastic/obs-ux-management-team (Team:obs-ux-management)

@botelastic botelastic bot added the ci:project-deploy-observability Create an Observability project label Sep 10, 2024
@obltmachine
Copy link

🤖 GitHub comments

Expand to view the GitHub comments

Just comment with:

  • /oblt-deploy : Deploy a Kibana instance using the Observability test environments.
  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

Copy link
Contributor

@shahzad31 shahzad31 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM !!

@kibana-ci
Copy link
Collaborator

kibana-ci commented Sep 17, 2024

💛 Build succeeded, but was flaky

  • Buildkite Build
  • Commit: 9884931
  • Kibana Serverless Image: docker.elastic.co/kibana-ci/kibana-serverless:pr-192508-9884931b2c5d

Failed CI Steps

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
synthetics 963.4KB 963.5KB +58.0B

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @justinkambic

@justinkambic justinkambic merged commit da2f7f6 into elastic:main Sep 17, 2024
26 checks passed
@justinkambic justinkambic deleted the fix-heatmap-again branch September 17, 2024 12:08
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Sep 17, 2024
…m data (elastic#192508)

## Summary

Recently, [we fixed](elastic#184177) an
[issue](elastic#180076) where the
heatmap data on the detail and monitor history pages would not fill up.

A side effect of this fix was a new regression that caused certain rarer
cases to see the page crash because of an unhandled case of calling a
function on a potentially-null object; our histogram data used to
populate this heatmap can be `undefined` in certain cases.

This patch introduces a change that will handle this case, and adds unit
tests for the module in question.

### Checklist

Delete any items that are not applicable to this PR.

- [ ] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [ ]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [ ] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [ ] [Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was
used on any tests changed
- [ ] Any UI touched in this PR is usable by keyboard only (learn more
about [keyboard accessibility](https://webaim.org/techniques/keyboard/))
- [ ] Any UI touched in this PR does not create any new axe failures
(run axe in browser:
[FF](https://addons.mozilla.org/en-US/firefox/addon/axe-devtools/),
[Chrome](https://chrome.google.com/webstore/detail/axe-web-accessibility-tes/lhdoppojpmngadmnindnejefpokejbdd?hl=en-US))
- [ ] If a plugin configuration key changed, check if it needs to be
allowlisted in the cloud and added to the [docker
list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)
- [ ] This renders correctly on smaller devices using a responsive
layout. (You can test this [in your
browser](https://www.browserstack.com/guide/responsive-testing-on-local-server))
- [ ] This was checked for [cross-browser
compatibility](https://www.elastic.co/support/matrix#matrix_browsers)

### Risk Matrix

Delete this section if it is not applicable to this PR.

Before closing this PR, invite QA, stakeholders, and other developers to
identify risks that should be tested prior to the change/feature
release.

When forming the risk matrix, consider some of the following examples
and how they may potentially impact the change:

| Risk | Probability | Severity | Mitigation/Notes |

|---------------------------|-------------|----------|-------------------------|
| Multiple Spaces—unexpected behavior in non-default Kibana Space.
| Low | High | Integration tests will verify that all features are still
supported in non-default Kibana Space and when user switches between
spaces. |
| Multiple nodes—Elasticsearch polling might have race conditions
when multiple Kibana nodes are polling for the same tasks. | High | Low
| Tasks are idempotent, so executing them multiple times will not result
in logical error, but will degrade performance. To test for this case we
add plenty of unit tests around this logic and document manual testing
procedure. |
| Code should gracefully handle cases when feature X or plugin Y are
disabled. | Medium | High | Unit tests will verify that any feature flag
or plugin combination still results in our service operational. |
| [See more potential risk
examples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx) |

### For maintainers

- [ ] This was checked for breaking API changes and was [labeled
appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

---------

Co-authored-by: Shahzad <[email protected]>
(cherry picked from commit da2f7f6)
@kibanamachine
Copy link
Contributor

💚 All backports created successfully

Status Branch Result
8.x

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

kibanamachine added a commit that referenced this pull request Sep 17, 2024
…stogram data (#192508) (#193154)

# Backport

This will backport the following commits from `main` to `8.x`:
- [[Synthetics] Fix issue where heatmap UI crashes on undefined
histogram data (#192508)](#192508)

<!--- Backport version: 9.4.3 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

<!--BACKPORT [{"author":{"name":"Justin
Kambic","email":"[email protected]"},"sourceCommit":{"committedDate":"2024-09-17T12:08:51Z","message":"[Synthetics]
Fix issue where heatmap UI crashes on undefined histogram data
(#192508)\n\n## Summary\r\n\r\nRecently, [we
fixed](#184177)
an\r\n[issue](#180076) where
the\r\nheatmap data on the detail and monitor history pages would not
fill up.\r\n\r\nA side effect of this fix was a new regression that
caused certain rarer\r\ncases to see the page crash because of an
unhandled case of calling a\r\nfunction on a potentially-null object;
our histogram data used to\r\npopulate this heatmap can be `undefined`
in certain cases.\r\n\r\nThis patch introduces a change that will handle
this case, and adds unit\r\ntests for the module in
question.\r\n\r\n\r\n### Checklist\r\n\r\nDelete any items that are not
applicable to this PR.\r\n\r\n- [ ] Any text added follows [EUI's
writing\r\nguidelines](https://elastic.github.io/eui/#/guidelines/writing),
uses\r\nsentence case text and includes
[i18n\r\nsupport](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)\r\n-
[
]\r\n[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)\r\nwas
added for features that require explanation or tutorials\r\n- [ ] [Unit
or
functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere
updated or added to match the most common scenarios\r\n- [ ] [Flaky
Test\r\nRunner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1)
was\r\nused on any tests changed\r\n- [ ] Any UI touched in this PR is
usable by keyboard only (learn more\r\nabout [keyboard
accessibility](https://webaim.org/techniques/keyboard/))\r\n- [ ] Any UI
touched in this PR does not create any new axe failures\r\n(run axe in
browser:\r\n[FF](https://addons.mozilla.org/en-US/firefox/addon/axe-devtools/),\r\n[Chrome](https://chrome.google.com/webstore/detail/axe-web-accessibility-tes/lhdoppojpmngadmnindnejefpokejbdd?hl=en-US))\r\n-
[ ] If a plugin configuration key changed, check if it needs to
be\r\nallowlisted in the cloud and added to the
[docker\r\nlist](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)\r\n-
[ ] This renders correctly on smaller devices using a
responsive\r\nlayout. (You can test this [in
your\r\nbrowser](https://www.browserstack.com/guide/responsive-testing-on-local-server))\r\n-
[ ] This was checked for
[cross-browser\r\ncompatibility](https://www.elastic.co/support/matrix#matrix_browsers)\r\n\r\n\r\n###
Risk Matrix\r\n\r\nDelete this section if it is not applicable to this
PR.\r\n\r\nBefore closing this PR, invite QA, stakeholders, and other
developers to\r\nidentify risks that should be tested prior to the
change/feature\r\nrelease.\r\n\r\nWhen forming the risk matrix, consider
some of the following examples\r\nand how they may potentially impact
the change:\r\n\r\n| Risk | Probability | Severity | Mitigation/Notes
|\r\n\r\n|---------------------------|-------------|----------|-------------------------|\r\n|
Multiple Spaces&mdash;unexpected behavior in non-default Kibana
Space.\r\n| Low | High | Integration tests will verify that all features
are still\r\nsupported in non-default Kibana Space and when user
switches between\r\nspaces. |\r\n| Multiple nodes&mdash;Elasticsearch
polling might have race conditions\r\nwhen multiple Kibana nodes are
polling for the same tasks. | High | Low\r\n| Tasks are idempotent, so
executing them multiple times will not result\r\nin logical error, but
will degrade performance. To test for this case we\r\nadd plenty of unit
tests around this logic and document manual testing\r\nprocedure. |\r\n|
Code should gracefully handle cases when feature X or plugin Y
are\r\ndisabled. | Medium | High | Unit tests will verify that any
feature flag\r\nor plugin combination still results in our service
operational. |\r\n| [See more potential
risk\r\nexamples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx)
|\r\n\r\n\r\n### For maintainers\r\n\r\n- [ ] This was checked for
breaking API changes and was
[labeled\r\nappropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)\r\n\r\n---------\r\n\r\nCo-authored-by:
Shahzad
<[email protected]>","sha":"da2f7f6dae265e29713e5c6586c542499b03bcd6","branchLabelMapping":{"^v9.0.0$":"main","^v8.16.0$":"8.x","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["bug","release_note:fix","v9.0.0","ci:project-deploy-observability","Team:obs-ux-management","v8.16.0"],"title":"[Synthetics]
Fix issue where heatmap UI crashes on undefined histogram
data","number":192508,"url":"https://github.com/elastic/kibana/pull/192508","mergeCommit":{"message":"[Synthetics]
Fix issue where heatmap UI crashes on undefined histogram data
(#192508)\n\n## Summary\r\n\r\nRecently, [we
fixed](#184177)
an\r\n[issue](#180076) where
the\r\nheatmap data on the detail and monitor history pages would not
fill up.\r\n\r\nA side effect of this fix was a new regression that
caused certain rarer\r\ncases to see the page crash because of an
unhandled case of calling a\r\nfunction on a potentially-null object;
our histogram data used to\r\npopulate this heatmap can be `undefined`
in certain cases.\r\n\r\nThis patch introduces a change that will handle
this case, and adds unit\r\ntests for the module in
question.\r\n\r\n\r\n### Checklist\r\n\r\nDelete any items that are not
applicable to this PR.\r\n\r\n- [ ] Any text added follows [EUI's
writing\r\nguidelines](https://elastic.github.io/eui/#/guidelines/writing),
uses\r\nsentence case text and includes
[i18n\r\nsupport](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)\r\n-
[
]\r\n[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)\r\nwas
added for features that require explanation or tutorials\r\n- [ ] [Unit
or
functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere
updated or added to match the most common scenarios\r\n- [ ] [Flaky
Test\r\nRunner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1)
was\r\nused on any tests changed\r\n- [ ] Any UI touched in this PR is
usable by keyboard only (learn more\r\nabout [keyboard
accessibility](https://webaim.org/techniques/keyboard/))\r\n- [ ] Any UI
touched in this PR does not create any new axe failures\r\n(run axe in
browser:\r\n[FF](https://addons.mozilla.org/en-US/firefox/addon/axe-devtools/),\r\n[Chrome](https://chrome.google.com/webstore/detail/axe-web-accessibility-tes/lhdoppojpmngadmnindnejefpokejbdd?hl=en-US))\r\n-
[ ] If a plugin configuration key changed, check if it needs to
be\r\nallowlisted in the cloud and added to the
[docker\r\nlist](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)\r\n-
[ ] This renders correctly on smaller devices using a
responsive\r\nlayout. (You can test this [in
your\r\nbrowser](https://www.browserstack.com/guide/responsive-testing-on-local-server))\r\n-
[ ] This was checked for
[cross-browser\r\ncompatibility](https://www.elastic.co/support/matrix#matrix_browsers)\r\n\r\n\r\n###
Risk Matrix\r\n\r\nDelete this section if it is not applicable to this
PR.\r\n\r\nBefore closing this PR, invite QA, stakeholders, and other
developers to\r\nidentify risks that should be tested prior to the
change/feature\r\nrelease.\r\n\r\nWhen forming the risk matrix, consider
some of the following examples\r\nand how they may potentially impact
the change:\r\n\r\n| Risk | Probability | Severity | Mitigation/Notes
|\r\n\r\n|---------------------------|-------------|----------|-------------------------|\r\n|
Multiple Spaces&mdash;unexpected behavior in non-default Kibana
Space.\r\n| Low | High | Integration tests will verify that all features
are still\r\nsupported in non-default Kibana Space and when user
switches between\r\nspaces. |\r\n| Multiple nodes&mdash;Elasticsearch
polling might have race conditions\r\nwhen multiple Kibana nodes are
polling for the same tasks. | High | Low\r\n| Tasks are idempotent, so
executing them multiple times will not result\r\nin logical error, but
will degrade performance. To test for this case we\r\nadd plenty of unit
tests around this logic and document manual testing\r\nprocedure. |\r\n|
Code should gracefully handle cases when feature X or plugin Y
are\r\ndisabled. | Medium | High | Unit tests will verify that any
feature flag\r\nor plugin combination still results in our service
operational. |\r\n| [See more potential
risk\r\nexamples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx)
|\r\n\r\n\r\n### For maintainers\r\n\r\n- [ ] This was checked for
breaking API changes and was
[labeled\r\nappropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)\r\n\r\n---------\r\n\r\nCo-authored-by:
Shahzad
<[email protected]>","sha":"da2f7f6dae265e29713e5c6586c542499b03bcd6"}},"sourceBranch":"main","suggestedTargetBranches":["8.x"],"targetPullRequestStates":[{"branch":"main","label":"v9.0.0","branchLabelMappingKey":"^v9.0.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/192508","number":192508,"mergeCommit":{"message":"[Synthetics]
Fix issue where heatmap UI crashes on undefined histogram data
(#192508)\n\n## Summary\r\n\r\nRecently, [we
fixed](#184177)
an\r\n[issue](#180076) where
the\r\nheatmap data on the detail and monitor history pages would not
fill up.\r\n\r\nA side effect of this fix was a new regression that
caused certain rarer\r\ncases to see the page crash because of an
unhandled case of calling a\r\nfunction on a potentially-null object;
our histogram data used to\r\npopulate this heatmap can be `undefined`
in certain cases.\r\n\r\nThis patch introduces a change that will handle
this case, and adds unit\r\ntests for the module in
question.\r\n\r\n\r\n### Checklist\r\n\r\nDelete any items that are not
applicable to this PR.\r\n\r\n- [ ] Any text added follows [EUI's
writing\r\nguidelines](https://elastic.github.io/eui/#/guidelines/writing),
uses\r\nsentence case text and includes
[i18n\r\nsupport](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)\r\n-
[
]\r\n[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)\r\nwas
added for features that require explanation or tutorials\r\n- [ ] [Unit
or
functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere
updated or added to match the most common scenarios\r\n- [ ] [Flaky
Test\r\nRunner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1)
was\r\nused on any tests changed\r\n- [ ] Any UI touched in this PR is
usable by keyboard only (learn more\r\nabout [keyboard
accessibility](https://webaim.org/techniques/keyboard/))\r\n- [ ] Any UI
touched in this PR does not create any new axe failures\r\n(run axe in
browser:\r\n[FF](https://addons.mozilla.org/en-US/firefox/addon/axe-devtools/),\r\n[Chrome](https://chrome.google.com/webstore/detail/axe-web-accessibility-tes/lhdoppojpmngadmnindnejefpokejbdd?hl=en-US))\r\n-
[ ] If a plugin configuration key changed, check if it needs to
be\r\nallowlisted in the cloud and added to the
[docker\r\nlist](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)\r\n-
[ ] This renders correctly on smaller devices using a
responsive\r\nlayout. (You can test this [in
your\r\nbrowser](https://www.browserstack.com/guide/responsive-testing-on-local-server))\r\n-
[ ] This was checked for
[cross-browser\r\ncompatibility](https://www.elastic.co/support/matrix#matrix_browsers)\r\n\r\n\r\n###
Risk Matrix\r\n\r\nDelete this section if it is not applicable to this
PR.\r\n\r\nBefore closing this PR, invite QA, stakeholders, and other
developers to\r\nidentify risks that should be tested prior to the
change/feature\r\nrelease.\r\n\r\nWhen forming the risk matrix, consider
some of the following examples\r\nand how they may potentially impact
the change:\r\n\r\n| Risk | Probability | Severity | Mitigation/Notes
|\r\n\r\n|---------------------------|-------------|----------|-------------------------|\r\n|
Multiple Spaces&mdash;unexpected behavior in non-default Kibana
Space.\r\n| Low | High | Integration tests will verify that all features
are still\r\nsupported in non-default Kibana Space and when user
switches between\r\nspaces. |\r\n| Multiple nodes&mdash;Elasticsearch
polling might have race conditions\r\nwhen multiple Kibana nodes are
polling for the same tasks. | High | Low\r\n| Tasks are idempotent, so
executing them multiple times will not result\r\nin logical error, but
will degrade performance. To test for this case we\r\nadd plenty of unit
tests around this logic and document manual testing\r\nprocedure. |\r\n|
Code should gracefully handle cases when feature X or plugin Y
are\r\ndisabled. | Medium | High | Unit tests will verify that any
feature flag\r\nor plugin combination still results in our service
operational. |\r\n| [See more potential
risk\r\nexamples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx)
|\r\n\r\n\r\n### For maintainers\r\n\r\n- [ ] This was checked for
breaking API changes and was
[labeled\r\nappropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)\r\n\r\n---------\r\n\r\nCo-authored-by:
Shahzad
<[email protected]>","sha":"da2f7f6dae265e29713e5c6586c542499b03bcd6"}},{"branch":"8.x","label":"v8.16.0","branchLabelMappingKey":"^v8.16.0$","isSourceBranch":false,"state":"NOT_CREATED"}]}]
BACKPORT-->

Co-authored-by: Justin Kambic <[email protected]>
justinkambic added a commit that referenced this pull request Oct 8, 2024
## Summary

We addressed #180076 recently
with these two PRs:

- #184177
- #192508

We were seeing a strange error that was difficult to repro, so we put in
a best-effort patch that was still ineffective. The reason this issue
happens is because in the code it's possible to divide by 0, which
yields a value of `Infinity`, which at some point causes our interval
value supplied to the server route to be an empty string.

This patch will make it so that we never pass a value of 0 to be used in
the calculation of bucket sizes in this hook.
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Oct 8, 2024
## Summary

We addressed elastic#180076 recently
with these two PRs:

- elastic#184177
- elastic#192508

We were seeing a strange error that was difficult to repro, so we put in
a best-effort patch that was still ineffective. The reason this issue
happens is because in the code it's possible to divide by 0, which
yields a value of `Infinity`, which at some point causes our interval
value supplied to the server route to be an empty string.

This patch will make it so that we never pass a value of 0 to be used in
the calculation of bucket sizes in this hook.

(cherry picked from commit 560d561)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Fixes for quality problems that affect the customer experience ci:project-deploy-observability Create an Observability project release_note:fix Team:obs-ux-management Observability Management User Experience Team v8.16.0 v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants