-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[8.15] [Synthetics] Fix heatmap on monitor detail/history page for very large doc counts (#184177) #192270
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…e doc counts (elastic#184177) ## Summary Resolves elastic#180076. ### User-facing updates #### We can now display the heatmap for "very large" document counts ##### This PR _note: The first bucket in this image is empty because that's when my monitor actually started._ <img width="2083" alt="image" src="https://github.com/elastic/kibana/assets/18429259/72daf378-586e-4c7d-a1b0-a05bf95fb09a"> ##### Previously <img width="1420" alt="image" src="https://github.com/elastic/kibana/assets/18429259/94f46913-debc-4bce-9794-600241fc4d6f"> #### We display appropriate bucketed data This is a weird behavior I only noticed once I had made the new query. For some reason, the chart didn't display all the data it should. This may be related to the initial issue, but even in ranges less than 10k docs I was seeing this behavior. I manually checked this by querying for the bucket spans and ensuring the right number of docs were showing up, which is what I'd expect, because it's highly unlikely there would be a bug like this in the date histogram agg. ##### This PR <img width="1420" alt="image" src="https://github.com/elastic/kibana/assets/18429259/8620f709-dd4f-4a52-b39d-a3ff778ff7b4"> ##### Previously <img width="1420" alt="image" src="https://github.com/elastic/kibana/assets/18429259/745fa36d-db35-46a4-be1d-330bed799884"> ### Why are you doing this? We have noticed in a few SDH that in cases where users are querying very large document tallies, like a month's worth of data for a monitor that runs every minute, they're not seeing their status heatmap fill up. ~This patch would introduce a multi-stage query mechanism to the pings query endpoint. The PR is in draft state, so the procedure performs the query in this manner all the time, but we may want to add a new param to the endpoint that allows the client to opt into this behavior, or simply cap at the default 10k documents like it does today.~ ~I intend to do a good amount of benchmark testing on this fix locally and in Cloud before I try to merge it. We may also need to explore imposing a hard upper limit on the number of stages we are willing to execute. Right now the query is unbounded and a user could try to query datasets that would result in Kibana retrieving and trying to respond with hundreds of MB of data.~ ### What changed This PR modifies the backend query to rely on a Date Histogram agg, rather than querying the documents directly for this purpose. This agg uses the same interval that the client calculates based on the width of the page. I have kept the existing time bin logic on the client side, and modified the function that builds the data structure to pour the buckets from the histogram into the bins that we serve the chart that the user sees. One drawback is because the buckets are computed by Elasticsearch, we need to make API calls to retrieve more data, whereas on resizes previously, all the data was already present and would get re-bucketed on the client. I think this is acceptable because of the ability to handle larger datasets and given that the query should be very fast still. Additionally: - I have made additional modifications to the frontend logic to prevent unnecessary API calls. Before, the component would always make at least two calls on an initial load. I have also switched from `lodash` `throttle` to `useDebounce` from `react-use`, as this seems to more properly drop intermediate resize events where we would want to skip hitting the API. - I have also changed the render logic. We used to use a default value to build out an initial set of time bins. Unfortunately, with all these changes in place, this results in a weird behavior where we show an empty scale while waiting for data, and the bins then snap to the proper size once the element's width has been determined and the bins get scaled out properly. Instead of this, I skip rendering the chart until the initial width is available via a `ref` applied to the wrapper element. At that point, we send the init width to the hook and use that value for our initial query and bin calculations, so we only ever show bins on the charts that will correspond to the bucketed data we eventually display. - I added a progress indicator to the wrapper element so the user receives an indication that the data is being refreshed. - I added a quiet fetch similar to how the overview page works, so that after an initial render the chart will not have its data dropped until we have new data to replace it with. Along those lines, the hook now watches the page location and if it changes (i.e. to from History to Overview), state management will destroy the heatmap data so we don't have a flicker of other data during a new fetch. ## Testing this PR Unfortunately, to truly test this PR you'd need a monitor with over 10k documents, as that's the criteria specified in the initial issue. I did this by running a monitor over about 10 days on a 1-run-per-minute schedule. You could create a cloud deployment from this PR, or create dummy documents in some way (this can be hard to verify though). I am pretty confident this works and fixes the issue based on my real-world testing. --------- Co-authored-by: shahzad31 <[email protected]> (cherry picked from commit 50f8bd6) # Conflicts: # x-pack/plugins/observability_solution/synthetics/common/constants/synthetics/rest_api.ts # x-pack/plugins/observability_solution/synthetics/public/apps/synthetics/state/root_effect.ts # x-pack/plugins/observability_solution/synthetics/server/routes/index.ts # x-pack/plugins/observability_solution/synthetics/server/routes/pings/get_ping_statuses.ts
botelastic
bot
added
ci:project-deploy-observability
Create an Observability project
Team:obs-ux-infra_services
Observability Infrastructure & Services User Experience Team
labels
Sep 6, 2024
Pinging @elastic/obs-ux-infra_services-team (Team:obs-ux-infra_services) |
🤖 GitHub commentsExpand to view the GitHub comments
Just comment with:
|
💔 Build FailedFailed CI StepsHistoryTo update your PR or re-run it, just comment with: |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
backport
ci:project-deploy-observability
Create an Observability project
Team:obs-ux-infra_services
Observability Infrastructure & Services User Experience Team
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Backport
This will backport the following commits from
main
to8.15
:Questions ?
Please refer to the Backport tool documentation