[ObsUX] Make Metrics data sources within APM transparent to avoid confusion with overlapping metrics in the UI #170632
Comments
Pinging @elastic/obs-ux-infra_services-team (Team:obs-ux-infra_services) |
@smith - do you think this is something that would fit in the team backlog or do you think this needs a project to be prioritised to try and improve this? |
Let's keep this scoped to a short-term solution that explains to the user why the data might be different, as @MiriamAparicio described above. |
@MiriamAparicio - thanks for raising this, a very good problem description and next steps. I hope you don't mind but I renamed it slightly to reflect that we'll try and focus on your first suggestion - making it clear what each metric really means/where it comes from. I also added in a draft Acceptance Criteria. My hope is that the solution/ACE can be figured out during refinement if that works? @smith - OK? |
I don't understand why we would want to present the user with two different values for memory and cpu. Are there any good reasons for them to be different, other than they were captured through different means? If so, what are they? If we can clearly articulate the difference and when one would need to use one over the other, I can somewhat understand why we'd have both. If not I suggest we should use the metricbeat value, and use the APM agent value as fallback. |
Hey @sqren, you're right - there isn't a need for them to be different from a user POV. My main thinking here was whether we can really solve for this without significant work that we likely can't prioritise right now. Having said that, if you can think of a way to elegantly handle this without a lot of work - I'm happy for us to spend some time refining this to try. I do like your idea, it's pretty smart. I do have a concern but let me check I understand first. To recap your suggestion:
My concern would be what happens if some of the hosts run metricbeat and some don't - what do we show in the 'metrics' tab? |
Yeah, my thinking is that we first fetch the metric (cpu, memory) from the infra indices. If that doesn't yield any results we fetch from the apm indices. We can start doing this from within the APM app (we already have data clients to access infra and apm indices). The better solution would be to have this encapsulated somewhere (OAM?) so that we can just call a function |
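The fetch order described in the comment above (infra indices first, APM indices only when the first query returns nothing) could be sketched roughly as follows. This is a minimal illustration, not Kibana code: `fetchMetricWithFallback`, `MetricFetcher`, and the result shapes are hypothetical names, and the two fetchers stand in for whatever data clients actually query the infra and APM indices.

```typescript
// Hypothetical types for illustration; none of these names come from the Kibana codebase.
type MetricName = 'cpu' | 'memory';
type MetricSource = 'infra' | 'apm';

interface MetricPoint {
  timestamp: number;
  value: number;
}

interface MetricResult {
  source: MetricSource; // lets the UI state where the numbers came from
  points: MetricPoint[];
}

// Stand-in for a data client that queries one set of indices.
type MetricFetcher = (serviceName: string, metric: MetricName) => Promise<MetricPoint[]>;

async function fetchMetricWithFallback(
  serviceName: string,
  metric: MetricName,
  fetchFromInfra: MetricFetcher,
  fetchFromApm: MetricFetcher
): Promise<MetricResult> {
  // Prefer the infra (metricbeat) indices.
  const infraPoints = await fetchFromInfra(serviceName, metric);
  if (infraPoints.length > 0) {
    return { source: 'infra', points: infraPoints };
  }
  // Only query the APM indices when the infra query yielded nothing.
  const apmPoints = await fetchFromApm(serviceName, metric);
  return { source: 'apm', points: apmPoints };
}
```

Returning the `source` alongside the data is one way to keep the origin of each metric visible to the user, which is the transparency this issue asks for.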
Yes, good point. I suggest that if we detect any metricbeat data for the selected service, we use that for all hosts. I think we should treat it as a configuration error if the customer has a service running across multiple hosts, and some but not all are running metricbeat. |
Hey @sqren, I like your thinking here...I think I got ahead of myself with the acceptance criteria here. What do you think about me just deleting the acceptance criteria for now and you/the team/me would have time to think of possibilities during refinement? That way, you have the freedom to propose some solutions and the acceptance criteria would be based on that? |
@roshan-elastic SGTM 👍 |
Since APM data is inconsistent, wouldn't it make more sense to prompt users to install metricbeat or deploy an agent to those hosts? |
Also, the inconsistency will be evident when we integrate the Asset Details flyout in the Infra table?! |
My intention was that if the user has metricbeat running for some hosts but not all, the hosts without metricbeat will not show up at all. We should only fall back to APM data, if there are no hosts with metricbeat data. We can improve this down the line by letting the user know that we have discovered hosts that do not have metricbeat - this should also take into account hosts discovered via other means than APM agents (eg filebeat). |
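The host-selection rule proposed above can be written down as a small pure function, which may help when mapping out the edge cases. This is only a sketch under assumptions: `ServiceHost`, `HostSelection`, and `selectHostsForService` are hypothetical names, and `hasMetricbeatData` assumes that something upstream has already checked the infra indices for each host.

```typescript
// Hypothetical shape: a host seen for the selected service.
interface ServiceHost {
  name: string;
  hasMetricbeatData: boolean; // assumed to be computed elsewhere
}

interface HostSelection {
  source: 'metricbeat' | 'apm';
  hosts: ServiceHost[]; // hosts whose metrics are shown
  excludedHosts: ServiceHost[]; // hosts hidden because they lack metricbeat data
}

function selectHostsForService(hosts: ServiceHost[]): HostSelection {
  const withMetricbeat = hosts.filter((h) => h.hasMetricbeatData);
  if (withMetricbeat.length > 0) {
    // At least one host reports via metricbeat: show only those hosts.
    return {
      source: 'metricbeat',
      hosts: withMetricbeat,
      excludedHosts: hosts.filter((h) => !h.hasMetricbeatData),
    };
  }
  // No metricbeat data at all: fall back to APM data for every host.
  return { source: 'apm', hosts, excludedHosts: [] };
}
```

Tracking `excludedHosts` is an addition for illustration; it would be the input for the later improvement of telling the user which discovered hosts do not run metricbeat.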
Playing this back for my understanding, for the 'infrastructure' and 'metrics' tabs in APM:
Thoughts: My worry is that once a user has at least 1 APM-detected host that runs metricbeat/agent, will they lose all of the metrics for the hosts which they previously had via APM-detected hosts but now are being excluded?
Idea: as soon as we go to option (3), we still show the APM data but flag it, show them how to filter it out and also provide instructions on how to onboard them with elastic agent/metricbeat?
More complexity (Containers vs Hosts): I'm not sure how this plays into the handling of everything... I'm thinking a list of potential use cases would be quite helpful so we could map out what would happen?
|
Wouldn't discarding hosts, as proposed in option 3, cause more confusion than solving the issue?
We also need to consider that we will soon integrate the asset details flyout into the Infrastructure table. So what we're discussing here will solve the mismatches in APM UI, but the problem will still exist in Infra UIs. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
We're fixing this with an entity-based view, so closing this issue. |
Just curious: if we have two different CPU values for the same host, how will the entity model solve the problem of deciding which value to use? |
Description of the problem
The Metrics tab populates its charts (e.g. memory usage (avg)) from APM agent data, whilst the Infrastructure tab shows a table of metrics populated by metricbeat. This is confusing for customers.
Possible solutions
(to be discussed)
Related issues
[Infrastructure Observability] Infrastructure metrics data should pull from APM if no agent/beat data is available
✔️ Acceptance criteria
1. Must Have
Must be delivered in this issue in order for the release to be valuable
2. Should Have
3. Could Have
Would be nice to have but not critical
4. Will Not Have (for now)
Explicitly will not be looked at within this issue