feat(#60): dashboards with CHT api express metrics #75

kennsippell · 2023-07-18T09:01:19Z

Left Column

The dashboard starts with a replication Apdex score. I'm waiting on some UX research to set these values, but tentatively I've set the following thresholds.

| User State | Threshold |
| --- | --- |
| Satisfied | < 90 secs |
| Tolerating | 90 - 180 secs |
| Frustrated | > 180 secs |

An interesting constraint is that each threshold we monitor needs to match one of the hardcoded histogram buckets in the CHT 4.3 API, so this does not seem particularly agile or easy to change on the fly or per partner.
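To illustrate the bucket constraint, here is a minimal sketch of the standard Apdex formula, (satisfied + tolerating / 2) / total, computed from cumulative histogram bucket counts. The function name `apdex`, the bucket bounds, and the sample counts are hypothetical and are not taken from the dashboard JSON or the CHT API. Because buckets are cumulative "less than or equal" counts, the 90 and 180 second boundaries are treated as inclusive here.

```python
def apdex(cumulative_buckets, satisfied_le, tolerating_le):
    """Compute an Apdex score from cumulative histogram bucket counts.

    cumulative_buckets maps an upper bound in seconds (float("inf") for the
    +Inf bucket) to the number of requests that finished within that bound.
    Both thresholds must be existing bucket bounds, otherwise the counts
    needed for the score are simply not available.
    """
    for bound in (satisfied_le, tolerating_le):
        if bound not in cumulative_buckets:
            raise ValueError(f"threshold {bound}s is not a histogram bucket")
    total = cumulative_buckets[float("inf")]
    if total == 0:
        return None  # no traffic, so the score is undefined
    satisfied = cumulative_buckets[satisfied_le]
    tolerating = cumulative_buckets[tolerating_le] - satisfied
    return (satisfied + tolerating / 2) / total


# Hypothetical bucket counts, for illustration only.
sample = {30.0: 40, 90.0: 70, 180.0: 90, float("inf"): 100}
score = apdex(sample, satisfied_le=90.0, tolerating_le=180.0)
print(score)
```

Asking for a threshold that is not a bucket bound (for example 120 seconds with the sample above) raises an error, which is the same limitation the dashboard query runs into.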

Error rate shows % of get-ids requests resulting in status code 400-599.
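As a rough sketch of what this panel computes, the snippet below derives the percentage from request counts grouped by status code. The function name `error_rate_percent` and the sample counts are hypothetical; the real panel works on Prometheus metrics rather than a Python dict.

```python
def error_rate_percent(requests_by_status):
    """Return the percentage of requests whose status code is in 400-599."""
    total = sum(requests_by_status.values())
    if total == 0:
        return 0.0  # no requests, so nothing failed
    errors = sum(
        count
        for status, count in requests_by_status.items()
        if 400 <= status <= 599
    )
    return 100.0 * errors / total


# Hypothetical counts, for illustration only.
sample_counts = {200: 940, 304: 20, 401: 10, 500: 25, 503: 5}
rate = error_rate_percent(sample_counts)
print(rate)
```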

Right Column

Total number of successful and failing replications.
Rate of replications per second by endpoint (optional; considering removing this).
Replication latency: 50th percentile, 90th percentile, and max replication times.
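For readers unfamiliar with how percentiles come out of a histogram, the following is a minimal sketch of quantile estimation by linear interpolation inside the matching bucket, similar in spirit to what Prometheus's `histogram_quantile` does. It is an approximation written for illustration, not the query used in the dashboard; the bucket bounds and counts are hypothetical.

```python
def histogram_quantile(q, cumulative_buckets):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative bucket counts.

    cumulative_buckets is a list of (upper_bound_seconds, cumulative_count)
    pairs that ends with the float("inf") bucket.
    """
    buckets = sorted(cumulative_buckets)
    total = buckets[-1][1]
    if total == 0:
        return None  # no observations
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in buckets:
        if count >= rank:
            if upper_bound == float("inf"):
                # Cannot interpolate into an unbounded bucket, so report
                # the highest finite bound instead.
                return lower_bound
            in_bucket = count - lower_count
            if in_bucket == 0:
                return upper_bound
            # Assume observations are spread evenly within the bucket.
            return lower_bound + (upper_bound - lower_bound) * (rank - lower_count) / in_bucket
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical cumulative counts, for illustration only.
sample = [(30.0, 40), (90.0, 70), (180.0, 90), (float("inf"), 100)]
p50 = histogram_quantile(0.5, sample)
p90 = histogram_quantile(0.9, sample)
print(p50, p90)
```

The estimate is only as precise as the bucket layout allows, which is another reason the hardcoded buckets matter for these panels.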

kennsippell · 2023-07-19T06:50:08Z

The 2nd dashboard is a techy one based on the prometheus-api-metrics shared dashboard, with all widgets and settings related to Apdex removed. I don't know what the Apdex thresholds should be for general CHT traffic.

jkuester · 2023-07-20T18:51:23Z

@kennsippell should these dashboards be getting populated with data from the fake-cht? I am trying to test them locally by pulling them in on top of 60-express-metrics, but I am not seeing anything populated in the dashboards.

jkuester

What is the rationale for adding these as brand new dashboards instead of just including them as rows on the CHT Admin Details dashboard? I ask this mostly as a general question, in that I am not sure exactly how we should determine "what belongs on a dashboard". I just know that, from a user's perspective, if I want to find data about how long it is taking the server to respond to requests, it is not immediately clear to me whether I would find that data on CHT Admin Details, CHT Replication, or CHT API Dashboard. (c.c. @m5r I would be glad for your feedback on this question as well!)

On one hand it is nice to have all of the panels dependent on a particular scrape config or CHT version all grouped on one dashboard since that makes it easier to document the requirements of a particular dashboard. However, it feels like dependencies (particularly on CHT versions) are bound to evolve over time and things will get more complicated.

My current thinking is that our dashboard design should prioritize the best experience for folks running the latest CHT version (while remaining aware of how changes will impact folks running older versions). Basically, we should put the panels where it makes the most sense to have them (regardless of what CHT version they are dependent on). If folks with older CHT versions see blank panels that is OK (as long as we indicate in the panel description the minimum CHT version needed for the panel).

That being said, for this particular case I have not fully made up my mind on where the best place is for the panels to go. 🤔 Seems like if we keep these panels in their own dashboards (like you have them now), then we should plan to break the CHT Admin Details up into separate dashboards (Couch data, Outbound messaging data, etc). This could be done as needed in the future. Open to other ideas though!


jkuester · 2023-07-20T20:49:41Z

grafana/provisioning/dashboards/CHT/cht_admin_api_express.json

+    "templating": {
+      "list": [
+        {
+            "definition": "query_result(up{job=~\"cht\"})",


I wonder if it is worth filtering the instances here based on their version (so we only show instances >= 4.3)?

Suggested change

"definition": "query_result(up{job=~\"cht\"})",

"definition": "query_result(cht_version{app!~\"^([0-3]\\.)|(4\\.[0-2]\\.).*\"})",
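A quick way to sanity-check which versions the suggested matcher would keep is sketched below. Prometheus fully anchors regex matchers, so Python's `re.fullmatch` is used here as an approximation (Prometheus uses RE2, so this is not an exact reproduction). The version strings are hypothetical, and the `GROUPED` variant is an assumption added for comparison, not part of the original suggestion. In the JSON file each `\\.` is an escaped `\.`.

```python
import re

# Pattern from the suggested change, after JSON unescaping.
SUGGESTED = r"^([0-3]\.)|(4\.[0-2]\.).*"
# Hypothetical variant that groups both alternatives before the trailing ".*".
GROUPED = r"^(([0-3]\.)|(4\.[0-2]\.)).*"


def is_shown(version, pattern):
    """Mimic the negative matcher app!~"pattern": keep non-matching versions."""
    return re.fullmatch(pattern, version) is None


versions = ["3.9.0", "4.2.4", "4.3.0", "4.10.0"]
shown_suggested = [v for v in versions if is_shown(v, SUGGESTED)]
shown_grouped = [v for v in versions if is_shown(v, GROUPED)]
print(shown_suggested)
print(shown_grouped)
```

Under full anchoring, the first alternative of the ungrouped pattern only matches a string that is exactly a digit from 0 to 3 followed by a dot, so 3.x versions may slip through the filter. It may be worth verifying the behavior against a live Prometheus before relying on either form.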


kennsippell · 2023-07-21T16:15:46Z

Rationale for adding these into their own dashboards

When I've worked in monitored environments previously, there were thousands of dashboards. I didn't know what they were all for, but I could come in and learn what I needed to learn. I could get things done. I personally preferred having dashboards which were tailored to specific scenarios/tasks, rather than a few dashboards which tried to do everything. That was for a very complex service though and I'm not sure if it is applicable - but it is my lens.

cht_partnerships_replication: I'm hoping this will be used by partnerships-level people and not CHT admins. See this.

cht_admin_api_express: I suspect this targets a core dev user type and not a CHT admin. See this.

I'd be quite open to including endpoint performance on the existing "CHT Details" dashboards - that feels like an interesting level of detail to me for CHT admins. Maybe?


m5r · 2023-07-24T12:48:23Z

> What is the rationale for adding these as brand new dashboards

> I personally preferred having dashboards which were tailored to specific scenarios/tasks

@kennsippell's take makes sense. I'm not strongly opinionated on either side of the question.

1. Going all in with a single mega-dashboard will probably result in meh performance because of having many charts displayed at once.
2. Splitting dashboards by persona (i.e. a dashboard tailored for app devs, another one for SREs, another one for CHT admins and so on...) would be nice to give a personalized view of what's happening with a CHT instance, but there is bound to be some duplication between dashboards. I don't know if Grafana configs can share reusable dashboard "components", but that could be a solution.
3. And finally, splitting dashboards by scenario would categorize dashboards neatly with nearly 0 duplicate dashboard config, but a single person might need to open many dashboards side by side to get all the information they need.

We can't go wrong with either the first or the third solution. Either way, we can split dashboards or regroup them without too much overhead.
cc @jkuester

jkuester · 2023-07-25T14:14:58Z

Thanks for all the great conversation here @m5r and @kennsippell! Maybe I was just behind the curve here, but I feel like things are a lot more clear in my head now regarding organizational strategies for these dashboards!

It seems to me that if we try to keep dashboards focused on specific scenarios/tasks (Mokhtar's #3) we would still get pretty much all the benefits from Mokhtar's #2 case of splitting by persona (since a given persona would be focused on one or more scenarios/tasks). So, this seems like a good guiding design principle for us to use!

And, with that being said, given the extra context @kennsippell provided, I think it makes sense to keep these two as separate dashboards!

mrjones-plip · 2023-08-02T20:23:42Z

Deferring to the feedback from @jkuester & @m5r, so removing myself as a reviewer. Lemme know if you want me to jump back in!

kennsippell · 2023-08-11T07:53:40Z

> @kennsippell should these dashboards be getting populated with data from the fake-cht?

The API dashboards are expected to work, but the replication dashboards would be empty with fake-cht. It is not clear to me how to get the replication dashboards working with fake-cht. I could add the endpoint, but what would ping the endpoint to generate the data? I guess it could be Prometheus (?). If you have ideas, I can pursue them, but I tested with a live CHT instance.

mrjones-plip · 2023-08-29T22:39:05Z

@jkuester - with CHT Core 4.3.0 released, which includes the API express metrics, it'd be good to move this PR along so we can release a matching version of Watchdog. Put this in yer queue when ya get a sec!

thanks

mrjones-plip · 2023-08-30T17:30:11Z

Ah - I see @kennsippell is out on holiday until Sept 5th, in case any next steps are blocked until his return!

jkuester

👍 Super excited to get these dashboards in!

grafana/provisioning/dashboards/CHT/cht_coredev_api_express.json

grafana/provisioning/dashboards/CHT/cht_partnerships_replication.json


kennsippell · 2023-09-19T05:59:36Z

Thanks for the review @jkuester!

Good times with the fake-cht server now.

jkuester

LGTM!

jkuester · 2023-09-19T20:19:17Z

development/fake-cht/package.json

@@ -2,7 +2,7 @@
  "name": "fake-cht",
  "version": "1.0.0",
  "scripts": {
-    "start": "node src/index.js"
+    "start": "node --experimental-fetch src/index.js"


What's wrong with Node 18? 😆


medic-ci · 2023-09-21T05:58:40Z

🎉 This PR is included in version 1.11.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

kennsippell added 2 commits July 18, 2023 01:48

Dashboard with replication metrics for partners

ad84b1d

Move apdex in left column

ae8cd1b

kennsippell changed the title ~~Dashboards with CHT API Express Metrics~~ feat(#60) - Dashboards with CHT API Express Metrics Jul 18, 2023

kennsippell changed the title ~~feat(#60) - Dashboards with CHT API Express Metrics~~ feat(#60): Dashboards with CHT API Express Metrics Jul 18, 2023

kennsippell mentioned this pull request Jul 18, 2023

feat(#8426): API express internals + endpoint monitoring via Prometheus medic/cht-core#8354

Merged

5 tasks

kennsippell changed the title ~~feat(#60): Dashboards with CHT API Express Metrics~~ feat(#60): dashboards with CHT api express metrics Jul 18, 2023

kennsippell added 2 commits July 18, 2023 23:45

Rename dashboard (require 4.3)

b9e1582

New dashboard for API (Node.js/Express)

3228e19

kennsippell marked this pull request as ready for review July 19, 2023 06:50

Remove count of replications. Rename replication rate

e4a3995

kennsippell requested review from jkuester and mrjones-plip July 19, 2023 18:53

jkuester reviewed Jul 21, 2023

grafana/provisioning/dashboards/CHT/cht_admin_api_express.json (outdated)

kennsippell commented Jul 21, 2023

grafana/provisioning/dashboards/CHT/cht_admin_api_express.json (outdated)

Replace $instance with $cht_instance

950454b

mrjones-plip removed their request for review August 2, 2023 20:23

kennsippell added 2 commits August 11, 2023 00:32

Code review feedback

49ce30b

Rename and add tags

4a5d2d9

kennsippell requested a review from jkuester August 11, 2023 07:53

Remove CoreDev tag

6f15537

jkuester requested changes Sep 8, 2023


kennsippell and others added 8 commits September 18, 2023 12:38

Update grafana/provisioning/dashboards/CHT/cht_coredev_api_express.json

3a2f963

Co-authored-by: Joshua Kuestersteffen <[email protected]>

Merge branch 'main' into 60-express-dashboards

cb85adc

Metric without prefix

168fe85

Update grafana/provisioning/dashboards/CHT/cht_partnerships_replicati…

35977f0

…on.json Co-authored-by: Joshua Kuestersteffen <[email protected]>

Update grafana/provisioning/dashboards/CHT/cht_partnerships_replicati…

ad1cd1e

…on.json Co-authored-by: Joshua Kuestersteffen <[email protected]>

Replace $instance with $cht_instance

d85f44f

Merge branch '60-express-dashboards' of https://github.com/medic/cht-…

c7a15b1

…watchdog into 60-express-dashboards

Add replication data to fake-cht

55de8e4

kennsippell requested a review from jkuester September 19, 2023 05:59

Interval adjustments + Response Latency rework

735760e

jkuester approved these changes Sep 19, 2023


Require node 18

8f584ec

kennsippell merged commit e57487d into main Sep 21, 2023

kennsippell deleted the 60-express-dashboards branch September 21, 2023 05:58

medic-ci added the released label Sep 21, 2023

mrjones-plip mentioned this pull request Nov 12, 2024

Ensure API Express metrics are using correct protocol (http or https) #128

Closed
