[docs] Recommended approach for metrics? #495
Replies: 3 comments 12 replies
-
Thank you for the suggestion @matt-winfield! I agree that this would be extremely helpful. I welcome any examples or other help folks can offer here.
-
Is a documentation website on the roadmap? The GitHub repo is great and has a lot of useful information, but I think the experience could be much better with an interactive site, code snippets, and nested linking in general. Searching is also tough.
-
I've got custom metrics working on my personal site using Prometheus + Grafana, and I think this should be the recommended approach. (Fly.io has built-in support for both Prometheus and Grafana, and both are also free and self-hostable if someone would rather do that.) Pros:
Cons:
You can check out the changes I made to add the metrics in this commit: matt-winfield/portfolio-remix@e2cf466. This allows tracking request count, rate, and duration by route, status code, and method, as well as SQL query counts and durations per query (e.g. to see which queries are taking the longest). In summary:
Here's what my dashboard in Grafana looks like: The first few panels come from the default Fly.io dashboard (data usage, sources, etc.); I cloned that dashboard into an editable one to add the custom metrics. As an example, here's the query for the "Request Rate by Path" graph:
Grafana's query builder (the screenshot above) is very helpful for this stuff. I think this is a pretty good approach. Do you think this is the one we should recommend?
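For anyone who wants a feel for the timing side of this without reading the whole commit, here's a minimal sketch of recording a count and total duration per label (a route, or a SQL query name). This is not the code from the commit above: `DurationTracker`, `startTimer`, and the label string are hypothetical names made up for illustration, and a real setup would more likely use a Prometheus client library with histograms rather than plain sums.

```typescript
// Hypothetical helper, for illustration only (not from the linked commit).
type Stats = { count: number; totalMs: number };

class DurationTracker {
  private stats = new Map<string, Stats>();
  private now: () => number;

  // The clock is injectable so the tracker can be tested deterministically.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Start timing `label`; call the returned function when the work is done.
  startTimer(label: string): () => void {
    const start = this.now();
    return () => {
      const entry = this.stats.get(label) ?? { count: 0, totalMs: 0 };
      entry.count += 1;
      entry.totalMs += this.now() - start;
      this.stats.set(label, entry);
    };
  }

  // Look up what has been recorded for a label so far.
  get(label: string): Stats | undefined {
    return this.stats.get(label);
  }
}

// Usage: wrap any piece of work, e.g. a database call.
const exampleTracker = new DurationTracker();
const exampleStop = exampleTracker.startTimer("SELECT user by id");
// ... run the query here ...
exampleStop();
```

Returning a "stop" function keeps the call site to two lines, and calling it in a `finally` block means failed queries are timed too.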
-
We already have some good documentation on setting up server timings: https://github.com/epicweb-dev/epic-stack/blob/main/docs/server-timing.md But this can only give me information about requests that I make manually using the devtools. For production applications, we need some way to monitor whether everything is working smoothly for end users, and to identify potential bottlenecks.
Some info I think is essential to have:
It would also be useful to have:
Fly.io provides a Prometheus instance + Grafana dashboard (in preview at the moment):
https://fly.io/docs/reference/metrics/#managed-grafana-preview
This includes some basic information covering average HTTP response times and status codes. However, it doesn't yet break this down by route (which would be useful for working out what's causing issues). There is a way to expose data to Prometheus via an endpoint, so perhaps that's how extra metrics could be added?
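To make the endpoint idea a bit more concrete, here's a rough sketch of a per-route request counter rendered in the Prometheus text exposition format, which is what a scraped metrics endpoint returns as its response body. Everything here is an assumption for illustration: `LabeledCounter`, the metric name `http_requests_total`, and the label names are made up, and in practice a client library would normally do this for you.

```typescript
// Hypothetical names throughout; a sketch, not a real library API.
type Labels = Record<string, string>;

// Label values must escape backslashes, double quotes, and newlines.
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Sort label names so the same label set always yields the same series key.
function formatLabels(labels: Labels): string {
  const pairs = Object.entries(labels)
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([name, value]) => `${name}="${escapeLabelValue(value)}"`);
  return pairs.length > 0 ? `{${pairs.join(",")}}` : "";
}

class LabeledCounter {
  private name: string;
  private help: string;
  private values = new Map<string, number>();

  constructor(name: string, help: string) {
    this.name = name;
    this.help = help;
  }

  // Increment the series identified by this exact set of labels.
  inc(labels: Labels, by: number = 1): void {
    const key = formatLabels(labels);
    this.values.set(key, (this.values.get(key) ?? 0) + by);
  }

  // Render HELP/TYPE headers followed by one line per label set.
  render(): string {
    const lines = [
      `# HELP ${this.name} ${this.help}`,
      `# TYPE ${this.name} counter`,
    ];
    for (const [key, value] of this.values) {
      lines.push(`${this.name}${key} ${value}`);
    }
    return lines.join("\n") + "\n";
  }
}

// Usage: count one request per (method, route, status) combination.
const exampleCounter = new LabeledCounter("http_requests_total", "Total HTTP requests.");
exampleCounter.inc({ method: "GET", route: "/users/:id", status: "200" });
console.log(exampleCounter.render());
```

Labelling by the route pattern (such as `/users/:id`) rather than the raw URL keeps the number of distinct series small, which matters because every unique label combination becomes its own time series.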
It would be great to add some documentation on 1) Grafana and 2) how we could add some metrics not included by default (namely HTTP response time/status by route, frequency/duration of SQL queries, and custom metrics).