[docs] Recommended approach for metrics? #495

matt-winfield · 2023-10-22T21:07:34Z

matt-winfield
Oct 22, 2023

We already have some good documentation on setting up server timings: https://github.com/epicweb-dev/epic-stack/blob/main/docs/server-timing.md However, this only gives me information about requests that I make manually using the devtools. For production applications, we need some way to monitor whether everything is working smoothly for end users and to identify potential bottlenecks.

Some info I think is essential to have:

HTTP response time across all routes (mean, P90, P99)
Overall HTTP status codes across all routes
HTTP response time per route (mean, P90, P99)
HTTP status code per route
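
To make the list above concrete, here is a rough sketch of what "mean, P90, P99 per route" means when computed from raw samples. The `RequestSample` shape and the function names are made up for illustration, and the percentile uses the simple nearest-rank method. A metrics backend such as Prometheus would normally estimate these values from histogram buckets instead of keeping every sample.

```typescript
// Hypothetical sample shape: one entry per handled request.
type RequestSample = { route: string; status: number; durationMs: number };

type RouteStats = {
  count: number;
  mean: number;
  p90: number;
  p99: number;
  statusCounts: Record<string, number>;
};

// Nearest-rank percentile: the smallest sample that has at least p% of all
// samples at or below it. Expects the input sorted in ascending order.
function percentile(sortedAsc: number[], p: number): number {
  if (sortedAsc.length === 0) return NaN;
  const rank = Math.ceil((p / 100) * sortedAsc.length);
  return sortedAsc[Math.max(rank, 1) - 1];
}

// Group samples by route, then compute count, mean, P90, P99 and a
// per-status-code tally for each route.
function summarizeByRoute(samples: RequestSample[]): Record<string, RouteStats> {
  const durations: Record<string, number[]> = {};
  const statuses: Record<string, Record<string, number>> = {};
  for (const s of samples) {
    if (!durations[s.route]) {
      durations[s.route] = [];
      statuses[s.route] = {};
    }
    durations[s.route].push(s.durationMs);
    const code = String(s.status);
    statuses[s.route][code] = (statuses[s.route][code] || 0) + 1;
  }

  const result: Record<string, RouteStats> = {};
  for (const route of Object.keys(durations)) {
    const sorted = durations[route].slice().sort((a, b) => a - b);
    const total = sorted.reduce((sum, d) => sum + d, 0);
    result[route] = {
      count: sorted.length,
      mean: total / sorted.length,
      p90: percentile(sorted, 90),
      p99: percentile(sorted, 99),
      statusCounts: statuses[route],
    };
  }
  return result;
}
```

Running the same function over all samples with a single constant route label would give the "across all routes" numbers.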

It would also be useful to have:

Show most frequent SQL queries
Show SQL queries with longest execution time
Ability to define custom metrics (e.g. recording the "server timing" data mentioned previously somewhere that can be turned into a dashboard)

Fly.io provides a Prometheus instance + Grafana dashboard (in preview at the moment):
https://fly.io/docs/reference/metrics/#managed-grafana-preview

This includes some basic information that covers average HTTP response times + status codes; however, it doesn't yet break them down by route (which would be useful to work out what's causing issues). There is a way to expose data to Prometheus via an endpoint, so perhaps that's how extra metrics could be added?

It would be great to add some documentation on 1) Grafana and 2) how we could add some metrics not included by default (namely HTTP response time/status by route, frequency/duration of SQL queries, and custom metrics).

kentcdodds · 2023-10-23T02:29:03Z

kentcdodds
Oct 23, 2023
Maintainer

Thank you for the suggestion @matt-winfield!

I agree that this would be extremely helpful. I welcome any examples or other help folks can offer here.

0 replies

swalker326 · 2023-10-23T17:46:30Z

swalker326
Oct 23, 2023

Is a documentation website on the roadmap? The GitHub repo is great and a lot of useful information is there. I think the experience could be made a lot better with an interactive site, code snippets, and nested linking in general. Also, searching is tough.

4 replies

kentcdodds Oct 23, 2023
Maintainer

Yes, I'll probably work on something in the next month or so

swalker326 Oct 23, 2023

> Yes, I'll probably work on something in the next month or so

I'm down to help out, not sure how many cycles you've spent thinking about it. Available to help with that or just the grunt work of getting it implemented.

kentcdodds Oct 24, 2023
Maintainer

Thanks! I think I'd like to kick it off, but after that I could definitely use some help.

kentcdodds Oct 31, 2023
Maintainer

@swalker326, with a baby on the way I really don't know that I'm going to have time to start up the docs site. If you're keen to start on something, I'm happy to let you handle it. I've created a new repo and given you access to it, as well as an issue outlining what I'm thinking: epicweb-dev/epic-stack-docs#1

Don't feel any pressure or obligation, but if you'd like to kick something off that could be cool :)

matt-winfield · 2023-10-26T18:00:12Z

matt-winfield
Oct 26, 2023
Author

I've got custom metrics working on my personal site using Prometheus + Grafana, and I think this should be the recommended approach. (Fly.io has built-in support for both Prometheus and Grafana, and both are also free and self-hostable if someone would rather do that.)

Pros:

Code changes are quite simple to add
Works "out of the box" with Fly.io
Very flexible to add custom metrics, custom labels, custom dashboards etc.
Prometheus and Grafana are industry standard, widely used tools for this problem

Cons:

Grafana + PromQL (the Prometheus query language) have a bit of a learning curve. Setting up the basic dashboards in Grafana takes a bit of learning if you aren't already familiar with these tools. (You won't get any graphs for your custom metrics by default, only the ones Fly.io provides)

You can check out the changes I made to add the metrics in this commit: matt-winfield/portfolio-remix@e2cf466. This allows tracking metrics for request count/rate/duration by route/status code/method, as well as SQL query counts + durations by the query (e.g. to see which queries are taking the longest)
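
For the SQL side, here is a rough sketch of the kind of aggregation that answers "which queries run most often" and "which take the longest". It is not the code from the commit. The `QueryEvent` type is hypothetical, loosely modelled on a query log event that carries the query text and a duration in milliseconds, and it assumes the query text is parameterized so identical statements group together.

```typescript
// Hypothetical event shape (an assumption for this sketch).
type QueryEvent = { query: string; durationMs: number };

type QueryStats = { query: string; count: number; totalMs: number; maxMs: number };

class QueryMetrics {
  private stats: Record<string, QueryStats> = {};

  // Call once per executed query, e.g. from a query log event handler.
  record(event: QueryEvent): void {
    let s = this.stats[event.query];
    if (!s) {
      s = { query: event.query, count: 0, totalMs: 0, maxMs: 0 };
      this.stats[event.query] = s;
    }
    s.count += 1;
    s.totalMs += event.durationMs;
    s.maxMs = Math.max(s.maxMs, event.durationMs);
  }

  // Queries ordered by how often they ran, most frequent first.
  mostFrequent(limit: number): QueryStats[] {
    return this.all().sort((a, b) => b.count - a.count).slice(0, limit);
  }

  // Queries ordered by their single slowest execution, slowest first.
  slowest(limit: number): QueryStats[] {
    return this.all().sort((a, b) => b.maxMs - a.maxMs).slice(0, limit);
  }

  private all(): QueryStats[] {
    return Object.keys(this.stats).map((key) => this.stats[key]);
  }
}
```

In a Prometheus setup the same idea would usually be expressed as a counter and a histogram labelled by query, with the ranking done in the dashboard query instead of in the app.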

In summary:

1. Register Counters, Histograms, and Summaries for any custom metrics (in this case request and SQL query counts + durations). I'm using https://github.com/siimon/prom-client for this
2. Collect those metrics
   - In an Express middleware for the request metrics
   - In the Prisma client.$on() for query metrics
3. Expose the metrics from the registry on a new endpoint /metrics (listening on a different port from the main app; this doesn't need to be exposed to the internet)
4. Configure fly.toml with info about the custom metrics endpoint
5. Deploy + add graphs in Grafana (https://fly.io/docs/reference/metrics/#managed-grafana-preview)
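
To show what the register/collect/expose steps above boil down to, here is a dependency-free sketch. It is not prom-client's API and not the code from the commit: the `Counter` and `Histogram` classes below are tiny stand-ins written for illustration, the metric name `http_request_duration_seconds` and the bucket bounds are assumptions, and `recordRequest` is a hypothetical helper you would call from a request middleware. The output follows the Prometheus text exposition format, which is what a /metrics endpoint serves.

```typescript
type Labels = Record<string, string>;

// Escape backslashes, double quotes and newlines inside label values.
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Render labels as {a="1",b="2"} with sorted keys, so the same label set
// always identifies the same time series.
function labelKey(labels: Labels): string {
  const parts = Object.keys(labels)
    .sort()
    .map((k) => `${k}="${escapeLabelValue(labels[k])}"`);
  return parts.length > 0 ? `{${parts.join(",")}}` : "";
}

class Counter {
  private values: Record<string, number> = {};
  constructor(readonly name: string, readonly help: string) {}

  inc(labels: Labels = {}, amount = 1): void {
    const key = labelKey(labels);
    this.values[key] = (this.values[key] || 0) + amount;
  }

  render(): string {
    const lines = [`# HELP ${this.name} ${this.help}`, `# TYPE ${this.name} counter`];
    for (const key of Object.keys(this.values)) {
      lines.push(`${this.name}${key} ${this.values[key]}`);
    }
    return lines.join("\n");
  }
}

type Series = { labels: Labels; bucketCounts: number[]; sum: number; count: number };

class Histogram {
  private series: Record<string, Series> = {};
  constructor(readonly name: string, readonly help: string, readonly buckets: number[]) {}

  observe(labels: Labels, value: number): void {
    const key = labelKey(labels);
    let s = this.series[key];
    if (!s) {
      s = { labels, bucketCounts: this.buckets.map(() => 0), sum: 0, count: 0 };
      this.series[key] = s;
    }
    // Buckets are cumulative: count the value in every bucket whose upper
    // bound is greater than or equal to it.
    for (let i = 0; i < this.buckets.length; i++) {
      if (value <= this.buckets[i]) s.bucketCounts[i] += 1;
    }
    s.sum += value;
    s.count += 1;
  }

  render(): string {
    const lines = [`# HELP ${this.name} ${this.help}`, `# TYPE ${this.name} histogram`];
    for (const key of Object.keys(this.series)) {
      const s = this.series[key];
      this.buckets.forEach((le, i) => {
        lines.push(`${this.name}_bucket${labelKey({ ...s.labels, le: String(le) })} ${s.bucketCounts[i]}`);
      });
      lines.push(`${this.name}_bucket${labelKey({ ...s.labels, le: "+Inf" })} ${s.count}`);
      lines.push(`${this.name}_sum${key} ${s.sum}`);
      lines.push(`${this.name}_count${key} ${s.count}`);
    }
    return lines.join("\n");
  }
}

// Step 1: register the metrics (names and buckets are illustrative).
const requestCount = new Counter("http_request_count", "Total HTTP requests handled");
const requestDuration = new Histogram(
  "http_request_duration_seconds",
  "HTTP request duration in seconds",
  [0.05, 0.25, 1],
);

// Step 2: call once per finished request, e.g. from a request middleware.
function recordRequest(method: string, path: string, status: number, seconds: number): void {
  const labels = { method, path, status: String(status) };
  requestCount.inc(labels);
  requestDuration.observe(labels, seconds);
}

// Step 3: the response body to serve from the /metrics endpoint.
function renderMetrics(): string {
  return [requestCount.render(), requestDuration.render()].join("\n") + "\n";
}
```

One design note: label by the route pattern instead of the raw URL if you can, since every distinct label value creates a separate time series.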

Here's what my dashboard in Grafana looks like:

The first few are provided in the default dashboard from Fly.io (stuff like data usage, sources etc.). I cloned this dashboard into an editable one to add the custom metrics.

As an example, here's the query for the "Request Rate by Path" graph:

rate(http_request_count{app="portfolio-remix-5452"}[$__rate_interval])

Grafana's query builder (the screenshot above) is very helpful for this stuff.
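
If the `rate(...)` query above looks opaque, this is roughly the idea behind it: take the counter samples inside a time window and work out the per-second increase. The sketch below is a simplification with made-up names (`CounterSample`, `simpleRate`). It handles counter resets but skips the extrapolation to the window edges that Prometheus's real `rate()` performs, so its numbers will not match Prometheus exactly.

```typescript
// One scraped counter value. Names are illustrative, not a Prometheus API.
type CounterSample = { timestampSeconds: number; value: number };

// Per-second rate of increase across the given samples (oldest first).
function simpleRate(samples: CounterSample[]): number {
  if (samples.length < 2) return NaN;
  let increase = 0;
  for (let i = 1; i < samples.length; i++) {
    const delta = samples[i].value - samples[i - 1].value;
    // A drop means the counter was reset (for example, the app restarted),
    // so the new value is counted as an increase from zero.
    increase += delta >= 0 ? delta : samples[i].value;
  }
  const elapsed = samples[samples.length - 1].timestampSeconds - samples[0].timestampSeconds;
  return elapsed > 0 ? increase / elapsed : NaN;
}
```

For example, a counter that goes from 100 to 160 over 30 seconds yields 2 requests per second.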

I think this is a pretty good approach. Do you think this is the one we should recommend?

8 replies

matt-winfield Oct 27, 2023
Author

> I'd really like to pick this up, if that's okay. I wanted to wait for your feedback before I started work on it though. @matt-winfield Would it be okay to reach out with any questions I run into?

Sure, happy to answer any questions 🙂

swalker326 Oct 27, 2023

Do you have a local testing/development story or do you only see it deployed?

swalker326 Oct 27, 2023

Actually I think I got it, if you wanted to take a look. Still writing the docs update.

#503

matt-winfield Oct 27, 2023
Author

> Do you have a local testing/development story or do you only see it deployed?

I didn't set up Prometheus/Grafana locally, but you can verify it's collecting metrics correctly by checking the localhost:9091/metrics endpoint. You should see data for all of the metrics there.

In theory you can run your own Prometheus + Grafana instances on your machine, but they're a bit of a faff to set up, so it's probably not worth the effort!
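
If you want something a bit more automated than eyeballing the endpoint output, a small parser for the Prometheus text format is enough for a local sanity check. This is only a sketch: `parseMetricsText` is a made-up helper, it does not unescape label values, and it ignores the optional timestamp that can follow a sample value.

```typescript
type ParsedSample = { name: string; labels: Record<string, string>; value: number };

// Parse Prometheus text exposition output into a flat list of samples.
function parseMetricsText(text: string): ParsedSample[] {
  const samples: ParsedSample[] = [];
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    // Skip blank lines and "# HELP" / "# TYPE" comment lines.
    if (line === "" || line.charAt(0) === "#") continue;

    // metric_name, optional {labels}, then the sample value.
    const match = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)/.exec(line);
    if (!match) continue;

    const labels: Record<string, string> = {};
    const labelPattern = /([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"/g;
    let m: RegExpExecArray | null;
    while ((m = labelPattern.exec(match[2] || "")) !== null) {
      labels[m[1]] = m[2];
    }

    const raw = match[3];
    const value = raw === "+Inf" ? Infinity : raw === "-Inf" ? -Infinity : Number(raw);
    samples.push({ name: match[1], labels, value });
  }
  return samples;
}
```

You could feed it the response body from localhost:9091/metrics (for example via `fetch` on a recent Node version) and assert that the metric names you registered show up with the labels you expect.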

kentcdodds Oct 27, 2023
Maintainer

Yeah, I'm not so sure I care to go through the trouble of setting things up locally.
