The app_start label of octoprint_infos_info should be a separate metric #22

Open
hairyhenderson opened this issue Jan 30, 2021 · 9 comments

@hairyhenderson

Hi, and first off - thanks for a useful plugin!

The octoprint_infos_info metric has a label app_start, which seems to be the time (in seconds since epoch) that the plugin started. That's handy but I'd like to be able to use it as a value.

Can I suggest a new metric: octoprint_app_start_seconds?

If that sounds like a good idea, I could issue a PR.

@tg44 (Owner) commented Feb 1, 2021

I think it should be a counter (uptime), increased by x every x ms. (As I remember, cadvisor, node_exporter, and the others do this.) It's an interesting question when it should grow, but a 10s timer is probably good enough. (Also, Prometheus can "continue" counters if you write the queries well, so total uptime and things like that can be extracted later.)
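
A rough sketch of how that could look with the prometheus_client library and a plain threading.Timer (the metric name and interval are placeholders, not anything from the plugin):

```python
import threading
from prometheus_client import Counter

INTERVAL = 10  # seconds between ticks, as suggested above

# Hypothetical metric; the Python client exposes it as octoprint_uptime_seconds_total.
UPTIME = Counter('octoprint_uptime_seconds', 'Seconds the plugin has been running')

def _schedule_tick():
    timer = threading.Timer(INTERVAL, _on_tick)
    timer.daemon = True       # don't keep the process alive just for this timer
    timer.start()

def _on_tick():
    UPTIME.inc(INTERVAL)      # grow by the interval that just elapsed
    _schedule_tick()

_schedule_tick()
```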

PRs always welcome!

@hairyhenderson (Author)

The convention is that timestamp-type metrics are a gauge - less effort that way, and more accurate. If you want to find out "how long has this been running?" you can do a query like time() - octoprint_app_start_seconds.
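
For illustration, a minimal sketch of that gauge approach with the prometheus_client library (the metric name is just the one proposed above, not anything the plugin currently exposes):

```python
import time
from prometheus_client import Gauge

# Set once at startup and never touched again; queries derive uptime from it.
APP_START = Gauge('octoprint_app_start_seconds',
                  'Unix timestamp at which the plugin started')
APP_START.set(time.time())    # equivalently: APP_START.set_to_current_time()
```

Uptime then falls out of the time() - octoprint_app_start_seconds query instead of needing a metric that keeps counting.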

Tracking uptime is usually done with the synthetic up metric, which is generated by Prometheus when it scrapes successfully (i.e. if the process can be scraped, it's probably "up").

@tg44 (Owner) commented Feb 1, 2021

I checked what you said. It turns out that almost all of my currently running exporters expose process_start_time_seconds as a gauge (why a gauge? why not a counter? we can't travel back in time...). This means that if we expose it, we should probably name it the same way(?). I can't find any source that says exactly how we should expose uptime/start time (but I'm interested if you can link any post about this; I also reread the Prometheus documentation about writing exporters without much success). So we could expose the start time as process_start_time_seconds?

Also, the Python lib has built-in process exporter functionality, which is also strange because the documentation discourages doing this. So I'm a bit puzzled right now :D

Somewhat irrelevant, but I think up is a bad metric to calculate uptime from. There could be a lot of reasons why you can't scrape the endpoint (a network error, or downtime on the scraper side) that have nothing to do with the actual application uptime.

@hairyhenderson (Author)

@tg44 ah - if app_start is the process start time, then indeed process_start_time_seconds makes more sense. I wasn't totally sure what it was 😉

> why a gauge? why not a counter? we can't travel back in time...

Counters are intended to be monotonically increasing, which time generally isn't (think leap seconds, etc). Counters also imply that you can do rate calculations on them, and when the value goes down it's considered a reset. That doesn't make sense for a timestamp - a timestamp may decrease following an NTP time synchronization.

This leaves a gauge as the best option. Especially when you consider that a value like process_start_time_seconds is static (the start time should never change during the life of a process), a counter makes no sense, since there's no situation where the value should be increasing.

> Also, the Python lib has built-in process exporter functionality, which is also strange because the documentation discourages doing this. So I'm a bit puzzled right now :D

Simply enabling the process collector is probably the best option - I thought it was enabled by default but I'm not seeing its metrics from octoprint.

As for that document discouraging the practice... IMO it's not super-well-worded - I think the intent is more to ensure that people don't duplicate node_exporter's functionality in-process. Since node_exporter doesn't report per-process cpu/mem/etc usage, there's no conflict here.
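
For what it's worth, a quick way to see what the client's built-in process collector exposes (assuming prometheus_client is installed; the collector reads /proc, so this only produces output on Linux):

```python
from prometheus_client import REGISTRY, generate_latest

# The default registry ships with a process collector already registered,
# so process_start_time_seconds and friends show up with no extra code.
for line in generate_latest(REGISTRY).decode().splitlines():
    if line.startswith('process_'):
        print(line)
```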

@tg44 (Owner) commented Feb 2, 2021

Then I think we should just enable the process collector. If we enable it, it will automatically add process_start_time_seconds too, and we're good to go.

> (think leap seconds, etc)

I really hate our time system...

> Since node_exporter doesn't report per-process cpu/mem/etc usage, there's no conflict here.

Node exporter doesn't, but cadvisor, for example, does. I think the main intention is to not duplicate data, and that every collector should collect the right scope.

Thank you! I learned a lot from your comments!

@hairyhenderson (Author)

> Node exporter doesn't, but cadvisor, for example, does.

cAdvisor only works if you're running in a container, and even so, cAdvisor tracks in-container usage, not the usage of individual processes inside the container (of which there can be many).

Ok, sounds like enabling the process collector is the thing to do. It's enabled by default on the default registry, but this exporter is using a custom registry - is there a reason for that? Seems to me that simply removing _registry (and references to it) will do the trick.
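
If keeping a custom registry does turn out to be the way to go, another option (a sketch, assuming the exporter keeps its registry in a variable like _registry) is to attach the process collector to it explicitly:

```python
from prometheus_client import CollectorRegistry, ProcessCollector

_registry = CollectorRegistry()         # stand-in for the exporter's custom registry
ProcessCollector(registry=_registry)    # adds process_start_time_seconds and friends
```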

@hairyhenderson (Author)

I've issued #23 for this, though feel free to take a different approach if you really want a custom registry!

@tg44 (Owner) commented Feb 2, 2021

We started to use _registry to resolve #13. I'm not sure this is the wisest approach, though.

@hairyhenderson (Author)

Ah... I see, that sort of makes sense 😉 - I'll rework #23 to account for that.
