Exposing Metrics #1356
Slight preference for Prometheus for the sake of simplicity. I have not had the need for tracing (enabled e.g. through OpenCensus) in a peer-to-peer application.
The latest is "opentelemetry" (a merger of opencensus and opentracing). I'd prefer that because, as you say, it's backend-agnostic and is just a facade unless used. In go-ipfs, we used to rely heavily on https://github.com/ipfs/go-metrics-interface/ to avoid depending on a specific metrics implementation, but opentelemetry should make that point moot.
Are we already using that somewhere in our stack?
This doesn't look very reassuring at this point: https://opentelemetry.io/docs/instrumentation/go/manual/#creating-metrics
I very much dislike the idea of going with the flavour of the day; applications should be free to use whatever they want, and we already have two different systems in prod (lotus uses opencensus, ipfs uses raw prometheus). Can't we just define a very thin interface and use that?
I haven't personally made much use of it, but it shows up in some consumers of go-libp2p. A few I'm aware of:
I don't have a strong preference here, but it seems like the options are:
If people are interested in 4, but not in using go-metrics-interface I'd like to understand why and how that same argument is unlikely to be reused in 6-12 months to argue why our new interface isn't good enough and needs an alternative. I happen to be inclined to use some standard interface so we don't have to make all of our own wrappers + interfaces here and we can let other people mess around with creating "the perfect logging interface™️" while we work on p2p networking 😄. If people want to roll our own here then that's fine by me too.
Thank you for the links! These are all tracer integrations though, not metrics. I'd be happy to use OpenTelemetry for metrics as well, and it looks like their metrics support is currently in the alpha phase. I'm a bit wary of rolling our own interface, mostly for https://xkcd.com/927/ reasons, as @aschmahmann pointed out.
Unfortunately there isn't a good answer to OpenTelemetry vs. OpenCensus. OpenCensus is on a deprecation path, but the Go implementation of the OpenTelemetry Metrics API is not yet stable, so pick your poison. I don't think creating our own wrapper would be a good idea. OpenTelemetry is already a wrapper and enables writing bridges to unknown future implementations. And those are likely to be written by the experts instead of us. There is an OpenCensus -> OpenTelemetry metrics bridge, but it has some limitations around the metric types that are supported due to complexities in translating. If the experts can't write a complete bridge, then I don't think we should burden ourselves with that either. Note that the Metrics API Spec for OpenTelemetry is marked as "stable"; it is the Go implementation that is not yet stable. There is a milestone for the stable release of the Go Metrics API. It's pretty clear that industry resources are now devoted to OpenTelemetry, and a stable Go Metrics API is imminent, so I think we should hop on that bandwagon.
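For reference, a minimal sketch of what a counter looks like against the OpenTelemetry Go metrics API that eventually stabilized (post v1.16). The scope and instrument names here are illustrative only, not go-libp2p's actual metrics, and the alpha API discussed above looked somewhat different:

```go
package oteldemo

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

func recordDial(ctx context.Context, transport string) error {
	// otel.Meter returns a Meter from the globally registered MeterProvider.
	// Without an SDK installed (e.g. a Prometheus exporter), all calls are
	// no-ops: the "just a facade unless used" property mentioned above.
	meter := otel.Meter("github.com/libp2p/go-libp2p/swarm")

	dials, err := meter.Int64Counter("dials",
		metric.WithDescription("outbound dial attempts"))
	if err != nil {
		return err
	}
	dials.Add(ctx, 1, metric.WithAttributes(attribute.String("transport", transport)))
	return nil
}
```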
Good discussion here - thanks folks for contributing. One thing that will be useful to get defined is the metrics we're going to measure. Things that I think are important:
In terms of where this kind of info lives, I leave it up to others, but maybe this warrants a spec or a docs page? Good stuff!
I would like to propose something different.
As the person who introduced OpenCensus, some background thoughts that I had at the time.
Personally, I really don't like where OpenTelemetry took metrics; last I looked, OTel did away with a lot of the nice things about OC metrics. From what I have followed of OTel, they didn't really care about the metrics side of things and still don't. My guess is that ego and NIH syndrome are involved. On top of that, the spec is very Java-esque, and that has resulted in less-than-ideal situations in the Go implementation. Currently, still being in the position of consuming metrics from libp2p applications and libraries, the things I care about are consistent prefixes, i.e.
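For illustration, a hypothetical example (not go-libp2p's actual metric names) of what consistent prefixes look like with plain Prometheus: a fixed `libp2p` namespace plus a per-component subsystem, so downstream dashboards and alerts see the same names regardless of which application embeds the library:

```go
package prefixdemo

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	// Exported as libp2p_rcmgr_conns_blocked_total.
	rcmgrConnsBlocked = promauto.NewCounterVec(prometheus.CounterOpts{
		Namespace: "libp2p",
		Subsystem: "rcmgr",
		Name:      "conns_blocked_total",
		Help:      "Connections blocked by the resource manager.",
	}, []string{"dir"})

	// Exported as libp2p_swarm_dials_total.
	swarmDials = promauto.NewCounterVec(prometheus.CounterOpts{
		Namespace: "libp2p",
		Subsystem: "swarm",
		Name:      "dials_total",
		Help:      "Outbound dial attempts.",
	}, []string{"transport"})
)
```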
@vyzo Please don't. It just means another package needs to be written and then the ability to discover that package will be reduced. For example, Estuary just implemented the metrics for the libp2p resource manager, but they are now specific to the Estuary codebase, and anyone else wanting resource manager metrics has to duplicate the exact same code. There is zero user benefit to that course of action.
@lanzafame I didn't suggest that the application implement the interface; the library would provide the default implementation. I just want to avoid polluting the code with metrics invocations, which would be very ugly and tied to the particular choice. Our code should be agnostic of such things.
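A sketch of the pattern being proposed here, with made-up names: the library defines a narrow metrics interface and ships a no-op default, so core code stays backend-agnostic and a concrete implementation (Prometheus, OpenCensus, ...) can be swapped in via an option:

```go
package ifacedemo

// ConnMetrics is a hypothetical, deliberately narrow reporter interface the
// library's core code calls at the relevant events.
type ConnMetrics interface {
	ConnOpened(dir, transport string)
	ConnClosed(dir, transport string)
}

// noopConnMetrics is the default implementation: with it installed, the calls
// sprinkled through the swarm cost a virtual dispatch and nothing else.
type noopConnMetrics struct{}

func (noopConnMetrics) ConnOpened(dir, transport string) {}
func (noopConnMetrics) ConnClosed(dir, transport string) {}

// DefaultConnMetrics is what the library uses unless its constructor is given
// an alternative implementation via an option.
var DefaultConnMetrics ConnMetrics = noopConnMetrics{}
```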
OpenTelemetry Metrics support has shipped: https://opentelemetry.uptrace.dev/guide/go-metrics.html
@marten-seemann it still hasn't officially shipped; metrics are still listed as Alpha: https://github.com/open-telemetry/opentelemetry-go.
FWIW js-libp2p has a metrics capability similar to what's being discussed here. It collects various named metrics as simple key/value pairs that are updated over time but not retained. There's no dependency on Prometheus or any other monitoring system, which keeps it lightweight and flexible. When there's a libp2p app that runs as a daemon, it makes the metrics available in some way. For example, js-ipfs has an HTTP API that sets up an endpoint that Prometheus calls periodically to record the libp2p metrics, plus additional metrics about the environment - memory usage, CPU etc. An app could write these stats out to a file instead and skip Prometheus integration entirely, whatever works for the app author.
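For comparison, a minimal Go sketch (hypothetical metric name and port) of the daemon-side flow described above: register a metric, then expose a `/metrics` endpoint for Prometheus to scrape periodically. An app that doesn't run Prometheus could instead read the values and write them to a file:

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// peersConnected is registered with the default Prometheus registry.
var peersConnected = promauto.NewGauge(prometheus.GaugeOpts{
	Namespace: "libp2p",
	Name:      "peers_connected",
	Help:      "Number of currently connected peers.",
})

func main() {
	peersConnected.Set(42) // in a real node this would track swarm events

	// Expose the scrape endpoint; Prometheus polls this URL on its own schedule.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":2112", nil))
}
```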
Adding another consideration to the list: performance. As it turns out, OpenCensus generates tons of garbage: #1955.
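For context, a sketch of the typical OpenCensus recording path (measure, tag mutation, record) with made-up names; the comments point out where allocations happen on hot paths, which is the kind of garbage referred to above:

```go
package censusdemo

import (
	"context"

	"go.opencensus.io/stats"
	"go.opencensus.io/tag"
)

var (
	mConnsOpened = stats.Int64("libp2p_swarm/connections_opened",
		"Connections opened", stats.UnitDimensionless)
	keyTransport = tag.MustNewKey("transport")
)

func recordConnOpened(ctx context.Context, transport string) {
	// tag.New copies the tag map into a fresh context on every call, and
	// stats.Record builds a slice of measurements; on a per-connection or
	// per-stream hot path this adds up to significant garbage.
	ctx, _ = tag.New(ctx, tag.Upsert(keyTransport, transport))
	stats.Record(ctx, mConnsOpened.M(1))
}
```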
Once more I want to say I told ya so...
Master Fu you should listen to sometimes... |
@vyzo This is not really helpful, even if you did say so. Which in this case you didn't. You suggested providing a default implementation, without specifying which library to use. Chances are we would have used OpenCensus for that. Please don't take this the wrong way. I'm still happy about constructive comments that help us come to a decision here.
To finish this project, we should have a libp2p folder in the PL Grafana instance. That would allow us to pull up the libp2p metrics for every PL-run libp2p node. In addition, @sukunrt suggested making the metrics of one libp2p node (probably on the IPFS network) available publicly, possibly linked from libp2p.io. @dhuseby, wdyt?
That sounds like a good idea; there should be an accompanying blog post as well.
libp2p currently collects a few metrics (for example, for TCP and QUIC). We also collect metrics for the resource manager; however, that is currently left as a responsibility of the application. This is how we end up with slightly different implementations (and metric names) in go-ipfs and lotus.
In the (very near!) future, we want to expose more metrics from the swarm regarding transports, security protocols and muxers. This would have allowed us to detect the muxer prioritisation bug (ipfs/kubo#8750) a lot earlier.
We therefore should have a coherent libp2p metrics story.
Open questions:
Thoughts, @vyzo @aschmahmann @mxinden @Stebalien @lanzafame?
Tracking the various components we want to instrument:
Supporting work (needed to close this issue):
defer: