Track RPC events and emit them to interested parties #660

alexjg · 2021-05-07T18:26:23Z

It is useful when debugging to be able to track network events as they
occur. This is tricky to do by just tailing the logs as it's quite
noisy. This PR adds a NetworkDiagnosticEvent enum which is emitted to a
channel in TinCans::diagnostic_events. Any time we receive or send an
RPC message we send it to this channel. This allows interested parties
to subscribe to the channel and display RPC events for diagnostic
purposes.

The naming here is all putative. NetworkDiagnosticEvent seems very verbose but was the best I could come up with.

It is useful when debugging to be able to track network events as they occur. This is tricky to do by just tailing the logs as it's quite noisy. This PR adds a `NetworkDiagnosticEvent` enum which is emitted to a channel in `TinCans::diagnostic_events`. Any time we receive or send an RPC message we send it to this channel. This allows interested parties to subscribe to the channel and display RPC events for diagnostic purposes.

kim

I must, unfortunately, express my opposition to this approach. A similar thing existed before, and was mistaken for a public API.

I don’t think IP addresses should be handed out, and care should be taken to avoid logging them. Also, the host language representation of wire types should be treated as accidental, and it shouldn’t be possible for the library consumer to scrutinise them.

If the problem at hand is noisiness of log output, we should look into making better use of the tracing library (which actually offers various knobs to control verbosity and output sink), or devise a replacement (see #517). I could, for example, imagine an approach where events of interest have a typed representation internally instead of using the macros directly, so as to ensure a consistent external representation. Deconstruction should, however, either not be possible, or we need to define and maintain this external representation as a stable API.

alexjg · 2021-05-10T10:14:55Z

Interesting. I definitely agree that we should not be committing to the wire types as a public API.

Most of my motivation for this PR was to be able to provide a panel in the Upstream UI which displays protocol events as they occur. This has been helpful for me when debugging - even when I have the logs available. Frequently I am interested in logging about large parts of the system because I am trying to form hypotheses about what is going on, this means there's a lot of logging output and extracting just the log messages relating to protocol events requires additional grepping, this slows the debugging feedback loop down, a problem which is exacerbated when the problem is on someone else's machine. However, I think you're right that this could be achieved by using tracing. I can imagine a subscriber which outputs trace messages in a format which Upstream can understand, then upstream can provide ways of filtering for particular message attributes - maybe we add a tag to logging messages relating to protocol events for example. This way the only public API we're committing to is that we will produce logs in a tracing compatible format with a particular tag. Is this what you imagine?

I'm less sure on not logging IP addresses, this seems pretty critical for debugging. What's the reasoning behind not exposing them?

FintanH · 2021-05-10T12:02:04Z

This conversation seems related to #589 and maybe we should try to flesh out a GRAND plan for telemetry and introspection :)

xla · 2021-05-10T12:10:03Z

This conversation seems related to #589 and maybe we should try to flesh out a GRAND plan for telemetry and introspection :)

RFC much

kim · 2021-05-10T13:12:49Z

What I’m saying is that tracing is somewhat, err, non-obvious to hold right, but is designed with all the service-mesh-grade ‘ilities in mind.

For example, there is a way to trim down log output via RUST_LOG in a very granular way. There is also the concept of Layers, which allow composition of collectors/formats/filters etc.

A severe limitation of the macro approach is obviously that it requires discipline to name events/fields in a consistent way. Thus, I wouldn’t be opposed to instead thread through a typed channel, and confine the mapping to a central place. This requires some thought to do efficiently in an eagerly evaluated language, and to do modularity-preserving in general. So, yes, a Grand Unified Theory of Observation wouldn’t hurt.

Re IP addresses: those are generally considered PII. The fact that our p2p layer is not privacy-preserving (yet) is not an excuse to leak them left and right. Iow, it should require conscious action for operators of public nodes to enable recording of such data, which implies data protection obligations for them, and never be a default.

alexjg requested a review from a team as a code owner May 7, 2021 18:26

alexjg force-pushed the alexjg/rpc-event-tracking branch from 3ccde30 to fdd987d Compare May 7, 2021 18:28

alexjg force-pushed the alexjg/rpc-event-tracking branch from fdd987d to 6311711 Compare May 7, 2021 18:30

kim suggested changes May 8, 2021

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Track RPC events and emit them to interested parties #660

Track RPC events and emit them to interested parties #660

alexjg commented May 7, 2021

kim left a comment

alexjg commented May 10, 2021

FintanH commented May 10, 2021

xla commented May 10, 2021

kim commented May 10, 2021 •

edited

Loading

Track RPC events and emit them to interested parties #660

Are you sure you want to change the base?

Track RPC events and emit them to interested parties #660

Conversation

alexjg commented May 7, 2021

kim left a comment

Choose a reason for hiding this comment

alexjg commented May 10, 2021

FintanH commented May 10, 2021

xla commented May 10, 2021

kim commented May 10, 2021 • edited Loading

kim commented May 10, 2021 •

edited

Loading