Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Track RPC events and emit them to interested parties #660

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

alexjg
Copy link
Contributor

@alexjg alexjg commented May 7, 2021

It is useful when debugging to be able to track network events as they
occur. This is tricky to do by just tailing the logs as it's quite
noisy. This PR adds a NetworkDiagnosticEvent enum which is emitted to a
channel in TinCans::diagnostic_events. Any time we receive or send an
RPC message we send it to this channel. This allows interested parties
to subscribe to the channel and display RPC events for diagnostic
purposes.

The naming here is all putative. NetworkDiagnosticEvent seems very verbose but was the best I could come up with.

@alexjg alexjg requested a review from a team as a code owner May 7, 2021 18:26
@alexjg alexjg force-pushed the alexjg/rpc-event-tracking branch from 3ccde30 to fdd987d Compare May 7, 2021 18:28
It is useful when debugging to be able to track network events as they
occur. This is tricky to do by just tailing the logs as it's quite
noisy. This PR adds a `NetworkDiagnosticEvent` enum which is emitted to a
channel in `TinCans::diagnostic_events`. Any time we receive or send an
RPC message we send it to this channel. This allows interested parties
to subscribe to the channel and display RPC events for diagnostic
purposes.
@alexjg alexjg force-pushed the alexjg/rpc-event-tracking branch from fdd987d to 6311711 Compare May 7, 2021 18:30
Copy link
Contributor

@kim kim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must, unfortunately, express my opposition to this approach. A similar thing existed before, and was mistaken for a public API.

I don’t think IP addresses should be handed out, and care should be taken to avoid logging them. Also, the host language representation of wire types should be treated as accidental, and it shouldn’t be possible for the library consumer to scrutinise them.

If the problem at hand is noisiness of log output, we should look into making better use of the tracing library (which actually offers various knobs to control verbosity and output sink), or devise a replacement (see #517). I could, for example, imagine an approach where events of interest have a typed representation internally instead of using the macros directly, so as to ensure a consistent external representation. Deconstruction should, however, either not be possible, or we need to define and maintain this external representation as a stable API.

@alexjg
Copy link
Contributor Author

alexjg commented May 10, 2021

Interesting. I definitely agree that we should not be committing to the wire types as a public API.

Most of my motivation for this PR was to be able to provide a panel in the Upstream UI which displays protocol events as they occur. This has been helpful for me when debugging - even when I have the logs available. Frequently I am interested in logging about large parts of the system because I am trying to form hypotheses about what is going on, this means there's a lot of logging output and extracting just the log messages relating to protocol events requires additional grepping, this slows the debugging feedback loop down, a problem which is exacerbated when the problem is on someone else's machine. However, I think you're right that this could be achieved by using tracing. I can imagine a subscriber which outputs trace messages in a format which Upstream can understand, then upstream can provide ways of filtering for particular message attributes - maybe we add a tag to logging messages relating to protocol events for example. This way the only public API we're committing to is that we will produce logs in a tracing compatible format with a particular tag. Is this what you imagine?

I'm less sure on not logging IP addresses, this seems pretty critical for debugging. What's the reasoning behind not exposing them?

@FintanH
Copy link
Contributor

FintanH commented May 10, 2021

This conversation seems related to #589 and maybe we should try to flesh out a GRAND plan for telemetry and introspection :)

@xla
Copy link
Contributor

xla commented May 10, 2021

This conversation seems related to #589 and maybe we should try to flesh out a GRAND plan for telemetry and introspection :)

RFC much

@kim
Copy link
Contributor

kim commented May 10, 2021

What I’m saying is that tracing is somewhat, err, non-obvious to hold right, but is designed with all the service-mesh-grade ‘ilities in mind.

For example, there is a way to trim down log output via RUST_LOG in a very granular way. There is also the concept of Layers, which allow composition of collectors/formats/filters etc.

A severe limitation of the macro approach is obviously that it requires discipline to name events/fields in a consistent way. Thus, I wouldn’t be opposed to instead thread through a typed channel, and confine the mapping to a central place. This requires some thought to do efficiently in an eagerly evaluated language, and to do modularity-preserving in general. So, yes, a Grand Unified Theory of Observation wouldn’t hurt.

Re IP addresses: those are generally considered PII. The fact that our p2p layer is not privacy-preserving (yet) is not an excuse to leak them left and right. Iow, it should require conscious action for operators of public nodes to enable recording of such data, which implies data protection obligations for them, and never be a default.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants