Add timestamp field to Update message #121
base: master
Conversation
I don't think that we should make this change. (//cc @gcsl @marcushines) We specifically made the choice when designing gNMI that a `Notification` carries a single `timestamp` that applies to all of the `Update` messages within it, and I have a number of other observations.
Overall, I'm therefore not inclined to accept the change. I'd appreciate others' thoughts.
I concur with @robshakir here. The single timestamp was a deliberate decision given the streaming implementation and the goal for low-latency updates. Data that is collected together can be sent together so long as the timestamp is accurate for that data. We specifically did not allow for data at different times to be sent in a single payload, as it would mean there would be too much flexibility in building massive payloads that are bursty on both the server and client side of the connection. Within gNMI there should be no need to focus on building large payloads to make efficient use of MTU at the TCP layer, as gRPC streaming abstracts that away. Data should be queued for sending as soon as it is ready, not buffered in the application space for packing.

From the collection side of gNMI we break all notifications into single-update payloads for indexing and distribution to clients, as seen in the examples in github.com/openconfig/gnmi. There is an obvious on-the-wire overhead for a client that wants everything, as it eliminates the savings of using a common prefix across multiple updates, but it enables efficient delivery of only the desired information to clients that request only a specific subset. The point being, we already actively use the worst case, one update per notification, at full production scale. I am sure a metric could be derived that would favor more packing, but I think that metric would be contrary to the real-time goals of delivering data with low latency.
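A hedged illustration of the single-update form described above, as a `gnmi.Notification` in protobuf text format (the path, key, value and timestamp are all hypothetical):

```textproto
# gnmi.Notification: one update, one timestamp accurate for that one value.
timestamp: 1620000000000000000  # collection time in nanoseconds since epoch
prefix {
  elem { name: "interfaces" }
  elem { name: "interface" key { key: "name" value: "Ethernet1" } }
}
update {
  path {
    elem { name: "state" }
    elem { name: "counters" }
    elem { name: "in-octets" }
  }
  val { uint_val: 1104 }
}
```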
I concur with both rob and csl - the notification was built as the boundary between atomic things, not updates.
The motivating issue is that the underlying hardware is tagging counters with "this was good @ ts" but those counters are getting batched together in a notification (since they are getting polled in a batch). This is causing jitter in computed rates due to loss of granularity.
As was pointed out, one alternative approach that we have considered is disaggregating updates in those cases. Our concern was potential performance impact (payload size and latency). We can measure this and see the effect of disaggregation.
@marcushines, @robshakir: This request has originated from myself and I would kindly request you reconsider this PR. As Ebben points out, the design of distributed systems is such that there is asynchronous polling at irregular intervals that updates octet or packet statistics from, for example, ASIC registers to the linecard CPU (and on to the Routing Engine/Processor). Let's assume for the sake of argument that the ASIC is publishing an update to the linecard CPU DRAM approximately every 2 seconds for all of the interfaces that are in use on that ASIC, e.g. 8 interfaces per ASIC. Let's also assume that the collector is receiving telemetry at a separate rate of approximately every 10 seconds (this frequency doesn't really matter, however, as the problem still occurs at lower export rates as well).

What one can easily observe externally at the collector is that at non-uniform intervals you see a smooth average bits/sec and/or packets/sec, then at a random point in time within a ~1-2 minute window you see a "spike" in traffic rate, after which it returns to a uniform average bits/sec or packets/sec. The reason this occurs is that, internally, within the distributed system you have some ASICs that will send a uniform, consistent 4 samples of statistics to the linecard CPU DRAM most of the time within that, say, 10 second window before the data is published externally. However, due to the nature of asynchronous polling, you end up with random points in time where the ASIC updates the linecard CPU DRAM with 5 samples of octet, packet and error statistics for the interfaces on that ASIC within that 10 second window before it's published externally to a collector.

The router implementation is not computing a rate (bits/sec or packets/sec); instead, that is left to the external collector. However, the collector has no way of knowing the start/stop times of the period over which the statistics were collected in the linecard CPU DRAM. Having that information in the packet exported to the collector, for the corresponding statistic, gives the collector the information necessary to calculate a proper, consistent average rate. IMO, this is critically important for determining, at lower time periods/granularities, the average rate of bits/sec, packets/sec, errors/sec (of various forms), etc. that the ASIC is observing. While I do understand your points about disaggregating the counters to leverage a single timestamp in the Notification packet, that seems like a potentially large overhead when we also consider the potentially large number of interfaces, ASICs and corresponding counters in the system that would need to be disaggregated and exported.
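A hypothetical worked example of the spike described above (all numbers illustrative): assume a constant true rate of $R$ bytes/sec, ASIC-to-DRAM updates every ~2 seconds, and an export to the collector every 10 seconds. The counter delta between two consecutive exports covers the interval between the last DRAM updates in each window, not between the export instants, so a window occasionally contains an extra sample. If the collector nevertheless divides by the nominal 10 second export period:

$$\hat{R} = \frac{\Delta C}{10\,\mathrm{s}} = \frac{R \cdot 12\,\mathrm{s}}{10\,\mathrm{s}} = 1.2R$$

i.e. a 20% apparent spike, with a matching $0.8R$ dip when a window contains one sample fewer, even though the true rate never changed.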
@samante This sounds like a misattribution of timestamps, not the need to send data with different timestamps in a single payload. If a line card is sending a payload every 2 s and the collector is collecting every 10 s, then having 4 vs. 5 samples (or maybe even 6) in the window for some reason should have no effect on the computed rate, provided the data is sent with the timestamp at which the last data was collected. This has the implication that you might not be sending a consistent window size (it may be 8, 10 or 12 s), but the computed rate should be consistent. If you were attempting to have a consistent window as well, given there would be some jitter in the collection, you would, instead of triggering the output based on a 10 s timer, trigger the output on a multiple of 5 samples. In either case, the timestamp reported to the collector should always be that of the raw data, not an artificially created multiple of the requested sample rate applied at serialization time.
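Restating the above as a formula (no new semantics, just the arithmetic): if each export carries the timestamp $t_i$ of the last raw sample behind the counter value $C_i$, the collector computes

$$\hat{R} = \frac{C_2 - C_1}{t_2 - t_1}$$

and since $C_2 - C_1 = R\,(t_2 - t_1)$ for a constant true rate $R$, the estimate equals $R$ whether the effective window is 8, 10 or 12 seconds; the window jitter cancels exactly.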
@gcsl Your point, above, is logical... but, unfortunately, it is not necessarily how the implementations in the field have been built, to my knowledge. I believe the question posed here is: do we require every router to implement the behavior you suggest (and would that even be feasible with existing router hardware and software), or might it be more straightforward to just allow the router to export the timestamp corresponding to a statistic, specifically as an optional attribute in the packet, and thus allow the collector to compute the rate appropriately?
@samante The current encoding allows for correct timestamps to be assigned to every given statistic. gNMI is a stream, and thus we are talking about whether to send multiple notifications, one update each with the correct timestamp as the current spec is written, vs. the request here to pack multiple updates with different timestamps into the same notification. As stated above, the current encoding was an intentional design decision. We have seen early implementations try to dump the entire tree in a single notification, which is precisely the opposite of our goal for streaming telemetry, as it introduces massive latency for the individual updates: the entire tree must be built as a proto in memory, then serialized, the full proto transmitted, and finally deserialized (also completely in memory) before the client can access a single metric. There is overhead in streaming, but the average latency can be reduced by up to half, as a client can act on each notification as it arrives. If there are metrics that truly have the same timestamp, they can be bundled into the same notification for additional efficiency.
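A hedged sketch of that last bundling case, again in protobuf text format with hypothetical paths and values: two counters sampled at the same instant share one `Notification` and one accurate timestamp:

```textproto
# gnmi.Notification: two same-timestamp updates bundled under a common prefix.
timestamp: 1620000000000000000  # accurate for both updates below
prefix {
  elem { name: "interfaces" }
  elem { name: "interface" key { key: "name" value: "Ethernet1" } }
  elem { name: "state" }
  elem { name: "counters" }
}
update { path { elem { name: "in-octets" } }  val { uint_val: 1104 } }
update { path { elem { name: "out-octets" } } val { uint_val: 2208 } }
```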
Currently, the specification as defined attaches a `timestamp` field to the `Notification` message, and a `Notification` message can contain one or more `Update` messages. The `timestamp` field is defined as the following:
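For reference, the relevant definition in `gnmi.proto` reads approximately as follows (the remaining `Notification` fields are elided here):

```proto
message Notification {
  // Timestamp in nanoseconds since the Unix epoch at which the data in the
  // message was collected.
  int64 timestamp = 1;
}
```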
In various pipelines, many data sources can contribute to the data that is packed in a single `Notification` message; however, the `timestamp` can only represent the coalesced view. An example of this is data sources in a distributed system (e.g. linecards), where hardware timestamps can represent the true source prior to the aggregation and serialization towards the ultimate TCP/gRPC session. The true "underlying source" timestamp is then lost and/or only representative of the single packed `Notification` that aggregates these `Update` messages.

This proposal is to add an optional `timestamp` field to the `Update` message in order to be able to represent the true data source timestamp. The `timestamp` field on the `Notification` message is retained for backwards compatibility. This also allows an implementation to choose to support one or both timestamps; when both are supported, a consumer can detect potential issues within the system pipeline by calculating deltas against the original data source timestamps.
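A minimal sketch of the shape such a change might take; the proposed field number and comments below are illustrative only, while the existing `Update` fields are shown for context:

```proto
message Update {
  Path path = 1;                        // The path (key) for the update.
  Value value = 2 [deprecated = true];  // Deprecated untyped value.
  TypedValue val = 3;                   // The updated value.
  uint32 duplicates = 4;                // Number of coalesced duplicates.
  // Proposed (illustrative field number): optional per-update collection
  // timestamp, in nanoseconds since the Unix epoch, reflecting the
  // original data source.
  int64 timestamp = 5;
}
```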
If the proposal is accepted, relevant gNMI specification documents will be
updated in a subsequent commit.