SBN changes
#321
-
@mbenson1 We've observed this behavior too, and it would be great to address this limitation. Perhaps, instead of buffering the open-loop asynchronous messages, the SBN peer interaction could be made synchronous, so that one node can get an acknowledgement of whether a new peer does or does not subscribe to a particular message ID.
-
I would like to suggest some changes to SBN. We tried using it to connect two instances of cFS running on a multicore processor in AMP mode, but ran into architectural problems and ultimately had to abandon SBN and implement our own solution.

The most important change I would like to see is to give the lower layers flow control over the upper layer. The lower layer, which performs the actual exchange of data with the other instances, has limited throughput; the upper layer pushes data too fast and overflows our data pipe. Given that a purely pub/sub architecture is inherently unpredictable, overflowing the pipe is unavoidable and acceptable for the actual SB messages. Our data pipe is stream-based, not datagram-based, but we implemented a mechanism to ensure the serialized pipe detects byte-by-byte dropout, re-synchronizes, and immediately recovers. We tuned the pipe to provide adequate throughput during nominal operations, with occasional overflow during asynchronous bursts, e.g. a surge of event messages. Our applications are designed to tolerate a small amount of dropout, so this isn't a problem.

The real problem is in the metadata that SBN instances exchange with each other. The subscription messages are sent as large bursts: either one or two large bursts of all subscriptions, followed by ad-hoc single-subscription messages. These bursts overwhelm our pipe, so only some or none of them come through. The result is that instance A has some of the subscriptions from instance B, and instance B has no subscriptions from instance A. The initial burst is too large for us to over-allocate the pipe to accommodate, especially since the burst is unconstrained anyway. Unlike the SB messages, the subscription messages are critical for nominal operation, and SBN is not resilient to their loss. Maybe this is addressed somehow and I just missed it.
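To illustrate the kind of byte-by-byte re-sync mechanism described above, here is a minimal sketch. The sync bytes, frame layout, and XOR checksum are illustrative assumptions, not our actual wire format: the receiver scans the stream one byte at a time for a sync pattern, validates the candidate frame's checksum, and thereby recovers immediately after any dropout.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative sync pattern and frame layout (hypothetical, not SBN's):
 * [SYNC0][SYNC1][len][payload: len bytes][checksum] */
#define FRAME_SYNC0 0xEB
#define FRAME_SYNC1 0x90

/* Simple XOR checksum over the payload. */
static uint8_t frame_checksum(const uint8_t *data, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++)
    {
        sum ^= data[i];
    }
    return sum;
}

/* Scan the stream byte-by-byte for the next valid frame.
 * Returns the offset of SYNC0 on success, -1 if no valid frame is found.
 * After dropout, garbage bytes simply fail the sync/checksum tests and
 * the scan resumes at the next byte, so recovery is immediate. */
long frame_resync(const uint8_t *buf, size_t buflen)
{
    for (size_t i = 0; i + 3 < buflen; i++)
    {
        if (buf[i] != FRAME_SYNC0 || buf[i + 1] != FRAME_SYNC1)
        {
            continue; /* not a sync marker: advance one byte */
        }
        uint8_t len = buf[i + 2];
        if (i + 3 + len + 1 > buflen)
        {
            continue; /* frame would run past the buffer: keep scanning */
        }
        if (frame_checksum(&buf[i + 3], len) == buf[i + 3 + len])
        {
            return (long)i; /* valid frame found */
        }
    }
    return -1;
}
```

In the real implementation the scan runs incrementally as bytes arrive rather than over a whole buffer, but the recovery property is the same: a corrupted region costs at most the frames it overlaps.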
One possible solution is for the upper layer to queue outgoing messages in an over-allocated queue and let the lower layer pop messages off as needed. We already use this technique in our Telemetry Output application: the platform-specific layer runs as a separate task, so it can pop messages off the queue as fast as the media allows and as fast as its priority allows it to execute. A more complicated solution would be for the lower layer to drive flow control of the upper layer, either directly or through semaphores that allow the lower layer to run as a separate thread.
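The queue-based decoupling could look something like the sketch below. The names and sizes are hypothetical; the point is that the upper layer's push never blocks and reports overflow explicitly, while the lower layer (running as its own task) drains at whatever rate the pipe sustains.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QUEUE_DEPTH 64   /* over-allocated relative to nominal burst size */
#define MSG_MAX     128  /* illustrative max serialized message size */

typedef struct
{
    uint8_t data[MSG_MAX];
    size_t  len;
} QueueMsg;

typedef struct
{
    QueueMsg slots[QUEUE_DEPTH];
    size_t   head;  /* next slot to pop (lower layer) */
    size_t   tail;  /* next slot to push (upper layer) */
    size_t   count;
} OutQueue;

/* Upper layer: enqueue without blocking; the caller sees overflow
 * explicitly instead of silently flooding the pipe. */
bool OutQueue_Push(OutQueue *q, const uint8_t *data, size_t len)
{
    if (q->count == QUEUE_DEPTH || len > MSG_MAX)
    {
        return false;
    }
    memcpy(q->slots[q->tail].data, data, len);
    q->slots[q->tail].len = len;
    q->tail = (q->tail + 1) % QUEUE_DEPTH;
    q->count++;
    return true;
}

/* Lower layer (separate task): pop one message when the pipe has room. */
bool OutQueue_Pop(OutQueue *q, QueueMsg *out)
{
    if (q->count == 0)
    {
        return false;
    }
    *out = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_DEPTH;
    q->count--;
    return true;
}
```

A real version would guard head/tail/count with a mutex or use a counting semaphore to wake the lower-layer task, since the two ends run in different threads; that synchronization is omitted here for brevity.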
Suggestion #2: add quality of service. At a minimum, add a mechanism that guarantees delivery of the subscription messages. It would also be nice to have a lower quality of service for asynchronous, bursty, but non-critical messages like CFE events, so that a single event generating multiple event messages cannot saturate the pipe and preempt more critical messages.
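A minimal sketch of that two-class QoS idea, with hypothetical names and depths: guaranteed traffic (e.g. subscriptions) is never silently dropped and is always drained first, while best-effort traffic (e.g. event bursts) sheds its oldest entries on overflow and so cannot starve the critical class.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QOS_CLASSES 2
#define QOS_DEPTH   32  /* illustrative per-class depth */

typedef enum
{
    QOS_GUARANTEED  = 0, /* e.g. subscription messages */
    QOS_BEST_EFFORT = 1  /* e.g. bursty event messages */
} QosClass;

typedef struct
{
    uint16_t ids[QOS_CLASSES][QOS_DEPTH]; /* message IDs, one ring per class */
    size_t   head[QOS_CLASSES];
    size_t   count[QOS_CLASSES];
} QosQueue;

/* A full guaranteed queue fails loudly so the sender can back off and
 * retry; a full best-effort queue drops its oldest entry instead. */
bool QosQueue_Push(QosQueue *q, QosClass c, uint16_t msgid)
{
    if (q->count[c] == QOS_DEPTH)
    {
        if (c == QOS_GUARANTEED)
        {
            return false; /* never drop guaranteed traffic silently */
        }
        q->head[c] = (q->head[c] + 1) % QOS_DEPTH; /* shed oldest event */
        q->count[c]--;
    }
    size_t tail = (q->head[c] + q->count[c]) % QOS_DEPTH;
    q->ids[c][tail] = msgid;
    q->count[c]++;
    return true;
}

/* Pop always serves guaranteed traffic before best-effort, so an event
 * burst cannot preempt subscription messages on the pipe. */
bool QosQueue_Pop(QosQueue *q, uint16_t *msgid)
{
    for (int c = 0; c < QOS_CLASSES; c++)
    {
        if (q->count[c] == 0)
        {
            continue;
        }
        *msgid = q->ids[c][q->head[c]];
        q->head[c] = (q->head[c] + 1) % QOS_DEPTH;
        q->count[c]--;
        return true;
    }
    return false;
}
```

Strict priority like this is the simplest policy; a weighted scheme would also work if best-effort traffic must not be starved indefinitely, but for subscription metadata strict precedence seems like the right default.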