SBN changes
#321
-
@mbenson1 We've observed this behavior too, and it would be great to address this limitation. Perhaps, instead of buffering the open-loop asynchronous messages, the SBN peer interaction could be made synchronous, so that one node can get an acknowledgement of whether a new peer does or does not subscribe to a particular message ID.
-
I would like to suggest some changes to SBN. We tried using it to connect two instances of cFS running on a multicore processor in AMP mode, but ran into architectural problems and ultimately had to abandon SBN and implement our own solution.

The most important change I would like to see is to give the lower layers flow control over the upper layer. The lower layer, which performs the actual exchange of data with the other instances, has limited throughput; the upper layer pushes data too fast and overflows our data pipe. Given that a purely pub/sub architecture is inherently unpredictable, overflowing the pipe is unavoidable and acceptable for the actual SB messages. Our data pipe is stream-based, not datagram-based, but we implemented a mechanism to ensure the serialized pipe detects byte-by-byte dropout, re-synchronizes, and immediately recovers. We tuned the pipe to provide adequate throughput during nominal operations, with occasional overflow during asynchronous bursts, e.g. a surge of event messages. Our applications are designed to tolerate a small amount of dropout, so this isn't a problem.

The real problem is in the metadata that SBN instances exchange with each other. The subscription messages are sent as large bursts: either one or two large bursts of all subscriptions, followed by ad-hoc single-subscription messages. These bursts overwhelm our pipe, so only some or none of them come through. The result is that instance A has some of the subscriptions from instance B, and instance B has no subscriptions from instance A. The initial burst is too large for us to over-allocate the pipe to accommodate, especially since the burst is unconstrained anyway. Unlike the SB messages, the subscription messages are critical for nominal operation, and SBN is not resilient to their loss. Maybe this is addressed somehow and I just missed it.
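To illustrate the kind of byte-by-byte re-sync mechanism described above, here is a minimal sketch. The sync bytes, frame layout, and XOR checksum are illustrative assumptions, not our actual wire format: the receiver scans the stream one byte at a time for a sync pattern, validates the candidate frame's checksum, and thereby recovers immediately after any dropout.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative sync pattern and frame layout (hypothetical, not SBN's):
 * [SYNC0][SYNC1][len][payload: len bytes][checksum] */
#define FRAME_SYNC0 0xEB
#define FRAME_SYNC1 0x90

/* Simple XOR checksum over the payload. */
static uint8_t frame_checksum(const uint8_t *data, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++)
    {
        sum ^= data[i];
    }
    return sum;
}

/* Scan the stream byte-by-byte for the next valid frame.
 * Returns the offset of SYNC0 on success, -1 if no valid frame is found.
 * After dropout, garbage bytes simply fail the sync/checksum tests and
 * the scan resumes at the next byte, so recovery is immediate. */
long frame_resync(const uint8_t *buf, size_t buflen)
{
    for (size_t i = 0; i + 3 < buflen; i++)
    {
        if (buf[i] != FRAME_SYNC0 || buf[i + 1] != FRAME_SYNC1)
        {
            continue; /* not a sync marker: advance one byte */
        }
        uint8_t len = buf[i + 2];
        if (i + 3 + len + 1 > buflen)
        {
            continue; /* frame would run past the buffer: keep scanning */
        }
        if (frame_checksum(&buf[i + 3], len) == buf[i + 3 + len])
        {
            return (long)i; /* valid frame found */
        }
    }
    return -1;
}
```

In the real implementation the scan runs incrementally as bytes arrive rather than over a whole buffer, but the recovery property is the same: a corrupted region costs at most the frames it overlaps.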
One possible solution is for the upper layer to queue outgoing messages in an over-allocated queue and let the lower layer pop messages off as needed. We already use this technique in our Telemetry Output application: the platform-specific layer runs as a separate task, so it can pop messages off the queue as fast as the media allows and as fast as its priority allows it to execute. A more complicated solution would be for the lower layer to drive flow control of the upper layer, either directly or through semaphores that allow the lower layer to run as a separate thread.
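The queue-based decoupling could look something like the sketch below. The names and sizes are hypothetical; the point is that the upper layer's push never blocks and reports overflow explicitly, while the lower layer (running as its own task) drains at whatever rate the pipe sustains.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QUEUE_DEPTH 64   /* over-allocated relative to nominal burst size */
#define MSG_MAX     128  /* illustrative max serialized message size */

typedef struct
{
    uint8_t data[MSG_MAX];
    size_t  len;
} QueueMsg;

typedef struct
{
    QueueMsg slots[QUEUE_DEPTH];
    size_t   head;  /* next slot to pop (lower layer) */
    size_t   tail;  /* next slot to push (upper layer) */
    size_t   count;
} OutQueue;

/* Upper layer: enqueue without blocking; the caller sees overflow
 * explicitly instead of silently flooding the pipe. */
bool OutQueue_Push(OutQueue *q, const uint8_t *data, size_t len)
{
    if (q->count == QUEUE_DEPTH || len > MSG_MAX)
    {
        return false;
    }
    memcpy(q->slots[q->tail].data, data, len);
    q->slots[q->tail].len = len;
    q->tail = (q->tail + 1) % QUEUE_DEPTH;
    q->count++;
    return true;
}

/* Lower layer (separate task): pop one message when the pipe has room. */
bool OutQueue_Pop(OutQueue *q, QueueMsg *out)
{
    if (q->count == 0)
    {
        return false;
    }
    *out = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_DEPTH;
    q->count--;
    return true;
}
```

A real version would guard head/tail/count with a mutex or use a counting semaphore to wake the lower-layer task, since the two ends run in different threads; that synchronization is omitted here for brevity.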
Suggestion #2: add quality of service. At a minimum, add a mechanism that guarantees delivery of the subscription messages. It would also be nice to have a lower quality of service for asynchronous, bursty, but non-critical messages like CFE events, so that a single event generating multiple event messages cannot saturate the pipe and preempt more critical messages.
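A minimal sketch of that two-class QoS idea, with hypothetical names and depths: guaranteed traffic (e.g. subscriptions) is never silently dropped and is always drained first, while best-effort traffic (e.g. event bursts) sheds its oldest entries on overflow and so cannot starve the critical class.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QOS_CLASSES 2
#define QOS_DEPTH   32  /* illustrative per-class depth */

typedef enum
{
    QOS_GUARANTEED  = 0, /* e.g. subscription messages */
    QOS_BEST_EFFORT = 1  /* e.g. bursty event messages */
} QosClass;

typedef struct
{
    uint16_t ids[QOS_CLASSES][QOS_DEPTH]; /* message IDs, one ring per class */
    size_t   head[QOS_CLASSES];
    size_t   count[QOS_CLASSES];
} QosQueue;

/* A full guaranteed queue fails loudly so the sender can back off and
 * retry; a full best-effort queue drops its oldest entry instead. */
bool QosQueue_Push(QosQueue *q, QosClass c, uint16_t msgid)
{
    if (q->count[c] == QOS_DEPTH)
    {
        if (c == QOS_GUARANTEED)
        {
            return false; /* never drop guaranteed traffic silently */
        }
        q->head[c] = (q->head[c] + 1) % QOS_DEPTH; /* shed oldest event */
        q->count[c]--;
    }
    size_t tail = (q->head[c] + q->count[c]) % QOS_DEPTH;
    q->ids[c][tail] = msgid;
    q->count[c]++;
    return true;
}

/* Pop always serves guaranteed traffic before best-effort, so an event
 * burst cannot preempt subscription messages on the pipe. */
bool QosQueue_Pop(QosQueue *q, uint16_t *msgid)
{
    for (int c = 0; c < QOS_CLASSES; c++)
    {
        if (q->count[c] == 0)
        {
            continue;
        }
        *msgid = q->ids[c][q->head[c]];
        q->head[c] = (q->head[c] + 1) % QOS_DEPTH;
        q->count[c]--;
        return true;
    }
    return false;
}
```

Strict priority like this is the simplest policy; a weighted scheme would also work if best-effort traffic must not be starved indefinitely, but for subscription metadata strict precedence seems like the right default.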