
Performance benchmarking the k4a publisher #283

Open
adityapande-1995 opened this issue Jun 12, 2023 · 1 comment

Comments

@adityapande-1995
Collaborator

adityapande-1995 commented Jun 12, 2023

Summary

This issue is to serve as a discussion thread for investigating slowdowns and otherwise benchmarking the performance of ROS 2 when tested using the k4a camera publisher.
I've created a repo here: performance-testing, which uses a fake k4a camera publisher and a subscriber that reads the messages and measures the throughput. Detailed instructions are in the README file. It also shows a topic-wise breakdown of the bandwidth used. By default the publisher runs at 30 Hz.

To summarize, I was reading data at 287 MB/s from all the topics, and a percentage-wise split of that is as follows:

  • /k4a/points (PointClouds) 59.58 %
  • /k4a/color/Image (Image) 33.62 %
  • /k4a/depth/Image (Image) 6.77 %
  • Rest : 0.03 %
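The per-topic split above can be reproduced with a small accumulator. This is a hypothetical pure-Python sketch (not code from the performance-testing repo): each subscriber callback would record the serialized message size, and the meter reports each topic's share of the total.

```python
from collections import defaultdict

class TopicBandwidthMeter:
    """Accumulate per-topic byte counts and report each topic's
    percentage share of the total bandwidth."""

    def __init__(self):
        self._bytes = defaultdict(int)

    def record(self, topic: str, nbytes: int) -> None:
        # Called once per received message with its serialized size.
        self._bytes[topic] += nbytes

    def split_percent(self) -> dict:
        total = sum(self._bytes.values())
        return {t: 100.0 * b / total for t, b in self._bytes.items()}
```

With synthetic sizes in a 6:3:1 ratio, `split_percent()` returns 60 / 30 / 10, mirroring how the pointcloud topic dominates the real breakdown.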

Mininet can only simulate connections of up to 1000 Mbit/s, so it is not possible to simulate the 25 Gbit/s switch with it.
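A quick unit conversion (assuming MB means 10^6 bytes) shows the measured load already exceeds Mininet's cap, not just the switch's capacity:

```python
# Measured cumulative throughput from the breakdown above.
measured_mb_per_s = 287                        # MB/s, assuming MB = 10**6 bytes
measured_mbit_per_s = measured_mb_per_s * 8    # 2296 Mbit/s, i.e. ~2.3 Gbit/s

mininet_cap_mbit_per_s = 1_000                 # Mininet's link-speed ceiling
switch_mbit_per_s = 25_000                     # the real 25 Gbit/s switch

# The measured traffic alone is over twice Mininet's cap.
print(measured_mbit_per_s > mininet_cap_mbit_per_s)
```

So even ignoring headroom, a 1 Gbit/s emulated link would throttle this workload to well under half its measured rate.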

Aim

The ultimate aim here is to make the messages arrive more smoothly on the other end.

Solutions

  • This depends on the ultimate use case, but an easy solution here is to add intermediate nodes that compress the image or pointcloud data, and subscribe to those topics instead of the raw ones. Depending on the codec, the compression could be lossy or lossless, and it would introduce a bit of latency into the messages.
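As a minimal illustration of the lossless branch of that tradeoff (not the actual ROS node, which would use an image or pointcloud codec such as PNG or Draco), here is a pure-Python sketch using zlib that measures both the size reduction and the latency cost per frame:

```python
import time
import zlib

def compress_frame(raw: bytes, level: int = 1) -> tuple:
    """Losslessly compress one frame; return (payload, seconds spent).
    level=1 trades compression ratio for speed, which suits a 30 Hz pipeline."""
    t0 = time.perf_counter()
    payload = zlib.compress(raw, level)
    return payload, time.perf_counter() - t0

# A synthetic, highly compressible "depth image" (large flat regions).
frame = bytes(1024) * 1024                 # 1 MiB of zeros
payload, dt = compress_frame(frame)
print(len(payload) < len(frame))           # smaller on the wire
print(zlib.decompress(payload) == frame)   # lossless round trip
```

The `dt` measured here is exactly the extra latency the intermediate node would add per message; whether it is worth paying depends on the network, as discussed below.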

Possible tradeoffs and constraints

What are the parameters here that we're willing to compromise on? Can we add more compute, tolerate more latency, or change QoS policies? Adding an intermediate compression node would add some latency and require extra compute, but would make the messages much smaller.

The pointcloud messages are the largest, so we could try setting the reliability on that topic to "best effort" or changing its durability to "volatile". This also depends on how lossy the network is and whether we care about time deadlines for message delivery.
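In rclpy, that QoS change is a small configuration fragment; this is a hypothetical profile for the `/k4a/points` subscription (depth of 5 is an arbitrary choice here), not something from the test repo:

```python
from rclpy.qos import DurabilityPolicy, QoSProfile, ReliabilityPolicy

# Drop late samples rather than retransmit them, and don't retain
# samples for late-joining subscribers.
pointcloud_qos = QoSProfile(
    depth=5,                                    # shallow history queue
    reliability=ReliabilityPolicy.BEST_EFFORT,  # no retransmission
    durability=DurabilityPolicy.VOLATILE,       # nothing kept for late joiners
)

# node.create_subscription(PointCloud2, '/k4a/points', callback, pointcloud_qos)
```

Note that for the policies to take effect, the publisher and subscriber profiles must be compatible: a best-effort subscriber can read from a reliable publisher, but not vice versa.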

Things to investigate

  • It might be worth tracking when each message arrives relative to its timestamp. In particular, compression is worth pursuing if the time to compress plus the time to send the compressed message to the subscriber is less than the time it takes the raw message to reach the subscriber.
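That break-even condition can be written down directly. A sketch, with illustrative numbers that are assumptions rather than measurements:

```python
def compression_wins(raw_bytes: float, ratio: float,
                     link_bytes_per_s: float, t_compress_s: float) -> bool:
    """True if compress-then-send beats sending the raw message.

    ratio is compressed/raw size; link speed and compression time
    would come from the timestamp tracking described above.
    """
    t_raw = raw_bytes / link_bytes_per_s
    t_compressed = t_compress_s + (raw_bytes * ratio) / link_bytes_per_s
    return t_compressed < t_raw

# A 9 MB pointcloud over a 1 Gbit/s (125 MB/s) link with 3x compression:
print(compression_wins(9e6, 1/3, 125e6, 0.010))  # 34 ms vs 72 ms raw -> True
print(compression_wins(9e6, 1/3, 125e6, 0.100))  # 124 ms vs 72 ms raw -> False
```

The same comparison flips on loopback, where the effective "link" is so fast that almost any compression time dominates, which is one reason to benchmark over the real network rather than localhost.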

I'm open to ideas so feel free to brainstorm on this thread cc @IanTheEngineer @calderpg-tri @sloretz

@adityapande-1995
Collaborator Author

adityapande-1995 commented Jun 15, 2023

Attempting to push the limits of loopback (default QoS, rclpy subscriber and publisher):

  • Publishing freq : 30 Hz, measured samples : 100 msgs : Cumulative throughput -> 270 MB/s
  • Publishing freq : 60 Hz, measured samples : 500 msgs : Cumulative throughput -> 507 MB/s
  • Publishing freq : 100 Hz, measured samples : 1000 msgs : Cumulative throughput -> 700 MB/s
  • Publishing freq : 120 Hz, measured samples : 1000 msgs : Cumulative throughput -> 780 MB/s

So we're definitely not hitting the limits of the loopback network at 30 Hz.
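One way to read the table above is as MB delivered per publish cycle. The per-cycle figure drops as frequency rises, but the cumulative throughput keeps climbing past 30 Hz, which is the basis for the conclusion that 30 Hz sits well below the loopback ceiling:

```python
# Hz -> measured cumulative throughput in MB/s, from the table above.
measurements = {30: 270, 60: 507, 100: 700, 120: 780}

for hz, mb_per_s in measurements.items():
    print(f"{hz:>3} Hz: {mb_per_s / hz:.2f} MB delivered per publish cycle")
```

At 30 Hz each cycle delivers 9 MB, consistent with the ~287 MB/s figure measured from the real topic mix earlier in the thread.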
