Fix growing CPU usage caused by buffer server #573

Open

wants to merge 5 commits into rolling
Conversation

tonynajjar
Contributor

Setup

  • Buffer server running (ros2 run tf2_ros buffer_server)
  • Node with Action client calling tf2_buffer_server at a rate of 10 Hz
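For reference, a minimal reproduction sketch of the client side (assumptions: the tf2_ros::BufferClient C++ API, the "tf2_buffer_server" action namespace used by `ros2 run tf2_ros buffer_server`, and placeholder frame names; adjust to your setup):

```cpp
#include <chrono>
#include <memory>

#include "rclcpp/rclcpp.hpp"
#include "tf2/exceptions.h"
#include "tf2/time.h"
#include "tf2_ros/buffer_client.h"

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("buffer_client_repro");

  // Action client side of the buffer server started with `ros2 run tf2_ros buffer_server`.
  auto buffer = std::make_shared<tf2_ros::BufferClient>(node, "tf2_buffer_server");
  buffer->waitForServer();

  // Query the buffer server at 10 Hz; every call sends a LookupTransform goal.
  auto timer = node->create_wall_timer(
    std::chrono::milliseconds(100),
    [node, buffer]() {
      try {
        buffer->lookupTransform(
          "map", "base_link", tf2::TimePointZero, tf2::durationFromSec(0.1));
      } catch (const tf2::TransformException & ex) {
        RCLCPP_WARN(node->get_logger(), "lookup failed: %s", ex.what());
      }
    });

  rclcpp::spin(node);
  rclcpp::shutdown();
  return 0;
}
```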

Observation

The CPU load grows steadily from 3% to 100%.

Issue

We noticed that the message published on /tf2_buffer_server/_action/status grows without bound, which causes the CPU increase (thanks @jplapp). The worst part is that every ActionClient to tf2_buffer_server subscribes to that status topic, so all of them grow in CPU usage!

Our Solution
Set the action server's result_timeout to 0: this keeps the status message small
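To make the idea concrete, here is a sketch (not the actual diff in this PR) of passing a zero result_timeout through rcl_action_server_options_t when creating the LookupTransform action server with rclcpp_action. The callback bodies are placeholders; the real lookup logic lives in tf2_ros' BufferServer.

```cpp
#include <memory>

#include "rclcpp/rclcpp.hpp"
#include "rclcpp_action/rclcpp_action.hpp"
#include "tf2_msgs/action/lookup_transform.hpp"

using LookupTransform = tf2_msgs::action::LookupTransform;

rclcpp_action::Server<LookupTransform>::SharedPtr
make_lookup_server(const rclcpp::Node::SharedPtr & node)
{
  // Start from the default options and only override the result timeout.
  rcl_action_server_options_t options = rcl_action_server_get_default_options();
  options.result_timeout.nanoseconds = 0;  // do not keep terminated goals around

  return rclcpp_action::create_server<LookupTransform>(
    node, "tf2_buffer_server",
    // handle_goal: accept every lookup request (placeholder)
    [](const rclcpp_action::GoalUUID &, std::shared_ptr<const LookupTransform::Goal>) {
      return rclcpp_action::GoalResponse::ACCEPT_AND_EXECUTE;
    },
    // handle_cancel: allow cancellation (placeholder)
    [](std::shared_ptr<rclcpp_action::ServerGoalHandle<LookupTransform>>) {
      return rclcpp_action::CancelResponse::ACCEPT;
    },
    // handle_accepted: the buffer server would perform the lookup and
    // succeed the goal here (placeholder)
    [](std::shared_ptr<rclcpp_action::ServerGoalHandle<LookupTransform>>) {},
    options);
}
```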

Other Solutions

Reducing the default result_timeout: https://github.com/ros2/rcl/pull/1012/files. FYI @clalancette

Implementing a service instead of an action server (drafted in #574).

Contributor

@clalancette clalancette left a comment

I don't think setting the result timeout to zero is the correct solution, or at least I don't believe we should do that in isolation.

That is, there is a reason that the default for actions is to keep the result around for some time, though I'll admit I'm still a little vague on the reasons for it. Until recently, that was set to 15 minutes, and during some testing I found exactly the same thing you did; it could eat up a lot of CPU time. In ros2/rcl#1012, I reduced that timeout to 10 seconds, which helps a lot.

All of that said, I don't think we should override that result timeout here. If 10 seconds is still too large, then we should consider fixing it in rcl_action again, with reasoning as to why we should reduce it further.

@tonynajjar
Contributor Author

tonynajjar commented Dec 12, 2022

Thanks for your comment. I agree with you that this growing CPU usage is a general action server problem and it makes sense to fix it in rcl_action; ros2/rcl#1012 is already very beneficial. However, even with that fix, the size of the status message at steady state will be 10 s (result timeout) * action client call frequency (e.g. 10 Hz) = 100 status entries per action client. All the action clients subscribe to this topic, so that doesn't scale well; this package is a reaction to that: https://github.com/magazino/tf_service.

@tonynajjar
Contributor Author

tonynajjar commented Dec 12, 2022

So there are two different things:

1. Action servers called at a high rate by multiple action clients consume a lot of CPU.
2. The TF buffer server/client has this issue because it fits the description of 1.

I'm here to solve 2 and indirectly give feedback about 1. To solve 2, we could consider implementing a service server (which I drafted here).

Contributor

@tfoote tfoote left a comment

Looking at this, the reason for action servers to keep the history around is introspection. It makes some sense that you can look at the status message for, say, the last N waypoints the robot navigated to in the last 15 minutes. In this case, however, tf queries may happen at higher rates from multiple clients, and keeping and republishing that full history of queries is generally burdensome. As long as the results still get out, I think truncating this history to zero for this use case is a good approach. If there is a use case where the history would be necessary or valuable, we could parameterize it to make it optional. But in general most people won't want to incur this overhead when they don't even know that it's available.

Another option to consider for upstream would be to enable a queue depth in addition to the timeout, to bound the potential growth just like we do with standard pub/sub buffer logic.
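A purely illustrative sketch of that depth-plus-timeout idea (there is no such option in rcl_action today): terminated results are dropped either when they expire or when a depth cap is exceeded, mirroring a bounded pub/sub queue.

```cpp
#include <chrono>
#include <cstddef>
#include <deque>

struct CachedResult
{
  std::chrono::steady_clock::time_point expires_at;
  // ... the stored goal result would live here
};

void prune(
  std::deque<CachedResult> & results,
  std::size_t max_results,
  std::chrono::steady_clock::time_point now)
{
  // Time-based expiry (what result_timeout already does).
  while (!results.empty() && results.front().expires_at <= now) {
    results.pop_front();
  }
  // Depth-based cap (the additional bound suggested here), dropping oldest first.
  while (results.size() > max_results) {
    results.pop_front();
  }
}
```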

@tonynajjar
Contributor Author

@clalancette What do you think? If you're still not a fan of this approach, let's start iterating on #574.

@tonynajjar
Contributor Author

@clalancette should we close this PR in favor of #574?
