-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Timed out while waiting for action server to acknowledge goal request for compute_path_through_poses (or follow_path) #4278
Comments
I don't have a good answer for you about this, I have not seen this and this isn't a known cause for any of the flakiness in our system testing (which we call these servers regularly from) I'm curious if its related to your BT tick rate, you're running it at 20 hz so maybe its too slow or you need to increase the timeout? Have you tried this with other DDS vendors like Cyclone and still see it occurring? |
@SteveMacenski thanks for your answer. I forgot to mention that we use a discovery server. I tried to switch to CycloneDDS, but encountered the following error : ros2/rmw_cyclonedds#458 . The proposed solution does not work for me at the moment. I'll keep investigating on monday. |
Hi, I finally managed to fix the Our robot has been running with CycloneDDS for the past 2 days, and the action acknowledgment timeout error has not occurred once. This is very encouraging since it used to occur a few times per day with Fast-DDS. @SteveMacenski Regarding your remark about the BT tick rate, why would it be too slow ? I don't see a direct link with the observed timeout. Also, the chosen value of 100ms for the timeout seems already high to me. A normal action acknowledgement is usually received within a few ms. I don't think that increasing it will help. |
Interesting. You may want to file a ticket with RMW Fast DDS (https://github.com/ros2/rmw_fastrtps) that some current behavior has caused you an issue and link back here. I'm sure they're going to ask you to come up with some more minimal test they can reproduce so they can dig into it when they have the cycles. I didn't know if one, other, both, or neither was the cause of the issue, I was giving you some options of things to test to see what made a behavioral change that I could dig into :-) It seems that the DDS or RMW is a likely cause so its worth making folks over there aware of this in case it strikes a chord. You can safely ignore the tick rate stuff! I think that largely closes this ticket out for Nav2? Not much we can do on the RMW front unfortunately over here. |
We encountered a similar issue, which may be related to the real-time capability or configuration of DDS. The solution is still under investigation. Increasing the timeout (e.g., to 5 seconds) may not solve the issue, as it could still occur, and a 5-second delay may be too long for certain actions (e.g., the controller). rclcpp/rclcpp_action/src/server.cpp
void
ServerBase::execute_goal_request_received()
{
...
auto response_pair = call_handle_goal_callback(uuid, message);
{
std::lock_guard<std::recursive_mutex> lock(pimpl_->action_server_reentrant_mutex_);
/**********************
If the client receives the acknowledgment after a timeout, the client will assume the request has failed.
However, the server will still proceed with executing call_goal_accepted_callback, which initiates the worker thread to process the goal.
**********************/
ret = rcl_action_send_goal_response(
pimpl_->action_server_.get(),
&request_header,
response_pair.second.get());
}
...
if (GoalResponse::ACCEPT_AND_EXECUTE == status || GoalResponse::ACCEPT_AND_DEFER == status) {
...
call_goal_accepted_callback(handle, uuid, message);
}
} If we look at the service, it runs a callback that may trigger a thread before sending a response. However, it may encounter a similar issue. rclcpp/rclcpp/include/rclcpp/service.hpp
void
handle_request(
std::shared_ptr<rmw_request_id_t> request_header,
std::shared_ptr<void> request) override
{
auto typed_request = std::static_pointer_cast<typename ServiceT::Request>(request);
auto response = any_callback_.dispatch(this->shared_from_this(), request_header, typed_request);
if (response) {
send_response(*request_header, *response);
}
} |
Bug report
Required Info:
Steps to reproduce issue
The issue occurs randomly when the
bt_navigator
sends a goal to thecompute_path_through_poses
action provided by theplanner_server
node. The BT node used to send the request isComputePathThroughPosesAction
. In my case, theComputePathThroughPoses
action request contains a list of 3 goals.Expected behavior
The
planner_server
receives and acknowledges the request, computes the path, then returns the result to thebt_navigator
.Actual behavior
The
planner_server
receives the request and computes the path, but thebt_navigator
does not seem to received the acknowledgement and ends up triggering the timeout. Here are the logs, with some additional information.Additional information
I noticed that the issue also happens with the
follow_path
action. But it never happens when thebt_navigator
triggers my custom actions.Relevant
bt_navigator
configuration.When the issue occurs, the CPU load is not especially high (~50%).
The text was updated successfully, but these errors were encountered: