Timed out while waiting for action server to acknowledge goal request for compute_path_through_poses (or follow_path) #4278

DylanDeCoeyer-Quimesis · 2024-04-25T16:01:49Z

Bug report

Required Info:

Operating System:
- Custom Linux image based on Linux for Tegra (L4T) R32.7.4
ROS2 Version:
- Iron
Version or commit hash:
- 2024-02-09
DDS implementation:
- Fast-RTPS

Steps to reproduce issue

The issue occurs randomly when the bt_navigator sends a goal to the compute_path_through_poses action provided by the planner_server node. The BT node used to send the request is ComputePathThroughPosesAction. In my case, the ComputePathThroughPoses action request contains a list of 3 goals.

Expected behavior

The planner_server receives and acknowledges the request, computes the path, then returns the result to the bt_navigator.

Actual behavior

The planner_server receives the request and computes the path, but the bt_navigator does not seem to received the acknowledgement and ends up triggering the timeout. Here are the logs, with some additional information.

Apr 25 13:22:17.620047 [bt_navigator-12] [INFO]: Path is obsolete, computing a new one
Apr 25 13:22:17.624897 [planner_server-9] [INFO]: Attempting to find a path from (8.61, 2.92) to (4.99, 3.79).
Apr 25 13:22:17.625304 [planner_server-9] [INFO]: SmacPlannerHybrid: Start pose in world: (8.612866, 2.919776), in costmap (96, 39)
Apr 25 13:22:17.625636 [planner_server-9] [INFO]: SmacPlannerHybrid: Goal pose in world: (4.992264, 3.793334), in costmap (59, 47)
Apr 25 13:22:17.625954 [planner_server-9] [INFO]: Attempting to find a path from (4.99, 3.79) to (3.01, 3.79).
Apr 25 13:22:17.626311 [planner_server-9] [INFO]: SmacPlannerHybrid: Start pose in world: (4.992264, 3.793334), in costmap (59, 47)
Apr 25 13:22:17.626693 [planner_server-9] [INFO]: SmacPlannerHybrid: Goal pose in world: (3.015000, 3.793334), in costmap (40, 47)
Apr 25 13:22:17.627004 [planner_server-9] [INFO]: Attempting to find a path from (3.01, 3.79) to (1.01, 5.79).
Apr 25 13:22:17.627244 [planner_server-9] [INFO]: SmacPlannerHybrid: Start pose in world: (3.015000, 3.793334), in costmap (40, 47)
Apr 25 13:22:17.628079 [planner_server-9] [INFO]: SmacPlannerHybrid: Goal pose in world: (1.015000, 5.793334), in costmap (20, 67)
Apr 25 13:22:17.669789 [bt_navigator-12] [WARN]: Behavior tree engine cycle is slower than expected: 59 ms (timeout is 50 ms)
Apr 25 13:22:17.670153 [bt_navigator-12] [WARN]: Time required to tick the tree is 58 ms
Apr 25 13:22:17.702070[bt_navigator-12] [WARN]: Timed out while waiting for action server to acknowledge goal request for compute_path_through_poses
Apr 25 13:22:17.707512 [bt_navigator-12] [WARN]: Behavior tree engine cycle is slower than expected: 50 ms (timeout is 50 ms)

Additional information

I noticed that the issue also happens with the follow_path action. But it never happens when the bt_navigator triggers my custom actions.

Relevant bt_navigator configuration.

bt_navigator:
  ros__parameters:
    default_server_timeout: 100
    bt_loop_duration: 50

When the issue occurs, the CPU load is not especially high (~50%).

The text was updated successfully, but these errors were encountered:

SteveMacenski · 2024-04-25T21:00:56Z

I don't have a good answer for you about this, I have not seen this and this isn't a known cause for any of the flakiness in our system testing (which we call these servers regularly from)

I'm curious if its related to your BT tick rate, you're running it at 20 hz so maybe its too slow or you need to increase the timeout? Have you tried this with other DDS vendors like Cyclone and still see it occurring?

DylanDeCoeyer-Quimesis · 2024-04-26T16:10:04Z

@SteveMacenski thanks for your answer. I forgot to mention that we use a discovery server.

I tried to switch to CycloneDDS, but encountered the following error : ros2/rmw_cyclonedds#458 . The proposed solution does not work for me at the moment. I'll keep investigating on monday.

DylanDeCoeyer-Quimesis · 2024-05-02T08:54:01Z

Hi, I finally managed to fix the Failed to find a free participant index for domain 0 error by using the proposed solution.

Our robot has been running with CycloneDDS for the past 2 days, and the action acknowledgment timeout error has not occurred once. This is very encouraging since it used to occur a few times per day with Fast-DDS.

@SteveMacenski Regarding your remark about the BT tick rate, why would it be too slow ? I don't see a direct link with the observed timeout. Also, the chosen value of 100ms for the timeout seems already high to me. A normal action acknowledgement is usually received within a few ms. I don't think that increasing it will help.

SteveMacenski · 2024-05-02T16:41:56Z

Interesting. You may want to file a ticket with RMW Fast DDS (https://github.com/ros2/rmw_fastrtps) that some current behavior has caused you an issue and link back here. I'm sure they're going to ask you to come up with some more minimal test they can reproduce so they can dig into it when they have the cycles.

I didn't know if one, other, both, or neither was the cause of the issue, I was giving you some options of things to test to see what made a behavioral change that I could dig into :-) It seems that the DDS or RMW is a likely cause so its worth making folks over there aware of this in case it strikes a chord.

You can safely ignore the tick rate stuff! I think that largely closes this ticket out for Nav2? Not much we can do on the RMW front unfortunately over here.

alanxuefei · 2024-11-24T10:18:10Z

We encountered a similar issue, which may be related to the real-time capability or configuration of DDS. The solution is still under investigation. Increasing the timeout (e.g., to 5 seconds) may not solve the issue, as it could still occur, and a 5-second delay may be too long for certain actions (e.g., the controller).

rclcpp/rclcpp_action/src/server.cpp

void
ServerBase::execute_goal_request_received()
{
  ...
  auto response_pair = call_handle_goal_callback(uuid, message);  
  {
    std::lock_guard<std::recursive_mutex> lock(pimpl_->action_server_reentrant_mutex_);
    /**********************
     If the client receives the acknowledgment after a timeout, the client will assume the request has failed. 
     However, the server will still proceed with executing call_goal_accepted_callback, which initiates the worker thread to process the goal.
    **********************/
    ret = rcl_action_send_goal_response(  
      pimpl_->action_server_.get(),
      &request_header,
      response_pair.second.get());
  }

  ...
  if (GoalResponse::ACCEPT_AND_EXECUTE == status || GoalResponse::ACCEPT_AND_DEFER == status) {
    
    ...
    call_goal_accepted_callback(handle, uuid, message);
  }
}

If we look at the service, it runs a callback that may trigger a thread before sending a response. However, it may encounter a similar issue.

rclcpp/rclcpp/include/rclcpp/service.hpp

  void
  handle_request(
    std::shared_ptr<rmw_request_id_t> request_header,
    std::shared_ptr<void> request) override
  {
    auto typed_request = std::static_pointer_cast<typename ServiceT::Request>(request);
    auto response = any_callback_.dispatch(this->shared_from_this(), request_header, typed_request);
    if (response) {
      send_response(*request_header, *response);
    }
  }

DylanDeCoeyer-Quimesis closed this as completed May 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Timed out while waiting for action server to acknowledge goal request for compute_path_through_poses (or follow_path) #4278

Timed out while waiting for action server to acknowledge goal request for compute_path_through_poses (or follow_path) #4278

DylanDeCoeyer-Quimesis commented Apr 25, 2024 •

edited

Loading

SteveMacenski commented Apr 25, 2024

DylanDeCoeyer-Quimesis commented Apr 26, 2024

DylanDeCoeyer-Quimesis commented May 2, 2024

SteveMacenski commented May 2, 2024

alanxuefei commented Nov 24, 2024 •

edited

Loading

Timed out while waiting for action server to acknowledge goal request for compute_path_through_poses (or follow_path) #4278

Timed out while waiting for action server to acknowledge goal request for compute_path_through_poses (or follow_path) #4278

Comments

DylanDeCoeyer-Quimesis commented Apr 25, 2024 • edited Loading

Bug report

Steps to reproduce issue

Expected behavior

Actual behavior

Additional information

SteveMacenski commented Apr 25, 2024

DylanDeCoeyer-Quimesis commented Apr 26, 2024

DylanDeCoeyer-Quimesis commented May 2, 2024

SteveMacenski commented May 2, 2024

alanxuefei commented Nov 24, 2024 • edited Loading

DylanDeCoeyer-Quimesis commented Apr 25, 2024 •

edited

Loading

alanxuefei commented Nov 24, 2024 •

edited

Loading