Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async Function Handler for Controllers #1489

Merged
merged 91 commits into from
Oct 17, 2024

Conversation

saikishor
Copy link
Member

@saikishor saikishor commented Apr 18, 2024

Hello!

In this PR, I've added a specific class AsyncFunctionHandler which basically will help us run the update method in the case of the controller manager and read/write method in the case of hardware components asynchronously. This PR takes a bit different approach compared to PR : #1041. The idea behind both approaches is the same. The key objectives of this approach include:

  • The controller manager functions solely as a scheduler, with each controller/hardware component responsible for managing its own threads.
  • Enhancing modularity by enabling the reuse of the same class and logic across controllers and hardware components.
  • Ensuring proper isolation for testing of the asynchronous handling aspect.
  • This might break ABI but is backward compatible.

I would like to hear your opinion on this approach. @VX792 It would be great if you could take a look at this approach and comment with your opinion.

Thank you!

Needs: ros-controls/realtime_tools#168
Closes #1469

Copy link
Contributor

@VX792 VX792 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the concept is pretty cool, especially because this is what I also think would be intuitive from the user's point of view.
For the technical side I left a few comments / questions for a few things that are not yet clear to me, but I think the overall approach is good. The only thing I'm not entirely convinced about is the synchronization part, because I think it would make the most sense inside the handles.

As an example, instead of triggering the entire async controller update, it would be better to trigger only the set_ and get_value() operations that the update executes; that way we could block for a very minimal amount of time.

Also, if we're going with this implementation, then feel free to delete the unused ControllerThreadWrapper and the other leftover stuff from the controller manager's header too.

{
break;
}
async_update_return_ = async_function_(current_update_time_, current_update_period_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is the issue I wasn't able to solve without rewriting the handles. Do you have an idea on how we could make this thread safe?

This calls the controller update(), and if it coincides with the CM thread's write or read, then this is a data race, as both of those write to the same memory through a double ptr.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the current version, I agree we will have this issue. WIth #1240 we might be able to avoid this issue with the getter and setters that it is going to provide

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I'm not sure even that will solve it, because if you take this code from the JTC for example:

    for (size_t index = 0; index < dof_; ++index)
    {
      trajectory_point_interface[index] = joint_interface[index].get().get_value(); // get_state() from #1240
    }

and you're in the async update cycle, while the HWIF from the CM thread does

    for (size_t index = 0; index < dof_; ++index)
    {
      hw_states_[index] = some_value // set_state() with the updated version
    }

at the same time, then it doesn't matter if the access itself is thread safe, the async JTC will do the assignment and calculations based on a mixture of old and new state interfaces.

For async HW, maybe it could be possible to disable plain getters / setters and instead use a grouped set_states and get_states function? That way you could wait until all values are set by the JTC, and update the states only after that! (e.g. lock until all states are read, or all states are set)

As a quick example something like this could easily be extended for multiple types:

std::map<const char*, double> interface_name_and_value = { {"Chocolate", 12.0}, {"Coffee", 14.0}, {"Cake", 15.0}};

template<typename T>
struct my_pair {
    my_pair() = delete;
    my_pair(const char* first, T second) : first_(first), second_(second) {}
    const char* first_;
    T second_;
};

template <typename T, typename ...Pairs>
void fun(const my_pair<T>& first_pair, const Pairs&... pairs) {

   interface_name_and_value[first_pair.first_] = first_pair.second_;
   if constexpr (sizeof...(pairs) > 0)
   {
     fun(pairs...);
   }
}

//    Usage:
//    const my_pair p1 = {"Chocolate", 18.0};
//    const my_pair p2 = {"Cake", 15.2};
//    const my_pair p3 = {"Coffee", 198.2};

//    lock();
//    fun(p1, p2, p3);
//    unlock()

Plain folds are not be enough if the data is stored in some data structure though, but I think this could work with index sequences / recursive templates.

What do you think about this? The code might not be very readable, but I think this could ensure data consistency later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding this, I agree, that there might be cases where the data might not be perfectly synchronized, this is the case we are more or less talking about in the other thread. For now, we can print the warning message as you mentioned and we shall try to fix it in future versions.

The batch reading and updating is a good idea, but I don't know if the Maintainers allow that level of changes at this point before Jazzy, we can try to do something like that in the future, so we could have some data consistency.

What do you say?

/**
* If the async method is running, it will wait for the current async method call to finish.
*/
void wait_for_trigger_cycle_to_finish()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a placeholder? I haven't seen this fn called anywhere other than the tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well it is for testing purpose, but if can be used alongside before stopping the async thread. If I use this at the beginning of the stop_async_update, may be we won't be having the situation you were describing

* If the async update method is still running, it will return the last return value of the async
* function.
*
* \note In the case of controllers, The controller manager is responsible
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having the controller manager act as the scheduler, would it make sense to trigger the async updates based on whether the hardware component whose command interfaces the controller is using has any data ready?

In the latter case, it might be possible to make async HW and controllers work together!

As an example, let's say I'd like to use an async controller combined with a KUKA-specific external interface, where the timing is always handled inside the internal motion stack instead of the CM, and therefore would work better as an async HW component.

I'm not yet sure if there's a legit use case where this would make sense, I think it's just interesting to think about if the design we choose should permit the combination of the two or not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This design process would be cool. I'm not sure about triggering only when the data is ready.

For instance, having a chainable controller A -> B -> HW. A is async and running at a lower rate like 200 Hz and controller B is async and running at 500 Hz closing the loop locally with the actual values (So, we will need to trigger the controller B anyway as it needs to close the loop with the same reference values) and then command to the HW on each loop, if the HW is async then it gets the references on it's cycle (Some hardware like elmo respectively they expect a reference sent constantly at a certain frequency, in that case this mechanism would also be a problem right?

Right now, I couldn't think of a way to make both HW and controllers work together for both cases. It would be really cool as we have a data dependency

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, that makes sense.

Also it's a difficult question because afaik there is no definition on what the behaviour of an async HW and async controller combination should be, so maybe at least a warning could be given if the user tries to use this combination, because otherwise there's no information if a certain configuration is invalid or not.

What do you think about that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is always nice to warn the user about the data synchronization issue between async controllers and async hardware combination. Where do you think is the right place to do it?. The resource manager at the time of configuring the hardware can print this message of possible discrepancies.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also in most of the case, the controller manager runs at almost same frequency as the hardware even though it is run async, because mostly it doesn't make sense to compute the same command twice and being unable to send to the hardware.

async_update_condition_.notify_one();
trigger_status = true;
}
const T return_value = async_update_return_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think std::future would be more idiomatic here, instead of a condition variable? As I see we're doing some calculations in the async thread, and returning it when it's ready. I think we'd also have one less flag to keep track of.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially thought of using future, but if there is a case that the update takes time and it doesn't have it's current update cycle's return state, then I think it makes sense to continue with its old return state without triggering another cycle, rather than waiting for the future result.

If this above situation occurs, then the controller manager will print a missing update cycle message, so the user knows that he has to lower the frequency of his controller as the computation is taking longer than what he initially estimated

Copy link
Contributor

@VX792 VX792 Apr 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair.

Maybe it's possible to compute a timeout using the time & period for the future's wait_for() / wait_until() fn but probably it's needless complexity.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I thought so, but I rather prefer that the conditional variable is not awake after a certain duration, because that's exactly what we want to avoid with proper triggering

Copy link
Member Author

@saikishor saikishor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@VX792 Thank you for the extensive review. I've answered the comments

The only thing I'm not entirely convinced about is the synchronization part, because I think it would make the most sense inside the handles.

I'm expecting to achieve some sort of synchronization with #1240 and also triggering this way, if we have a chainable controller, the async will be triggered in the same order, so some sort of synchronization would be achieved better than the current one

As an example, instead of triggering the entire async controller update, it would be better to trigger only the set_ and get_value() operations that the update executes; that way we could block for a very minimal amount of time.

How do we deal with the cases, where the update cycle of a controller take longer?. For instance, the modern techniques like Model Predictive Control, Whole Body Control etc runs some optimization in the update cycle and this time entirely depends on the objective or constraints they are solving along with the degrees of freedom of the robot

Also, if we're going with this implementation, then feel free to delete the unused ControllerThreadWrapper and the other leftover stuff from the controller manager's header too.

Okay, first I wanted to get the feedback if this approach is interesting for the community or not and in a different PR I could remove them so we have less changes in the PR to review

{
break;
}
async_update_return_ = async_function_(current_update_time_, current_update_period_);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the current version, I agree we will have this issue. WIth #1240 we might be able to avoid this issue with the getter and setters that it is going to provide

/**
* If the async method is running, it will wait for the current async method call to finish.
*/
void wait_for_trigger_cycle_to_finish()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well it is for testing purpose, but if can be used alongside before stopping the async thread. If I use this at the beginning of the stop_async_update, may be we won't be having the situation you were describing

async_update_condition_.notify_one();
trigger_status = true;
}
const T return_value = async_update_return_;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially thought of using future, but if there is a case that the update takes time and it doesn't have it's current update cycle's return state, then I think it makes sense to continue with its old return state without triggering another cycle, rather than waiting for the future result.

If this above situation occurs, then the controller manager will print a missing update cycle message, so the user knows that he has to lower the frequency of his controller as the computation is taking longer than what he initially estimated

* If the async update method is still running, it will return the last return value of the async
* function.
*
* \note In the case of controllers, The controller manager is responsible
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This design process would be cool. I'm not sure about triggering only when the data is ready.

For instance, having a chainable controller A -> B -> HW. A is async and running at a lower rate like 200 Hz and controller B is async and running at 500 Hz closing the loop locally with the actual values (So, we will need to trigger the controller B anyway as it needs to close the loop with the same reference values) and then command to the HW on each loop, if the HW is async then it gets the references on it's cycle (Some hardware like elmo respectively they expect a reference sent constantly at a certain frequency, in that case this mechanism would also be a problem right?

Right now, I couldn't think of a way to make both HW and controllers work together for both cases. It would be really cool as we have a data dependency

What do you think?

Copy link

codecov bot commented Apr 21, 2024

Codecov Report

Attention: Patch coverage is 92.17391% with 9 lines in your changes missing coverage. Please review.

Project coverage is 87.62%. Comparing base (83fff77) to head (75754de).
Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
...ontroller_manager/test/test_controller_manager.cpp 94.36% 4 Missing ⚠️
...roller_interface/src/controller_interface_base.cpp 87.50% 1 Missing and 2 partials ⚠️
controller_manager/src/controller_manager.cpp 86.66% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1489      +/-   ##
==========================================
+ Coverage   87.33%   87.62%   +0.28%     
==========================================
  Files         121      120       -1     
  Lines       12140    12217      +77     
  Branches     1086     1093       +7     
==========================================
+ Hits        10603    10705     +102     
+ Misses       1148     1123      -25     
  Partials      389      389              
Flag Coverage Δ
unittests 87.62% <92.17%> (+0.28%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...controller_interface/controller_interface_base.hpp 92.30% <100.00%> (+2.30%) ⬆️
.../include/controller_manager/controller_manager.hpp 100.00% <ø> (ø)
...r_manager/test/test_controller/test_controller.cpp 95.74% <100.00%> (+0.18%) ⬆️
controller_manager/src/controller_manager.cpp 78.17% <86.66%> (+0.60%) ⬆️
...roller_interface/src/controller_interface_base.cpp 86.07% <87.50%> (+0.36%) ⬆️
...ontroller_manager/test/test_controller_manager.cpp 96.36% <94.36%> (-0.23%) ⬇️

async_update_condition_.notify_one();
trigger_status = true;
}
const T return_value = async_update_return_;
Copy link
Contributor

@VX792 VX792 Apr 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair.

Maybe it's possible to compute a timeout using the time & period for the future's wait_for() / wait_until() fn but probably it's needless complexity.

* If the async update method is still running, it will return the last return value of the async
* function.
*
* \note In the case of controllers, The controller manager is responsible
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, that makes sense.

Also it's a difficult question because afaik there is no definition on what the behaviour of an async HW and async controller combination should be, so maybe at least a warning could be given if the user tries to use this combination, because otherwise there's no information if a certain configuration is invalid or not.

What do you think about that?

{
break;
}
async_update_return_ = async_function_(current_update_time_, current_update_period_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I'm not sure even that will solve it, because if you take this code from the JTC for example:

    for (size_t index = 0; index < dof_; ++index)
    {
      trajectory_point_interface[index] = joint_interface[index].get().get_value(); // get_state() from #1240
    }

and you're in the async update cycle, while the HWIF from the CM thread does

    for (size_t index = 0; index < dof_; ++index)
    {
      hw_states_[index] = some_value // set_state() with the updated version
    }

at the same time, then it doesn't matter if the access itself is thread safe, the async JTC will do the assignment and calculations based on a mixture of old and new state interfaces.

For async HW, maybe it could be possible to disable plain getters / setters and instead use a grouped set_states and get_states function? That way you could wait until all values are set by the JTC, and update the states only after that! (e.g. lock until all states are read, or all states are set)

As a quick example something like this could easily be extended for multiple types:

std::map<const char*, double> interface_name_and_value = { {"Chocolate", 12.0}, {"Coffee", 14.0}, {"Cake", 15.0}};

template<typename T>
struct my_pair {
    my_pair() = delete;
    my_pair(const char* first, T second) : first_(first), second_(second) {}
    const char* first_;
    T second_;
};

template <typename T, typename ...Pairs>
void fun(const my_pair<T>& first_pair, const Pairs&... pairs) {

   interface_name_and_value[first_pair.first_] = first_pair.second_;
   if constexpr (sizeof...(pairs) > 0)
   {
     fun(pairs...);
   }
}

//    Usage:
//    const my_pair p1 = {"Chocolate", 18.0};
//    const my_pair p2 = {"Cake", 15.2};
//    const my_pair p3 = {"Coffee", 198.2};

//    lock();
//    fun(p1, p2, p3);
//    unlock()

Plain folds are not be enough if the data is stored in some data structure though, but I think this could work with index sequences / recursive templates.

What do you think about this? The code might not be very readable, but I think this could ensure data consistency later.

@saikishor
Copy link
Member Author

@VX792 Will you be in today's WG meeting?

@VX792
Copy link
Contributor

VX792 commented Apr 24, 2024

@VX792 Will you be in today's WG meeting?

I'm not sure yet,but I think I can make it.

@saikishor
Copy link
Member Author

I'm not sure yet, but I think I can make it.

@VX792 Sure. It's just so we can discuss this or the async topic in general. Thank you :)

@saikishor saikishor changed the title Async Function Handler for Controllers and Hardware Components Async Function Handler for Controllers Jun 13, 2024
Copy link
Contributor

mergify bot commented Aug 5, 2024

This pull request is in conflict. Could you fix it @saikishor?

std::bind(controller_name_compare, std::placeholders::_1, controller));
if (controller_it != controllers.end())
{
controller_it->c->wait_for_trigger_update_to_finish();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cannot this hang for an undetermined amount of time?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, this is just until the current cycle is finished.

it might hang only when there is a crash or throw exception in another thread. That's why I added this one : ros-controls/realtime_tools#172

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still a bit tricky, as it can easily disturb the update loop. for not is this fine. But I would love to see "async" deactivation of an async controller, or at least that we somehow earlier know what is happening.

The scenario I have in mind. My async controller is probably slow, so I would expect it to be blocking when writing for it to finish. which means that this would break the update loop here. I am not fully sure how to avoid this TBH, of how to manage it properly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well but this is inside the nonRT loop, it is inside the switch_controllers method, so waiting there shouldn't be a problem

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be interesting is to skip the update trigger for the deactivated controllers to avoid it blocking

@@ -2404,6 +2398,15 @@ controller_interface::return_type ControllerManager::update(
failed_controllers_list.push_back(loaded_controller.info.name);
ret = controller_ret;
}
else if (!trigger_status)
{
RCLCPP_WARN(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
RCLCPP_WARN(
RCLCPP_WARN(

how often does this trigger in a realistic setup? if something is off by a bit this can massively spam the console... should it be THROTTLED?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends this can help you see, if your controller is missing the deadline, may be this can be throttled!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it would be nice if not only the log level but the throttling ratio was also controllable for logging

bmagyar
bmagyar previously approved these changes Oct 17, 2024
destogl
destogl previously approved these changes Oct 17, 2024
Copy link
Member

@destogl destogl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice!

<build_export_depend>rclcpp_lifecycle</build_export_depend>

<exec_depend>realtime_tools</exec_depend>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the future: why not just to have this once using <depend> tag?

std::bind(controller_name_compare, std::placeholders::_1, controller));
if (controller_it != controllers.end())
{
controller_it->c->wait_for_trigger_update_to_finish();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still a bit tricky, as it can easily disturb the update loop. for not is this fine. But I would love to see "async" deactivation of an async controller, or at least that we somehow earlier know what is happening.

The scenario I have in mind. My async controller is probably slow, so I would expect it to be blocking when writing for it to finish. which means that this would break the update loop here. I am not fully sure how to avoid this TBH, of how to manage it properly.

@destogl destogl dismissed stale reviews from bmagyar and themself via 00228c4 October 17, 2024 11:23
@destogl
Copy link
Member

destogl commented Oct 17, 2024

Thanks for the last fix! This is much appriciated!

@destogl
Copy link
Member

destogl commented Oct 17, 2024

I will merge this, as it works great!

@destogl destogl merged commit d55def1 into ros-controls:master Oct 17, 2024
17 of 19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants