Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deactivate hardware from read/write return value #884

Merged
merged 10 commits into from
Oct 16, 2023

Conversation

fmauch
Copy link
Contributor

@fmauch fmauch commented Jan 3, 2023

This PR aims at adding the possibility to trigger transitions between active and inactive for writing interfaces only. I discussed this with @destogl and this is my first implementation that should help further explain my intention.

The basic idea is that many hardware controllers require some form of activation or interpreter code running on the hardware controller in order to control the hardware. However, streaming data from the robot does not necessarily require such interpreter script to be running. For example, in the UR driver we read data from the robot as soon as a connection to the robot is established. We can read joint data, wrench sensor information and IO status. However, when the interpreter code is not running on the robot, we cannot control the robot. I think there are many hardware components that allow reading some sort of information when the hardware is not completely running (and be it only the fact that the hardware itself is not in a running state).

To mitigate this, we have a small helper node that we call "controller stopper" that reads status information from the robot, whether the interpreter code is running and takes down controllers accordingly. It basically listens to a boolean topic and stops all controllers instead of a set of preconfigured controllers when receiving a false message. On a true message it restarts the previously stopped controllers. This has the following benefits:

  • When the interpreting program on the robot gets stopped, the JTC gets stopped and therefore the trajectory gets aborted this does
    • give the user feedback about the trajectory not being executed successfully
    • prevent the JTC from further interpolating and therefore leading to infeasible jumps / very fast motions once control is re-enabled
  • While the robot is not ready to receive control commands, trajectories sent to the driver will get rejected (basically adding the same benefits as with the last point)

Obviously, this is a rather simple method to achieve that goal which has the culprit that nothing stops users from starting controllers manually while robot control is not active...

(The above paragraph is copied from my corresponding post in ROS1 ros_control regarding the controller_stopper).

We think that it would be beneficial to handle that using lifecycle transitions inside the hardware interface than this method. Currently, we can set the hardware back to the inactive state as a whole, also deactivating all reading interfaces. With this PR I added a third return type to the read and write functions that triggers deactivating all writing interfaces while leaving the reading interfaces untouched. Controllers currently are not aware of this switch and stay active, but that would probably be a followup task (though this would need to be synchronized when / if ever this gets merged.)

I tested this together with a modified ur_robot_driver which can be found here.

I am aware that this PR is not yet in a fine state, hence the draft status. But I wanted to raise the discussion, anyway. Do you consider this an edge case for UR not relevant for other hardware components? Does this make sense from a logical point of view? Would there be any conceptual comments?

@bmagyar bmagyar requested a review from VX792 January 25, 2023 18:18
@bmagyar
Copy link
Member

bmagyar commented Jan 25, 2023

@VX792 could you please give this a review from the KUKA use-case perspective?

@VX792
Copy link
Contributor

VX792 commented Jan 28, 2023

I think something like this would be useful for the (unofficial) KUKA RoX driver too. We're also using a blocking read for our system component, and the problem was that whenever the component got deactivated when a read or write call returned with return_type::ERROR, the framework used up all available CPU power. That's because we had no active components, and the control loop didn't use sleeps, since the controller manager's update rate was bypassed.

This is a bit different from your original issue, since the equivalent of your interpreter code is running constantly on the robot. At the worst case, commands coming through the ros2 driver are ignored, but the internal service is still active.

Still, the above problem was solved in a really hacky way, and if a simple hw lifecycle change allows us to receive the latest state values from the robot without sending out any commands then I'm all for it!

@fmauch fmauch force-pushed the deactivate_hw branch 2 times, most recently from d6e6526 to b7fa535 Compare May 4, 2023 13:01
@fmauch fmauch force-pushed the deactivate_hw branch 2 times, most recently from 112fd58 to e4b03a4 Compare May 15, 2023 18:40
@codecov-commenter
Copy link

codecov-commenter commented May 15, 2023

Codecov Report

Merging #884 (a0c101e) into master (925f5f3) will decrease coverage by 2.16%.
Report is 533 commits behind head on master.
The diff coverage is 34.78%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #884      +/-   ##
==========================================
- Coverage   34.61%   32.46%   -2.16%     
==========================================
  Files          52       94      +42     
  Lines        2981    10033    +7052     
  Branches     1855     6755    +4900     
==========================================
+ Hits         1032     3257    +2225     
- Misses        310      782     +472     
- Partials     1639     5994    +4355     
Flag Coverage Δ
unittests 32.46% <34.78%> (-2.16%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...oller_interface/chainable_controller_interface.hpp 100.00% <100.00%> (ø)
...lude/controller_interface/controller_interface.hpp 100.00% <100.00%> (ø)
controller_interface/src/controller_interface.cpp 100.00% <100.00%> (+66.66%) ⬆️
controller_manager/src/controller_manager.cpp 37.94% <ø> (-1.77%) ⬇️
...chainable_controller/test_chainable_controller.hpp 100.00% <100.00%> (ø)
...r_manager/test/test_controller/test_controller.hpp 100.00% <100.00%> (ø)
..._interface/include/hardware_interface/actuator.hpp 100.00% <ø> (ø)
...de/hardware_interface/loaned_command_interface.hpp 100.00% <100.00%> (+14.28%) ⬆️
...re_interface/include/hardware_interface/sensor.hpp 100.00% <ø> (ø)
...re_interface/include/hardware_interface/system.hpp 100.00% <ø> (ø)
... and 69 more

... and 21 files with indirect coverage changes

@fmauch
Copy link
Contributor Author

fmauch commented May 15, 2023

I've revisited this also with a discussion I had with @destogl last week. I think this is in a state, where I'd like to discuss some things here. I've re-structured the PR such that the individual commits explain the points I'd like to discuss best.

  • In d2b0d8f I've added a third return type for read/write that would cause a transition of the respective component to INACTIVE. However, this doesn't set the hardware interfaces belonging to this to unavailable hence nothing seems to be happening, which seems confusing. If some hardware would go back to INACTIVE, I'd expect that all the interfaces (or at least the "movement"-interfaces as stated in the docs would become unavailable and all the controllers claiming those will become inactive. However, none of this happens.
  • To address this, I explicitly set commanding controllers (since the concept of a "movement" interface seems not to be fully implemented) to unavailable when this happens in a385eeb. This also involves not making command interfaces available on startup (which seems to be intended, but not implemented).
  • This obviously would be a major change in how things are handled inside ros2_control and requires adapting a lot of tests. This is not fully done, yet, as I'd first like to clarify the way forward.

The main discussion points I see are the following:

  • @destogl as far as I understood our discussion from last week, you'd like to restrict this PR basically to its current first commit. As far as I see this, this doesn't really gain much benefit as explained earlier, but maybe I am missing something.
  • To make this change fully functional it would probably be good to implement the concept of a "Hardware interfaces for movement" or more general: hardware interface only available in ACTIVE.
  • One question raised from the changes: Should the "in ACTIVE only interfaces be set to available once the hardware is ready again? I've implemented a small example using this feature, which works with setting the interfaces to unavailable and available automatically. I use a time-based approach for implementation reasons, as otherwise I would have to inject a service into the SystemInterface object.
  • Am I missing something regarding the hw_component_inactive->controller_inactive chain? With the aforementioned example my hw interfaces happily change between available/claimed and unavailable/unclaimed (given that I have compiled in the automatic re-activation) while the controller keeps staying active all the time and sends commands to the hardware interface.
  • How should we proceed with this? Should we keep this small (and maybe useless as a standalone change) and progress from there?

@mergify
Copy link
Contributor

mergify bot commented May 17, 2023

This pull request is in conflict. Could you fix it @fmauch?

@destogl
Copy link
Member

destogl commented May 31, 2023

@destogl as far as I understood our discussion from last week, you'd like to restrict this PR basically to its current first commit. As far as I see this, this doesn't really gain much benefit as explained earlier, but maybe I am missing something.

Yes, I see here multiple PR that can be added after each other. Just for simpler traceability and review. You can have a big PR implementing everything, but we merge it once featured by one.

NOTE: The points 1 and 2 can be done independently.

  1. Add READ_ONLY from read or write (does this make sense to return only from write since we can always read?)
  2. Add a new flag in the parser for the command interfaces that should be available in inactive state. So, per default everything is only available when active, but we need an additional flag to make it writable in inactive. Here also add a deprecation notice for this behavior (maybe we should also add this deprecation already into humble too.)
  3. Change logic when command interfaces have to be set to available depending on the parsed flag.
  4. Add automatic deactivation of controller when interface gets unavailable (should be implemented already). We don't do automatic activation – this is too much application dependent and cannot be generalized.

To make this change fully functional it would probably be good to implement the concept of a "Hardware interfaces for movement" or more general: hardware interface only available in ACTIVE.

yes. As described above.

Should the "in ACTIVE only interfaces be set to available once the hardware is ready again?

Yes, if (command) interfaces don't have flag "available when inactive" set then they should be only available when hardware is in active state.

Am I missing something regarding the hw_component_inactive->controller_inactive chain? With the aforementioned example my hw interfaces happily change between available/claimed and unavailable/unclaimed (given that I have compiled in the automatic re-activation) while the controller keeps staying active all the time and sends commands to the hardware interface.

There might be a bug. The controller should be deactivated. I know I was implementing this, but maybe missed something...

How should we proceed with this? Should we keep this small (and maybe useless as a standalone change) and progress from there?

I would go with the 4 PR as proposed above. You can create a big one first will full functionality and then slice it into smaller chunks for faster review and merging. So we are also sure that everything is well tested.

@MarqRazz
Copy link

Before we consider this PR can we please include some documentation of the state transitions the controller_manager and controllers would go through before and after this change? We have a really nice diagram for the hardware_interface but how they are managed from above is a little bit of a black box to me.
image

Here is a proposal I am working on for Faults and Hardware Status that I would appreciate any input you might have because it very much fits into this use case.

This is the first step to deactivate a hw component when writing to that
component stops working (e.g. because of some interpreter program stopped
running).

The idea is that a hw component can go back to "configured", where the
communication is working, but commanding the robot is not possible.
@fmauch
Copy link
Contributor Author

fmauch commented Jul 13, 2023

I've finally finished this (rather small) PR from my point of view.

This now

  • can return DEACTIVATE (as discussed in the last WG meeting) from components, with directly deactivates the hardware
  • has tests in the hardware_interface to simulate this

I didn't change any documentation so far and I didn't add tests inside the controller_manager package. If any of this is desired, please let me know

Inside the tests there exist special values to simulate an error in the hardware.
These magic numbers were used and defined at different places. This creates
one common place to define those. This should make maintenance more stable
and the tests easier to understand for new developers.
Basically, the hardware tells the resource_manager to deactivate the port,
so the imperative is probably the better wording.
Copy link
Member

@destogl destogl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few details we need to review and fix.

The major thing is the placement of the header file with error codes. Besides that, I propose to add a few comments to increase the readability of the test.

Comment on lines 1260 to 1263
using lifecycle_msgs::msg::State;
rclcpp_lifecycle::State state(
State::PRIMARY_STATE_INACTIVE, lifecycle_state_names::INACTIVE);
set_component_state(component.get_name(), state);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would call directly deactivate_hardware from line 319 to avoid unnecessary iterations, searches and so on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since at this point we don't really have the Hardware datatype we cannot call this directly. Getting this would be duplication of code from lines 1138-1191 which is the main part of ResourceManager::set_component_state. I don't quite understand your suggestion here @destogl , I'm afraid.

Comment on lines 1217 to 1220
using lifecycle_msgs::msg::State;
rclcpp_lifecycle::State state(
State::PRIMARY_STATE_INACTIVE, lifecycle_state_names::INACTIVE);
set_component_state(component.get_name(), state);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check comment below

hardware_interface/test/test_resource_manager.cpp Outdated Show resolved Hide resolved
@bmagyar
Copy link
Member

bmagyar commented Aug 23, 2023

ping @fmauch

@fmauch
Copy link
Contributor Author

fmauch commented Aug 23, 2023

Sorry for not being responsive ATM. I am currently on vacation struggling to find the time to do OSS development. Will do this until Tuesday!

@destogl destogl self-requested a review October 16, 2023 16:11
Copy link
Member

@destogl destogl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fmauch thanks for the following up. The last part I have updated directly.

@bmagyar bmagyar added the backport-humble This label should be used by maintaines only! Label triggers PR backport to ROS2 humble. label Oct 16, 2023
@destogl destogl merged commit bd6911d into ros-controls:master Oct 16, 2023
18 checks passed
@destogl
Copy link
Member

destogl commented Oct 16, 2023

@Mergifyio backport humble

@mergify
Copy link
Contributor

mergify bot commented Oct 16, 2023

backport humble

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Oct 16, 2023
)

* Deactivate hardware from read/write return value
* Added tests to DEACTIVATE return value
* Use constants for defining of special test-values for different HW behaviors

---------

Co-authored-by: Dr. Denis <[email protected]>
(cherry picked from commit bd6911d)

# Conflicts:
#	controller_manager/test/test_controller_manager_hardware_error_handling.cpp
#	hardware_interface/src/resource_manager.cpp
#	hardware_interface/test/test_components/test_actuator.cpp
#	hardware_interface/test/test_components/test_system.cpp
#	hardware_interface/test/test_resource_manager.cpp
@fmauch fmauch deleted the deactivate_hw branch October 16, 2023 18:00
@flochre
Copy link
Contributor

flochre commented Oct 21, 2023

Hi @fmauch,

I am trying this new feature, and I was expecting my controller to not publish ros_topics after I desactivate a component using a controller. But data are still being read from my sensors and being publish.

Is this the expected behavior?

If yes, then I misunderstood the point of the PR.

Thank you in advance.

PS: I am not sure if I should post this here or write a new issue..

christophfroehlich added a commit to christophfroehlich/ros2_control that referenced this pull request Nov 16, 2023
christophfroehlich added a commit to christophfroehlich/ros2_control that referenced this pull request Nov 28, 2023
pac48 pushed a commit to pac48/ros2_control that referenced this pull request Jan 26, 2024
…ls#922)

* Update list of reviewers (ros-controls#884)

(cherry picked from commit 0a96cbb513282880d2d64adfb8c3c8c0ffc84b5b)

# Conflicts:
#	.github/reviewer-lottery.yml

* Update reviewer-lottery.yml

---------

Co-authored-by: Christoph Fröhlich <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-humble This label should be used by maintaines only! Label triggers PR backport to ROS2 humble.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

7 participants