Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revamp of rmw_context_impl_s #307

Merged
merged 11 commits into from
Nov 12, 2024
Merged

Conversation

clalancette
Copy link
Collaborator

This PR is an attempt to fix two particular issues surrounding rmw_context_impl_s:

  1. It could be the case that a graph_sub_data_handler call could attempt to access a Data pointer that had already been freed.
  2. Sometimes on shutdown, we would end up in a deadlock situation where graph_sub_data_handler was being called by Zenoh while Data::shutdown was trying to destroy the session. This would end up in an AB/BA deadlock.

It's probably best to review this PR patch-by-patch, as this will show you how I got to this solution, but in short:

  1. We first make rmw_context_impl_s::Data private within rmw_context_impl_s.cpp. This isn't strictly required, but it does form more of a PIMPL pattern and is tidier.
  2. We remove the unnecessary Data::subscribe_to_ros_graph method and just put the code inline.
  3. We remove the unused Data::is_initialized_ member.
  4. We remove the unused Data::allocator_ member.
  5. We stop storing Data::liveliness_str_; we only need it during the constructor.
  6. We mark the Data structure as final, and remove enable_shared_from_this (we don't currently need it).
  7. We keep a global map of Data structure pointers to Data structure shared_ptr. We give the Data structure pointer as a "key" to graph_sub_data_handler, but it is strictly used as a key to look up the shared_ptr. Once we have the shared_ptr, we are guaranteed that the structure will not be deleted while we are using it.
  8. Possibly most controversially, we move all work on the Data members within Data, mark Data as a class (rather than a structure), and mark all of the member variables private. This makes it crystal-clear which entity is responsible for which. With this in place, rmw_context_impl_s is a thin wrapper over Data, that really only passes through calls and manages the addition and removal of Data from the global map.

With all of this in place, I can no longer reproduce the crash that I was seeing locally. Also, with this and with ros2/rclpy#1353 in place, I am down to a single reproducible failing test across [rcl, rcl_action, rcl_lifecycle, rclcpp, rclcpp_action, rclcpp_lifecycle, rclpy, test_rclcpp, test_communication].

That way we can completely hide the implementation in
rmw_context_impl_s.cpp.

Signed-off-by: Chris Lalancette <[email protected]>
We can just do this work right in the constructor, which
simplifies it.

Signed-off-by: Chris Lalancette <[email protected]>
Signed-off-by: Chris Lalancette <[email protected]>
We only need it during construction, so only use it
there.

Signed-off-by: Chris Lalancette <[email protected]>
Signed-off-by: Chris Lalancette <[email protected]>
What we do here is to create a global map between a
Data pointer, and its shared_ptr equivalent.  When we
create a new Data ptr via rmw_context_impl_s, we add
it to the map.  When we delete a Data ptr via rmw_context_impl_s,
we remove it from the map.  When we get a graph callback,
we look to see if the pointer is in the map; if so, we
get the shared_ptr.  Because of this, we are guaranteed
that the Data pointer will not be removed while we are
using it.

Signed-off-by: Chris Lalancette <[email protected]>
In particular, since all of the data is owned by
Data, move all of the functionality into there as well.
Then it is clear who owns all of it, instead of it being
split across rmw_context_impl_s and Data.

Signed-off-by: Chris Lalancette <[email protected]>
@ahcorde
Copy link
Contributor

ahcorde commented Nov 8, 2024

I added in this comment a summary of test failures.

Signed-off-by: Chris Lalancette <[email protected]>
Copy link
Member

@Yadunund Yadunund left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for cleaning this up! Left some feedback to consider!

rmw_zenoh_cpp/src/detail/rmw_context_impl_s.hpp Outdated Show resolved Hide resolved
// Bundle all class members into a data struct which can be passed as a
// weak ptr to various threads for thread-safe access without capturing
// "this" ptr by reference.
class Data final
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: For better readability, can we first define Data and all its members here first before we implement them later in this file?

///=============================================================================
class rmw_context_impl_s::Data
{
public:
  Data(...);

  rmw_ret_t shutdown();
  ...
private:
  z_owned_session_t session_;
  ...
};

///=============================================================================
rmw_context_impl_s::Data::Data(...)
{
...
}

That should also minimize the diff.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we can do that. I generally find that unnecessarily verbose, because you are doing the same thing but with more lines.

That said, I don't feel super strongly about it. Let me know if you really prefer that style and I can switch to it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a strong requirement from my end

Signed-off-by: Chris Lalancette <[email protected]>
Signed-off-by: Chris Lalancette <[email protected]>
@Yadunund Yadunund merged commit 00d8554 into rolling Nov 12, 2024
8 checks passed
@Yadunund Yadunund deleted the clalancette/cleanup-rmw-context-impl branch November 12, 2024 18:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants