Improve service and subscription reliability #92

Merged
18 commits merged into rolling from clalancette/service-fixes on Jan 19, 2024

Conversation

clalancette (Collaborator)

This rather large PR is the culmination of about a week's worth of work. The ultimate goal of this work was to improve service reliability, but along the way numerous fixes to publishers, subscriptions, clients, services, and wait sets were made.

It's probably helpful to read the description of each of the commits, but roughly:

  • We switch to unique_ptrs for dealing with the server's incoming requests and the client's incoming responses (this was suggested in the review of Adds base support for services #86 and is implemented now); see the sketch after this list.
  • We switch to RAII for the query replies, responses, and subscription data, which means we clean up on more code paths.
  • We remove a ton of debugging statements.
  • We add source_timestamps and writer_guid to clients and services.
  • We clean up a bunch of memory leaks.
  • We revamp how rmw_wait works, ensuring that we pick up all of the incoming data.
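
To make the ownership hand-off concrete, here is a minimal sketch of how incoming requests can be moved through a queue with unique_ptr; the SavedRequest and ServiceQueue names are hypothetical placeholders, not the actual rmw_zenoh_cpp types:

```cpp
// Sketch only: unique_ptr ownership hand-off for queued requests/responses.
#include <deque>
#include <memory>
#include <mutex>
#include <utility>

struct SavedRequest {
  // payload, sequence number, writer GUID, source timestamp, ...
};

class ServiceQueue {
public:
  // The queue takes sole ownership of the request.
  void push(std::unique_ptr<SavedRequest> req) {
    std::lock_guard<std::mutex> lock(mutex_);
    queue_.push_back(std::move(req));
  }

  // Ownership moves out to the caller; once the caller's unique_ptr goes out
  // of scope, the request is freed on every return path.
  std::unique_ptr<SavedRequest> pop() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (queue_.empty()) {
      return nullptr;
    }
    auto req = std::move(queue_.front());
    queue_.pop_front();
    return req;
  }

private:
  std::mutex mutex_;
  std::deque<std::unique_ptr<SavedRequest>> queue_;
};
```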

With this PR in place, services now work reliably for me in local testing. It definitely needs more testing on larger systems, but these cleanups should get us a lot closer to having that work.

This is not strictly required, but it does make ownership
clearer.

Signed-off-by: Chris Lalancette <[email protected]>
Signed-off-by: Chris Lalancette <[email protected]>
Signed-off-by: Chris Lalancette <[email protected]>
This allows us to use RAII and drop things properly
on all paths.

Signed-off-by: Chris Lalancette <[email protected]>
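A minimal sketch of the RAII pattern the commit above refers to, using a generic cleanup callback rather than the actual zenoh drop calls (the ScopeGuard name and the commented usage are illustrative, not the real code):

```cpp
// Sketch only: a scope guard that releases a resource on every exit path.
#include <functional>
#include <utility>

class ScopeGuard {
public:
  explicit ScopeGuard(std::function<void()> cleanup)
  : cleanup_(std::move(cleanup)) {}

  // The destructor runs on every path out of the enclosing scope, including
  // early returns and exceptions, so the resource is always released.
  ~ScopeGuard() {
    if (cleanup_) {
      cleanup_();
    }
  }

  ScopeGuard(const ScopeGuard &) = delete;
  ScopeGuard & operator=(const ScopeGuard &) = delete;

private:
  std::function<void()> cleanup_;
};

// Hypothetical usage: the reply is dropped no matter which return is taken.
// void handle_reply(Reply * reply) {
//   ScopeGuard guard([reply]() { drop_reply(reply); });
//   if (!validate(reply)) { return; }  // still dropped
//   process(reply);                    // still dropped afterwards
// }
```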
rmw_context_fini is called during the atexit handler,
so we can't do complex things like shut down the session.
Instead, switch to shutting down the session during
rmw_shutdown.

Signed-off-by: Chris Lalancette <[email protected]>
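The ordering described above can be illustrated with a small standalone sketch; the Session, shutdown, and context_fini names are only placeholders for the real rmw entry points:

```cpp
// Sketch only: do complex teardown at explicit shutdown, keep atexit trivial.
#include <cstdio>
#include <cstdlib>
#include <memory>

struct Session {
  ~Session() { std::printf("session closed\n"); }
};

static std::unique_ptr<Session> g_session;

void shutdown() {
  // Safe point for complex teardown: threads and libraries are still in a
  // known-good state here.
  g_session.reset();
}

void context_fini() {
  // Runs via atexit; keep it trivial because library and thread state is
  // unpredictable this late in process teardown.
  std::printf("context finalized\n");
}

int main() {
  g_session = std::make_unique<Session>();
  std::atexit(context_fini);
  shutdown();  // explicit shutdown closes the session before exit
  return 0;
}
```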
Signed-off-by: Chris Lalancette <[email protected]>
Signed-off-by: Chris Lalancette <[email protected]>
Signed-off-by: Chris Lalancette <[email protected]>
There are more comments in the code, but in particular this
change makes sure we deal with data that is already ready,
rather than always waiting.  With this in place, services
seem to work as promised.

Signed-off-by: Chris Lalancette <[email protected]>
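The "deal with data that is already ready" idea roughly corresponds to the standard check-then-wait pattern; this is an illustrative sketch under that assumption, not the actual rmw_wait implementation:

```cpp
// Sketch only: return immediately if data is already queued, otherwise block.
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <optional>

template<typename T>
class ReadyQueue {
public:
  void push(T value) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push_back(std::move(value));
    }
    cv_.notify_one();
  }

  // The predicate checks for already-ready data before blocking, so a
  // notification that arrived before the wait cannot be lost.
  std::optional<T> wait_and_pop(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mutex_);
    if (!cv_.wait_for(lock, timeout, [this]() { return !queue_.empty(); })) {
      return std::nullopt;  // timed out with nothing ready
    }
    T value = std::move(queue_.front());
    queue_.pop_front();
    return value;
  }

private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::deque<T> queue_;
};
```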
Since it is a deque, it doesn't *really* matter whether
we push to the back and pull from the front, or push
to the front and pull from the back.  Switch to pushing
to the back and pulling from the front so we match what
the clients and services are doing.

Signed-off-by: Chris Lalancette <[email protected]>
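For reference, both directions give FIFO order on a std::deque; a tiny sketch of the back-push/front-pop direction the queues now share:

```cpp
// Sketch only: FIFO on a std::deque, pushing to the back, popping the front.
#include <cassert>
#include <deque>

int main() {
  std::deque<int> queue;
  queue.push_back(1);          // newest data goes to the back
  queue.push_back(2);
  int oldest = queue.front();  // oldest data comes off the front
  queue.pop_front();
  assert(oldest == 1);
  return 0;
}
```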
clalancette requested a review from Yadunund on January 18, 2024 at 20:10.
Yadunund (Member) left a comment:
Thanks a lot for hunting down these fixes! The added documentation in rmw_wait is very insightful!

@@ -56,15 +56,12 @@ rmw_ret_t zenoh_router_check(z_session_t session)
// Define callback
auto callback = [](const struct z_id_t * id, void * ctx) {
const std::string id_str = zid_to_str(*id);
RCUTILS_LOG_INFO_NAMED(
Yadunund (Member) commented on this hunk:
Just realized that there is a zid_to_str() method defined here and also within liveliness_utils.cpp. I'll ticket it to be fixed later.

Yadunund merged commit 0e7a380 into rolling on Jan 19, 2024.
5 checks passed
The delete-merged-branch bot deleted the clalancette/service-fixes branch on January 19, 2024 at 05:33.