Subscribers sometimes not getting messages from topics using durability transient_local #263
Thanks for the ticket and the instructions to reproduce. I will try it out and report back.
I'm able to reproduce the issue. @JEnoch @imstevenpmwork I'm trying to understand if this is an issue with … I ran the test case with … Specifically, looking at the logs relevant to …
And here are the logs from the unsuccessful run: …
Does anything stand out to you? I'm looking into whether the subscription on the rmw side receives the payload but fails to notify the guard_condition, or whether rmw_take is missing the notification.
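To make the two hypotheses concrete, here is a simplified Python model of a subscription's receive path. The class names are hypothetical and only mirror the roles of sub_data_handler, guard_condition, and rmw_take mentioned above. This is not rmw_zenoh's actual C++ implementation. The handler enqueues the message and then triggers the guard condition. A waiter blocks on the guard condition and then takes from the queue.

```python
import threading
from collections import deque


class GuardCondition:
    """Hypothetical stand-in for the guard condition an executor waits on."""

    def __init__(self):
        self._cv = threading.Condition()
        self._triggered = False

    def trigger(self) -> None:
        with self._cv:
            self._triggered = True
            self._cv.notify_all()

    def wait(self, timeout: float) -> bool:
        # Returns True if triggered before the timeout; consumes the trigger.
        with self._cv:
            triggered = self._cv.wait_for(lambda: self._triggered, timeout)
            self._triggered = False
            return triggered


class SubscriptionData:
    """Hypothetical per-subscription state: a bounded queue plus a guard condition."""

    def __init__(self, depth: int = 10):
        self._queue = deque(maxlen=depth)
        self._lock = threading.Lock()
        self.guard = GuardCondition()

    def on_sample(self, msg) -> None:
        # Data-handler role: enqueue first, then notify, so a woken waiter finds the message.
        with self._lock:
            self._queue.append(msg)
        self.guard.trigger()

    def take(self):
        # Take role: pop the oldest message, or return None if the queue is empty.
        with self._lock:
            return self._queue.popleft() if self._queue else None


# Usage: a sample arrives on another thread shortly after we start waiting.
sub = SubscriptionData()
threading.Timer(0.05, sub.on_sample, args=("hello",)).start()
if sub.guard.wait(timeout=1.0):
    print(sub.take())  # hello
```

In this model the two failure modes can be told apart. If the wait times out while take still returns a message, the notification was lost. If both come back empty, the payload never reached the data handler.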
I added a line to sub_data_handler to print a statement when a message is received. When the tests pass, I do see this line printed:
[subscription_test_exe-1] [INFO] [1724286701.114562488] [rmw_zenoh_cpp]: Sub /test_topic received a message
However, there is no printout when the test fails. Could this suggest that the issue lies in Zenoh, or perhaps in the options we define for the querying subscriber?
I think it's an issue of "publication before subscriber discovery":
In the PublicationCache / QueryingSubscriber pattern, in addition to its first query at creation, a QueryingSubscriber should re-issue a query to any newly discovered PublicationCache. This way it gets the historical publications it might have missed before mutual discovery, or during a network partition.
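A minimal in-process sketch of the re-query step described above. `PublicationCache` and `QueryingSubscriber` here are hypothetical Python stand-ins, not the Zenoh API. Samples are modeled as `(publisher_id, sequence_number, payload)` tuples. The deduplication of samples that arrive both live and through a query is an assumption of this sketch, not something stated in the thread.

```python
from dataclasses import dataclass, field

Sample = tuple  # (publisher_id, sequence_number, payload) -- hypothetical sample model


@dataclass
class PublicationCache:
    """Hypothetical publisher-side cache holding the newest `depth` samples (depth >= 1)."""
    name: str
    depth: int = 1
    history: list = field(default_factory=list)

    def store(self, sample: Sample) -> None:
        self.history.append(sample)
        del self.history[:-self.depth]  # keep only the newest `depth` samples

    def answer_query(self) -> list:
        return list(self.history)


class QueryingSubscriber:
    """Hypothetical subscriber that queries caches at creation and again on discovery."""

    def __init__(self, known_caches):
        self.received = []
        self._seen = set()
        for cache in known_caches:  # the first query, issued at creation
            self._query(cache)

    def _deliver(self, sample: Sample) -> None:
        # A sample can arrive both live and through a query; keep only one copy.
        if sample not in self._seen:
            self._seen.add(sample)
            self.received.append(sample)

    def _query(self, cache: PublicationCache) -> None:
        for sample in cache.answer_query():
            self._deliver(sample)

    def on_live_sample(self, sample: Sample) -> None:
        self._deliver(sample)

    def on_cache_discovered(self, cache: PublicationCache) -> None:
        # The extra step: re-query every newly discovered cache.
        self._query(cache)


# Usage: the subscriber exists before it has discovered the publisher's cache.
sub = QueryingSubscriber(known_caches=[])
cache = PublicationCache("pub_node", depth=1)
msg = ("pub_node", 1, "hello")
cache.store(msg)                # published before mutual discovery: no live delivery
sub.on_cache_discovered(cache)  # the re-query recovers the missed sample
sub.on_live_sample(msg)         # a duplicate arriving by another path is ignored
print(sub.received)             # [('pub_node', 1, 'hello')]
```

Without the `on_cache_discovered` call, the creation-time query finds no caches, and the sample published before discovery is never received.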
I had assumed this call is implicitly made when the QueryingSubscriber is created. Is that not the case?
No, it isn't. The PublicationCache and QueryingSubscriber were implemented before LivelinessTokens existed. For a future version of Zenoh, we could consider improving the PublicationCache to automatically declare a LivelinessToken, and the QueryingSubscriber to automatically query a PublicationCache when it discovers that token.
Thanks, Julien, for the additional clarification. I now realize that the issue here is the publisher being discovered after its first publication. I will update the implementation in …
sequenceDiagram
participant Pub as Pub
participant Cache as Cache
participant Get as Get
participant Sub as Sub
Sub ->> Pub: subscription
Pub ->> Cache: msg
Pub ->> Sub: msg
box Gray Pub Node
participant Pub
participant Cache
end
box Gray Sub Node
participant Get
participant Sub
end
sequenceDiagram
participant Pub as Pub
participant Cache as Cache
participant Get as Get
participant Sub as Sub
Pub ->> Cache: msg
Pub --x Sub: msg
Sub ->> Pub: subscription
Get ->> Cache: query
Cache ->> Get: msg
box Gray Pub Node
participant Pub
participant Cache
end
box Gray Sub Node
participant Get
participant Sub
end
sequenceDiagram
participant PL as Liveliness
participant Pub
participant Cache
participant Get
participant Sub
participant SL as Liveliness
Get ->> Cache: query
Cache ->> Get: nothing
PL ->> SL: liveliness
SL ->> Get: trigger
Get ->> Cache: query
Cache ->> Get: nothing
Pub ->> Cache: msg
Pub --x Sub: msg
Sub ->> Pub: subscription
box Gray Pub Node
participant Pub
participant Cache
participant PL
end
box Gray Sub Node
participant Get
participant Sub
participant SL
end
My questions are:
@nirwester could you check out the branch from #269 and do a clean build (…)? The changes in the PR seem to have solved the issue on my end.
This scenario cannot happen since the QueryingSubscriber declares its subscription before sending any query, and the PublicationCache declares its queryable before declaring its LivelinessToken.
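The argument above can be checked mechanically with a small model. This sketch enumerates every ordering of five events and reports the orderings in which the message is lost. The event names are hypothetical labels. The model rests on these assumptions:

1. A subscription declared before a query is known on the publisher side before that query is answered.
2. The cache exists before the first publication. The thread does not state this.
3. The message is received if it is published after the subscription is known (live path) or before the query is answered (cache path).

```python
from itertools import permutations

# Hypothetical event labels: declare queryable, declare liveliness token,
# subscription known to publisher side, query answered by cache, publication.
EVENTS = ("queryable", "token", "sub", "query", "pub")


def valid(order, enforce_sub_before_query=True):
    pos = {e: i for i, e in enumerate(order)}
    if pos["queryable"] > pos["token"]:  # queryable declared before the token
        return False
    if pos["token"] > pos["query"]:      # the query is triggered by the token
        return False
    if pos["queryable"] > pos["pub"]:    # assumption: cache exists before publishing
        return False
    if enforce_sub_before_query and pos["sub"] > pos["query"]:
        return False                     # subscription declared before any query
    return True


def received(order):
    pos = {e: i for i, e in enumerate(order)}
    live = pos["sub"] < pos["pub"]         # subscription known when msg is sent
    from_cache = pos["pub"] < pos["query"]  # msg already cached when query answered
    return live or from_cache


def lost_orders(enforce_sub_before_query):
    return [o for o in permutations(EVENTS)
            if valid(o, enforce_sub_before_query) and not received(o)]


print(len(lost_orders(True)))  # 0
print(lost_orders(False))      # [('queryable', 'token', 'query', 'pub', 'sub')]
```

With both declaration-order rules in place, no losing ordering exists. Dropping the subscription-before-query rule leaves exactly one losing ordering, the one from the third diagram: the query finds an empty cache, and the publication happens before the subscription is known.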
I can no longer reproduce it with the fix, thank you! :)
Thanks for testing! Glad to know the issue is resolved.
Version
commit b78d6a1
Date: Wed Aug 7 13:51:57 2024 +0200
Platform
Ubuntu 22, Docker
ROS Version
Iron
Description
Hi, I tried running our codebase's tests with rmw_zenoh and noticed that many of those relying on latched (durability transient_local) topics occasionally fail: the subscribers sometimes don't get the data from "recently spawned" publishers.
I created a small package that reproduces the issue (roughly 50% of the time on my setup; it never reproduces when switching to Cyclone DDS):
https://github.com/nirwester/zenoh_pub_bug
The instructions to execute it are in the README.
Let me know if I can provide additional information.