Update the design document.
Signed-off-by: Chris Lalancette <[email protected]>
clalancette committed Feb 9, 2024
1 parent e268568 commit 7eb196d
Showing 1 changed file with 336 additions and 1 deletion.
337 changes: 336 additions & 1 deletion docs/design.md
@@ -1,6 +1,19 @@
# Design

## Network Topology
## Introduction

`rmw_zenoh_cpp` maps the ROS 2 [RMW API](https://github.com/ros2/rmw/tree/rolling/rmw/include/rmw) as of late 2023 onto Zenoh APIs, using [zenoh-c](https://github.com/eclipse-zenoh/zenoh-c).
The end result is that users can use ROS 2 to send and receive data over Zenoh, using the APIs that they are already familiar with.

## Brief overview

There is more detail on each item below, but a brief overview of how this is accomplished follows:

* It is assumed that a Zenoh router is running on the local system. This router will be used for discovery and host-to-host communication. However, it is *not* used for intra-host comms; that is done via direct peer-to-peer connections.
* Each "context" in ROS 2 is mapped to a single Zenoh "session". That means that there may be many publishers, subscriptions, services, and clients sharing the same session.
* Every "context" has a local "graph cache" that keeps track of the details of the network graph of ROS 2 entities.
* Zenoh publishers, subscriptions, services, and clients are created or destroyed when the corresponding RMW APIs are called.
* Data is sent and received through the appropriate zenoh-c API when the corresponding RMW APIs are called.

The following diagram shows the default network topology of a subsystem composed of 3 nodes:

@@ -50,6 +63,328 @@ Default Configuration for Zenoh Sessions:
| UDP Multicast | Disabled | Disabled |
| Gossip Scouting | Enabled | Enabled |

## Router

Zenoh has the ability to do discovery using local multicast announcements.
However, local multicast has some limitations, both intrinsic and specific to Zenoh:

* Multicast discovery can cause a lot of discovery traffic while discovering all other entities in the graph.
* Multicast discovery has a limited TTL (time-to-live), which means it can usually only discover peers on the local network segment.

For the reasons cited above, `rmw_zenoh_cpp` requires a Zenoh router to be running.

Building upstream Zenoh from source produces a `zenohd` binary, which is the router.
`rmw_zenoh_cpp` ships its own simplified version of the router that nonetheless uses most of the same code.
This was done so that Zenoh didn't have to be vendored twice (once for zenoh-c and once for zenohd), and so that the router could be more easily integrated into the ROS 2 package format.

As of 2024-02-09, the user is expected to launch the router by hand.
In the future, additional work may be done to automatically launch a Zenoh router if one isn't already running.

## Serialization/Deserialization

By default, Zenoh doesn't specify a serialization format; it just delivers bags of bytes.
There are quite a few serialization formats that `rmw_zenoh_cpp` could use, including protobuf, CDR, Cap'n Proto, JSON, etc.
In the current design, `rmw_zenoh_cpp` uses CDR as the serialization format for the following reasons:

* Using CDR means that data can be bridged between Zenoh and DDS without a deserialization/serialization step.
* Using CDR means that a new typesupport doesn't have to be developed, which is its own project.
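
As a rough illustration of what ends up in those bytes, the sketch below hand-encodes a message with a single string field (the shape of `std_msgs/msg/String`) as plain little-endian CDR: a 4-byte encapsulation header, a 32-bit length that counts the terminating NUL, then the characters.
The helper name `serialize_string_message` is hypothetical; `rmw_zenoh_cpp` relies on the existing typesupport rather than hand-written code like this.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not part of rmw_zenoh_cpp): encode a message with one
// string field as little-endian CDR.
std::vector<uint8_t> serialize_string_message(const std::string & data)
{
  // The CDR string length is a 32-bit value that includes the trailing NUL.
  assert(data.size() < UINT32_MAX);

  std::vector<uint8_t> out;
  // Encapsulation header: representation id 0x0001 (CDR, little-endian),
  // followed by two option bytes.
  out.insert(out.end(), {0x00, 0x01, 0x00, 0x00});

  // String length, written least-significant byte first.
  const uint32_t len = static_cast<uint32_t>(data.size() + 1);
  for (int shift = 0; shift < 32; shift += 8) {
    out.push_back(static_cast<uint8_t>((len >> shift) & 0xFF));
  }

  // Characters followed by the terminating NUL.
  out.insert(out.end(), data.begin(), data.end());
  out.push_back(0x00);
  return out;
}
```

For the input `"hi"` this yields 11 bytes: `00 01 00 00`, `03 00 00 00`, `68 69 00`.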

### Related RMW APIs

* rmw_get_serialized_message_size
* rmw_serialize
* rmw_deserialize

### Related Zenoh-c APIs

N/A

## Graph Cache

One of the biggest impedance mismatches between Zenoh and ROS 2 has to do with graph introspection.
Zenoh attempts to do the absolute minimum of discovery (for performance reasons), while ROS 2 generally assumes that the entire graph is available from all entities in the system (for debugging/introspection reasons).
To deal with this discrepancy, each context in `rmw_zenoh_cpp` keeps a cache of all entities discovered in the graph so far.
An "entity" is a node, publisher, subscription, service server, or service client.
Each entity sends a unique liveliness token as it comes online, and removes that liveliness token when it is destroyed.
These tokens contain information about the entity and its relationship to the other entities in the system (for instance, a publisher is always attached to a node).
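
The bookkeeping described above can be sketched as a small in-memory cache that is updated whenever a token appears or disappears.
This is a simplified model under stated assumptions: `LivelinessToken`, `GraphCache`, and their members are hypothetical names, the token fields are a guess at the minimum needed for counting, and the entity types reuse the two-letter token types (`NN`, `MP`, `MS`, `SC`, `SS`) that this document gives for each entity.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Entity kinds, named after the liveliness token types used in this document.
enum class EntityType { NN, MP, MS, SC, SS };

// Hypothetical, simplified token contents; real tokens carry more information.
struct LivelinessToken
{
  std::string id;         // unique per entity
  EntityType type;
  std::string node_name;  // node the entity is attached to
  std::string topic;      // topic or service name; empty for a node
};

// Hypothetical per-context cache of every entity discovered so far.
class GraphCache
{
public:
  // Called when a liveliness token appears on the network.
  void on_token_put(const LivelinessToken & token)
  {
    assert(!token.id.empty());
    entities_[token.id] = token;
  }

  // Called when a liveliness token is removed (the entity was destroyed).
  void on_token_delete(const std::string & id)
  {
    entities_.erase(id);
  }

  // Answers queries in the style of rmw_count_publishers and friends.
  std::size_t count(EntityType type, const std::string & topic) const
  {
    std::size_t n = 0;
    for (const auto & kv : entities_) {
      if (kv.second.type == type && kv.second.topic == topic) {
        ++n;
      }
    }
    return n;
  }

private:
  std::map<std::string, LivelinessToken> entities_;
};
```

Because every query is answered from the local map, graph introspection calls need no network round trip once the tokens have been received.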

### Related RMW APIs

* rmw_publisher_count_matched_subscriptions
* rmw_subscription_count_matched_publishers
* rmw_get_node_names
* rmw_get_node_names_with_enclaves
* rmw_count_publishers
* rmw_count_subscribers
* rmw_count_clients
* rmw_count_services
* rmw_get_gid_for_publisher
* rmw_get_gid_for_client
* rmw_compare_gids_equal
* rmw_get_service_names_and_types
* rmw_get_publishers_info_by_topic
* rmw_get_subscriptions_info_by_topic
* rmw_get_subscriber_names_and_types_by_node
* rmw_get_publisher_names_and_types_by_node
* rmw_get_service_names_and_types_by_node
* rmw_get_client_names_and_types_by_node
* rmw_get_topic_names_and_types

### Related Zenoh-c APIs

* zc_liveliness_declare_token
* zc_liveliness_declare_subscriber
* zc_liveliness_get

## Contexts

A ROS 2 context describes a certain middleware configuration, which can contain 0 or more ROS 2 nodes.
In `rmw_zenoh_cpp`, a context maps to a Zenoh session, along with a liveliness token for the graph cache and some additional metadata.

### Related RMW APIs

* rmw_get_zero_initialized_init_options
* rmw_init_options_copy
* rmw_init_options_fini
* rmw_get_zero_initialized_context
* rmw_init
* rmw_shutdown
* rmw_context_init
* rmw_create_guard_condition
* rmw_destroy_guard_condition
* rmw_trigger_guard_condition
* rmw_create_wait_set
* rmw_destroy_wait_set
* rmw_wait

### Related Zenoh-c APIs

* zc_liveliness_declare_subscriber
* zc_liveliness_get
* z_open
* z_close
* z_undeclare_subscriber
* z_call
* z_session_check

## Nodes

A ROS 2 node can be thought of as the "unit of computation" in a ROS 2 graph; usually one node performs one particular task.
Nodes may contain publishers, subscriptions, service servers, service clients, action servers, action clients, parameters, and anything else needed to do some computation.
Zenoh has no conceptual equivalent to the ROS 2 node, so `rmw_zenoh_cpp` creates no Zenoh entities when nodes are created.
When a new node is created through the RMW API, a liveliness token of type `NN` is sent out.

### Related RMW APIs

* rmw_create_node
* rmw_destroy_node
* rmw_node_get_graph_guard_condition

### Related Zenoh-c APIs

* zc_liveliness_declare_token

## Publishers

A ROS 2 publisher sends data to 0 or more connected subscriptions.
A Zenoh publisher does exactly the same thing, so ROS 2 publishers are mapped onto Zenoh publishers in `rmw_zenoh_cpp`.
When a new publisher is created, a liveliness token of type `MP` is sent out.

### Related RMW APIs

* rmw_create_publisher
* rmw_destroy_publisher
* rmw_publish
* rmw_publish_serialized_message
* rmw_borrow_loaned_message
* rmw_return_loaned_message
* rmw_publisher_wait_for_all_acked
* rmw_publisher_get_network_flow_endpoints
* rmw_publisher_event_init

### Related Zenoh-c APIs

* zc_liveliness_declare_token
* zc_publish_put_owned
* z_declare_publisher
* z_undeclare_publisher
* z_publisher_put

## Subscriptions

A ROS 2 subscription receives data from 0 or more connected publishers.
A Zenoh subscriber does exactly the same thing, so ROS 2 subscriptions are mapped onto Zenoh subscribers in `rmw_zenoh_cpp`.
When new data arrives, a callback within `rmw_zenoh_cpp` is executed, which takes ownership of the data and signals that there is data available.
`rmw_wait` can then detect that data is available, and the data can be delivered via `rmw_take`.
When a new subscription is created, a liveliness token of type `MS` is sent out.
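
The callback, wait, and take steps above can be modeled with a queue guarded by a mutex and a condition variable.
This is only a sketch of the pattern, not the actual `rmw_zenoh_cpp` data structures: `SubscriptionQueue` and its methods are hypothetical, and `wait` and `take` merely play the roles of `rmw_wait` and `rmw_take`.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for the per-subscription state.
class SubscriptionQueue
{
public:
  using Payload = std::vector<uint8_t>;

  // Runs in the middleware callback: take ownership of the payload,
  // then signal that data is available.
  void on_sample(Payload payload)
  {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push_back(std::move(payload));
    }
    data_available_.notify_one();
  }

  // Plays the role of rmw_wait: block until data is queued or the timeout
  // expires. Returns true if data is available.
  bool wait(std::chrono::milliseconds timeout)
  {
    std::unique_lock<std::mutex> lock(mutex_);
    return data_available_.wait_for(lock, timeout, [this] {return !queue_.empty();});
  }

  // Plays the role of rmw_take: hand the oldest payload to the caller
  // without blocking. Returns false if nothing is queued.
  bool take(Payload & out)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    if (queue_.empty()) {
      return false;
    }
    out = std::move(queue_.front());
    queue_.pop_front();
    return true;
  }

private:
  std::mutex mutex_;
  std::condition_variable data_available_;
  std::deque<Payload> queue_;
};
```

Keeping the callback this short means the thread delivering data is never blocked by user code; the user only sees the payload once it calls the take function.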

### Related RMW APIs

* rmw_create_subscription
* rmw_destroy_subscription
* rmw_take
* rmw_take_with_info
* rmw_take_sequence
* rmw_take_serialized_message
* rmw_take_serialized_message_with_info
* rmw_wait
* rmw_subscription_set_on_new_message_callback
* rmw_subscription_set_content_filter
* rmw_subscription_get_content_filter
* rmw_take_loaned_message
* rmw_take_loaned_message_with_info
* rmw_return_loaned_message_from_subscription
* rmw_subscription_get_network_flow_endpoints
* rmw_subscription_event_init

### Related Zenoh-c APIs

* zc_liveliness_declare_token
* zc_sample_payload_rcinc
* z_declare_subscriber
* z_undeclare_subscriber

## Service Clients

In ROS 2, services are meant to be used for remote procedure calls that will return fairly quickly.
`rmw_zenoh_cpp` uses Zenoh queryables to implement ROS 2 services.
When a client wants to make a request, it uses the RMW API `rmw_send_request`.
`rmw_zenoh_cpp` then calls the Zenoh `z_get` function to send a query out to the network.
Assuming there is a service server listening to that queryable, it will receive the request, perform a computation, and return the result.
The result will then be made available to the client via `rmw_take_response`.
When a new service client is created, a liveliness token of type `SC` is sent out.

### Related RMW APIs

* rmw_create_client
* rmw_destroy_client
* rmw_send_request
* rmw_take_response
* rmw_take
* rmw_take_with_info
* rmw_take_sequence
* rmw_take_serialized_message
* rmw_take_serialized_message_with_info
* rmw_wait
* rmw_service_server_is_available
* rmw_client_set_on_new_response_callback

### Related Zenoh-c APIs

* zc_liveliness_declare_token
* z_get
* z_attachment_get

## Service Servers

In ROS 2, services are meant to be used for remote procedure calls that will return fairly quickly.
`rmw_zenoh_cpp` uses Zenoh queryables to implement ROS 2 services.
When a ROS 2 node wants to advertise a service to the network, it calls `rmw_create_service`.
`rmw_zenoh_cpp` uses the `z_declare_queryable` Zenoh API to create that service.
When a client request comes in, `rmw_take_request` is called to send the query to the user callback, which should perform some computation.
Once the user callback returns, `rmw_send_response` is called to send the response back to the requester.
When a new service server is created, a liveliness token of type `SS` is sent out.
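
The request/response round trip can be illustrated with an in-process model in which each method stands in for one of the RMW calls named above.
Everything here is an assumption made for illustration: `ServiceModel` is a hypothetical class, no Zenoh query or queryable is involved, and the sequence number used to pair a response with its request is simply passed alongside the payload.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <string>
#include <utility>

// Hypothetical in-process model of a single service; no networking involved.
class ServiceModel
{
public:
  // Client side (role of rmw_send_request): queue a query and return the
  // sequence number that identifies it.
  int64_t send_request(const std::string & request)
  {
    const int64_t seq = next_seq_++;
    requests_.emplace_back(seq, request);
    return seq;
  }

  // Server side (role of rmw_take_request): fetch the oldest pending query.
  bool take_request(int64_t & seq, std::string & request)
  {
    return pop_front(requests_, seq, request);
  }

  // Server side (role of rmw_send_response): reply to a query taken earlier.
  void send_response(int64_t seq, const std::string & response)
  {
    assert(seq > 0 && seq < next_seq_);  // must refer to an issued request
    responses_.emplace_back(seq, response);
  }

  // Client side (role of rmw_take_response): fetch the oldest reply.
  bool take_response(int64_t & seq, std::string & response)
  {
    return pop_front(responses_, seq, response);
  }

private:
  using Queue = std::deque<std::pair<int64_t, std::string>>;

  static bool pop_front(Queue & queue, int64_t & seq, std::string & payload)
  {
    if (queue.empty()) {
      return false;
    }
    seq = queue.front().first;
    payload = std::move(queue.front().second);
    queue.pop_front();
    return true;
  }

  int64_t next_seq_ = 1;
  Queue requests_;
  Queue responses_;
};
```

The sequence number is what lets a client with several outstanding requests tell which response belongs to which request.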

### Related RMW APIs

* rmw_create_service
* rmw_destroy_service
* rmw_take_request
* rmw_send_response
* rmw_take
* rmw_take_with_info
* rmw_take_sequence
* rmw_take_serialized_message
* rmw_take_serialized_message_with_info
* rmw_wait

### Related Zenoh-c APIs

* zc_liveliness_declare_token
* z_attachment_get
* z_declare_queryable
* z_undeclare_queryable
* z_query_value
* z_query_attachment

## Quality of Service (QoS)

The ROS 2 RMW layer defines quite a few Quality of Service settings that are largely derived from DDS.
Here is an incomplete list of some of the settings and the values that they can take:

* RELIABILITY
  * RELIABLE - Data delivery is retried until it is successfully delivered. Because Zenoh is TCP-based (by default), this is the default in `rmw_zenoh_cpp`.
  * BEST_EFFORT - Data may be dropped during delivery. Because Zenoh is TCP-based (by default), this may not work exactly the same as in DDS.
* HISTORY
  * KEEP_LAST - For subscriptions, only keep up to a maximum number of samples (defined by depth); once the maximum is reached, older samples will be lost.
  * KEEP_ALL - For subscriptions, keep all values.
* DEPTH - The maximum number of samples to keep; only comes into play when KEEP_LAST history is used.
* DURABILITY
  * VOLATILE - Samples will only be delivered to subscriptions that are active at the time of publishing. In `rmw_zenoh_cpp`, this is implemented via `z_declare_subscriber` on the subscription side and `z_declare_publisher` on the publisher side.
  * TRANSIENT_LOCAL - "Late-joining" subscriptions will receive historical data, along with any new data. In `rmw_zenoh_cpp`, this is implemented via `ze_declare_querying_subscriber` on the subscription side and `ze_declare_publication_cache` on the publisher side.
* LIVELINESS
  * AUTOMATIC - The "liveliness" of an entity of the system is managed by the RMW layer. This is what `rmw_zenoh_cpp` uses.
  * MANUAL_BY_TOPIC - It is up to the application to periodically publish to a particular topic to assert liveliness.
* DEADLINE - The period at which messages are expected to be sent/received. Currently unimplemented in `rmw_zenoh_cpp`.
* LIFESPAN - The age at which messages are expired and no longer valid. Currently unimplemented in `rmw_zenoh_cpp`.
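
To make the interaction between HISTORY and DEPTH concrete, here is a minimal sketch of a subscription-side sample buffer.
The `History` class is hypothetical, and using a depth of 0 to mean KEEP_ALL is a convention chosen for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>

// Hypothetical sample buffer: depth > 0 models KEEP_LAST with that depth,
// depth == 0 models KEEP_ALL (an assumption made for this sketch).
template<typename Sample>
class History
{
public:
  explicit History(std::size_t depth)
  : depth_(depth) {}

  // Store a new sample; under KEEP_LAST the oldest sample is dropped
  // once the buffer is full.
  void push(Sample sample)
  {
    if (depth_ > 0 && samples_.size() == depth_) {
      samples_.pop_front();
    }
    samples_.push_back(std::move(sample));
    assert(depth_ == 0 || samples_.size() <= depth_);
  }

  // Remove and return the oldest stored sample, if any.
  bool pop(Sample & out)
  {
    if (samples_.empty()) {
      return false;
    }
    out = std::move(samples_.front());
    samples_.pop_front();
    return true;
  }

  std::size_t size() const {return samples_.size();}

private:
  std::size_t depth_;
  std::deque<Sample> samples_;
};
```

With a depth of 2, pushing samples 1, 2, 3 leaves 2 and 3 in the buffer; sample 1 is lost, which matches the KEEP_LAST description above.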

### Related RMW APIs

* rmw_publisher_get_actual_qos
* rmw_subscription_get_actual_qos
* rmw_client_request_publisher_get_actual_qos
* rmw_client_response_subscription_get_actual_qos
* rmw_service_request_subscription_get_actual_qos
* rmw_service_response_publisher_get_actual_qos

### Related Zenoh-c APIs

N/A

## Events

In ROS 2 terminology, "events" are out-of-band notifications about things that happen during communication.
For instance, if a message is lost, then the RMW layer may raise an event to the upper layers to signal that fact.

Events are broken down into subscription events and publisher events:

* Subscription
  * LIVELINESS_CHANGED
  * DEADLINE_MISSED
  * QOS_INCOMPATIBLE
  * MESSAGE_LOST
  * INCOMPATIBLE_TYPE
  * MATCHED
* Publisher
  * LIVELINESS_LOST
  * DEADLINE_MISSED
  * QOS_INCOMPATIBLE
  * INCOMPATIBLE_TYPE
  * MATCHED

### Related RMW APIs

* rmw_wait
* rmw_take
* rmw_event_set_callback
* rmw_publisher_event_init
* rmw_subscription_event_init
* rmw_take_event

### Related Zenoh-c APIs

N/A

## Actions

As of 2024-02-09, there is no concept of an action at the RMW level in ROS 2.
Instead, actions are composed of several services and pub/sub.
Thus, there is no direct implementation of actions in `rmw_zenoh_cpp`.

## Security

TBD

### Notes

- Zenoh routers (a.k.a. `zenohd`) are required mainly to allow the peers to discover each other within the subsystem.