RMW API: Differences from upstream rmw.h
This page is for developers who already know upstream ROS 2’s
rmw/rmw.h and want to write — or just
understand — a backend for nano-ros. Side-by-side: what rmw.h looks
like, what nros-rmw-cffi looks like, and why nano-ros diverges.
The C signatures shown for nano-ros come from
<nros/rmw_vtable.h>. The Rust trait
counterparts are an alternative entry point for porters who already
work in Rust; this page sticks to the C-vtable surface throughout.
TL;DR
| Concern | upstream rmw.h | nros-rmw-cffi |
|---|---|---|
| Plugin loading | dlopen("librmw_*.so") at runtime | Single vtable registered at init |
| Init sequence | rmw_init_options_t → rmw_context_t → entities | One open() call, returns the session |
| Entity types | rmw_publisher_t / rmw_subscription_t / rmw_service_t / rmw_client_t | Typed-with-opaque-tail: nros_rmw_publisher_t / _subscriber_t / _service_server_t / _service_client_t (visible metadata + opaque backend_data) |
| Wait | rmw_wait(waitset, timeout) blocks the caller | drive_io(session, timeout_ms) drives I/O once |
| Serialization | typesupport-driven (rosidl) | Pre-serialized CDR bytes only |
| Graph queries | rmw_get_topic_names_and_types, … | None |
| QoS profiles | Full DDS profile match between endpoints | Same field set; per-backend support advertised; synchronous IncompatibleQos on create instead of runtime mismatch event |
| DDS events | rmw_event_t (rmw_take_event) | None |
| Loaned messages | Optional rmw_borrow_loaned_message | First-class loan_publish / loan_recv |
| Error returns | rmw_ret_t (RMW_RET_OK, …) | nros_rmw_ret_t (NROS_RMW_RET_OK, …) — same named-constant style |
1. Plugin loading vs. compile-time backend
Upstream. rmw_implementation resolves the backend at process
start by opening a shared library:
RCUTILS_DLLIMPORT
const char * rmw_get_implementation_identifier(void);
The runtime calls dlopen against the path encoded by
RMW_IMPLEMENTATION and binds every rmw_* symbol through the loader.
nano-ros. The backend is linked into the binary. A C backend registers its vtable once before any nros call:
#include <nros/rmw_vtable.h>
static const nros_rmw_vtable_t MY_VTABLE = { ... };
int main(void) {
nros_rmw_cffi_register(&MY_VTABLE);
nros_init(...);
}
Why. Most embedded targets have no dynamic loader. Even where
dlopen exists, a 32 KB Flash budget can’t afford to link every
backend’s C client and pick at runtime. Compile-time selection cuts
the binary by 60–80 % and lets the linker drop unused entity paths.
2. Init sequence
Upstream. Three-step init:
rmw_init_options_t options = rmw_get_zero_initialized_init_options();
rmw_init_options_init(&options, allocator);
rmw_context_t context = rmw_get_zero_initialized_context();
rmw_init(&options, &context);
/* ... use context to create rmw_node_t, then rmw_publisher_t, … */
rmw_shutdown(&context);
nano-ros. One step — a session covers what upstream splits across options + context:
nros_rmw_session_t session = {0};
nros_rmw_ret_t ret = vtable->open(locator, mode, domain_id, node_name, &session);
/* ... use &session to create_publisher / create_subscriber / … */
vtable->close(&session);
Why. Upstream’s split is useful when an application owns multiple RMW instances with different options. Nano-ros assumes one session per process — one transport, one wire-protocol, one set of QoS defaults — so the options/context separation buys nothing.
3. Entity handles
Upstream. Every entity is a typed C struct with backend-private
state hidden behind void * data:
typedef struct RMW_PUBLIC_TYPE rmw_publisher_t {
const char * implementation_identifier;
void * data;
const char * topic_name;
rmw_publisher_options_t options;
bool can_loan_messages;
} rmw_publisher_t;
nano-ros. Hybrid: typed-with-opaque-tail.
Each entity is a typed C struct exposing the metadata the runtime
actually reads (topic name, type name, QoS, lending capabilities)
inline; backend-private state stays behind an opaque backend_data
pointer.
typedef struct nros_rmw_publisher_t {
const char * topic_name; /* borrowed; outlives the publisher */
const char * type_name; /* borrowed */
nros_rmw_qos_t qos;
bool can_loan_messages; /* matches upstream's field of the same name */
uint8_t _reserved[7]; /* forward-compat; must be zero */
void * backend_data; /* opaque */
} nros_rmw_publisher_t;
nros_rmw_ret_t (*create_publisher)(
nros_rmw_session_t * session,
const char * topic_name, const char * type_name, const char * type_hash,
uint32_t domain_id, const nros_rmw_qos_t * qos,
nros_rmw_publisher_t * out); /* runtime-allocated; backend fills */
Same shape for nros_rmw_subscriber_t. Service entities
(nros_rmw_service_server_t, nros_rmw_service_client_t) and
nros_rmw_session_t have no qos and no can_loan_messages —
service-level QoS doesn’t generalise across non-DDS backends
(see QoS, Section 7)
and service request/reply uses the byte-buffer API rather than the
loan primitive.
Forward-compat reserved bytes. Each entity carries an explicit
_reserved[N] byte block (sized to fill the natural alignment slot
before backend_data). New fields up to N bytes can be added later
without changing struct size or the offset of any field after
backend_data. Backends and runtime keep these bytes zero.
Storage ownership. The runtime allocates the entity-struct shell;
the backend writes its backend_data (plus can_loan_messages for
publisher / subscriber entities) into the runtime-supplied
out-parameter at create_* time. The backend never mallocs a
struct shell — embedded targets cannot afford a per-entity heap
allocation. destroy_* releases only the backend’s backend_data;
the shell stays valid until the runtime drops its owner.
Differences from upstream’s rmw_publisher_t.
- Borrowed strings, not backend-owned copies. Upstream’s
topic_namepoints to a backend-allocated string copied atcreate_publishertime. Ours points to caller (runtime) storage that outlives the publisher — no allocation per entity. - No
implementation_identifierfield. Backend selection is compile-time (see Section 1); there’s no plugin loader to dispatch through, so no need to identify which backend owns a struct. can_loan_messagesmatches upstream. Same bool, same name, same semantics —trueif the backend exposes the loan primitive (the CDR-byte zero-copy loan path). The runtime reads it once at create time and dispatches the publish path with no per-call branch.depth: uint16_t. Upstream uses 32-bit; embedded queue depths are 1–100, the 16-bit width saves 2 bytes × N entities.- Explicit
_reserved[N]bytes. Upstream uses an embeddedrmw_publisher_options_tstruct as the extension point; we reserve raw bytes inline. Same forward-compat property — new fields up to N bytes don’t break ABI — without the indirection.
Why this shape. Fully-opaque void * (the previous nano-ros
design) forced every introspection through a vtable callback.
Upstream’s “expose every field, backend keeps them in sync” forces
duplicated state. The typed-with-opaque-tail middle ground exposes
exactly the fields the runtime reads — no callback indirection — and
keeps backend implementation state private. The struct layout is
ABI; adding or reordering fields is a major-version bump.
4. drive_io vs. rmw_wait
Upstream. Clients block on a waitset that aggregates entities:
rmw_ret_t rmw_wait(
rmw_subscriptions_t * subscriptions,
rmw_guard_conditions_t * guard_conditions,
rmw_services_t * services,
rmw_clients_t * clients,
rmw_events_t * events,
rmw_wait_set_t * wait_set,
const rmw_time_t * wait_timeout);
The middleware can also spawn its own background threads that fire callbacks asynchronously.
nano-ros. The executor calls a single drive-I/O entry point:
nros_rmw_ret_t (*drive_io)(nros_rmw_session_t * session, int32_t timeout_ms);
The backend dispatches whatever receive / send / wakeup work is
pending and returns within timeout_ms. There is no waitset, no
guard-condition aggregation, no implicit middleware thread.
How the two models differ in practice
The two designs distribute work across different layers:
| Phase | Upstream rmw_wait | nano-ros drive_io |
|---|---|---|
| Build the wait set | Executor rebuilds a waitset every spin_once, adds every entity | Executor registers entities once at construction; no per-spin rebuild |
| Block | rmw_wait blocks the thread on a kernel waitable (DDS WaitSet, condvar, kqueue) until any entity is ready or timeout | drive_io blocks (sleep-model backends) or polls (poll-model backends) for up to timeout_ms |
| Signal readiness | Wait primitive raises per-entity ready flags; backend writes flags into the waitset’s status arrays | Backend’s RX worker pulls bytes; the data is “ready” by virtue of being received |
| Dispatch user callback | Executor’s spin_once picks one ready entity, calls rmw_take, fires its user callback | Backend’s RX worker / drive_io loop fires user callbacks while it has work to do |
Per spin_once | Exactly one user callback runs (single-threaded executor) | All ready callbacks run before drive_io returns |
The upstream model separates wait (kernel-blockable) from dispatch (executor-controlled). nano-ros today fuses them — the backend handles both inside one call.
What this buys, what this costs
Fusing wins on:
- No per-spin waitset rebuild. Upstream’s
add_handles_to_wait_setallocates and walks every entity each iteration. On a 100-entity executor at 1 kHz that’s 100 000 per-second add/remove operations plus heap churn. nano-ros registers entities once; the backend tracks them statically. - No kernel-waitable per entity. Upstream’s per-entity ready flags need a per-entity wakeable resource (DDS Condition, eventfd, pipe). nano-ros’s backend uses one wait primitive per session; per-entity tracking is in user-space backend state.
- Backend RX worker fires callbacks directly. zenoh-pico’s
_z_session_read_taskinvokes user callbacks during its read. nano-ros’sdrive_iodrains it; upstream’srmw_zenohwould still have to round-trip throughrmw_take.
Fusing costs on scheduling control. Upstream’s “one callback per
spin_once” rule gives the executor an opportunity between every
two callbacks to:
- Re-check timer expirations
- Re-check guard conditions
- Yield to higher-priority work (multi-threaded executor)
- Apply per-callback priority ordering
nano-ros’s drive_io runs all ready callbacks back-to-back, then
the spin loop processes timers + GCs between drive_io calls.
For a 100 ms drive_io call that fires 10 sub callbacks in 80 ms,
a timer that should have fired 5 ms in is delayed 75 ms.
Where this fits each RTOS execution model
| Execution model | Fits drive_io today? |
|---|---|
| Cooperative single-task (one task does ROS, no priority competition) | Yes — no other task to preempt; entity scheduling fairness is moot |
| Async / tokio / Embassy (futures, wakers) | Yes — spin_async drives futures; drive_io not used in the hot path |
| Preemptive priority RTOS, ROS at one priority (FreeRTOS / ThreadX / Zephyr typical) | Partial — kernel preemption from higher-priority tasks works; ROS-internal entity scheduling is batch-FIFO, timer expiries can be delayed by long sub callbacks at the same priority |
| WCET-bounded real-time (RTIC, DO-178C) | No — drive_io has unbounded execution time; callers needing per-callback WCET use the async path with explicit Waker integration instead |
| Time-triggered cyclic | No — no way to bound drive_io to a fixed wall-clock budget |
The “Yes” rows are where nano-ros ships today. The “Partial” / “No” rows are addressed by the work below.
How RTOS cooperation will improve
Three forward-looking knobs land incrementally as the
drive_io interface is extended. None breaks the default behaviour;
each is opt-in for apps that need it.
-
Backend-internal-deadline visibility.
Session::next_deadline_ms()tells the executor when the backend’s next internal event (lease keepalive, heartbeat, ACK retransmit) is due. The executor capsdrive_io’s timeout against it so the call doesn’t return sooner than expected on otherwise-quiet links. Saves one round-trip per quiet period. -
Per-call user-callback cap.
drive_ioacceptsmax_callbacks: an upper bound on user callbacks fired per call. Setting it to1reproduces upstream’s “one callback perspin_once” pattern. The runtime spin loop callsdrive_ioagain to drain pending work, with timer / GC checks between iterations. Closes the priority-inversion footgun for preemptive priority RTOS. -
Wall-clock budget per call.
drive_ioacceptstime_budget_ms: a wall-clock cap that bounds total time spent firing callbacks. Time-triggered cyclic apps configure a fixed slot per cycle;drive_ioyields when the slot expires even ifmax_callbacksisn’t reached.
Once the cap (knob 2) ships, an additional refinement moves timer
and guard-condition dispatch into the backend’s drive_io loop so
the cap applies uniformly across all callback sources, not just
backend-RX-driven ones. This unifies the dispatch path and makes
max_callbacks = 1 mean “exactly one callback per spin_once,
regardless of whether it’s a sub, service, timer, or guard
condition.”
For the per-RTOS-model recommendations (which knobs to set, which defaults to use), see the RTOS Cooperation concepts page.
Why drive_io and not rmw_wait. Cooperative single-task runtimes
(bare-metal, single-threaded RTOS) have one execution context. A
waitset abstraction with kernel-blockable per-entity resources
doesn’t fit — there is no kernel to provide them. drive_io makes
the cooperative model explicit and lets backends pick the most
efficient wait primitive available on their target (kernel block on
multi-threaded; cooperative yield + WFI on bare-metal). The
scheduling-control trade-off the upstream waitset gives up is
recovered through the optional knobs above for apps that need it.
5. Serialization: CDR bytes, not typesupport
Upstream. Publish/take APIs accept typed messages plus typesupport pointers, and each backend implements serialization-from-typesupport itself:
rmw_ret_t rmw_publish(
const rmw_publisher_t * publisher,
const void * ros_message,
rmw_publisher_allocation_t * allocation);
The backend dereferences ros_message according to a
rosidl_message_type_support_t table to compute the wire bytes.
nano-ros. The runtime serializes upstream of the RMW. Backends receive already-CDR-encoded bytes:
nros_rmw_ret_t (*publish_raw)(nros_rmw_publisher_t * publisher,
const uint8_t * data, size_t len);
int32_t (*try_recv_raw)(nros_rmw_subscriber_t * subscriber,
uint8_t * buf, size_t len);
Why. rosidl typesupport is heavy: dynamic dispatch through a function-pointer table per field type, dependence on dynamic allocators for nested sequences, megabytes of generated symbol tables linked even for one message. Splitting the layers means:
- Codegen (
nros_generate_interfaces()from CMake) writes a fixed-shape C struct per message and a single<MsgType>_serialize_cdr(...)function. - The runtime calls it before handing bytes to the RMW.
- The RMW backend stays transport-only — no rosidl, no typesupport, no allocator coupling.
6. No graph cache
Upstream. Graph introspection sits in the RMW:
rmw_ret_t rmw_get_topic_names_and_types(...);
rmw_ret_t rmw_count_publishers(...);
rmw_ret_t rmw_get_node_names(...);
Implementations maintain a discovery cache that costs heap and CPU continuously even when nothing reads it.
nano-ros. None of the above. Backends do whatever discovery their
transport mandates (zenoh liveliness, XRCE-DDS session establishment),
but no get_topic_names / count_publishers / node_names exists in
the vtable.
Why. Graph introspection is fundamentally a host-side debugging
need. An MCU has no terminal, no ros2 topic list. Wire-protocol
interop with rmw_zenoh_cpp means standard ROS 2 tools running on a
laptop can introspect the same domain as the MCU — at zero cost on
the MCU.
7. QoS: full DDS-shaped profile, per-backend support advertised
Upstream. Full DDS QoS profile family with profile matching between endpoints:
typedef struct RMW_PUBLIC_TYPE rmw_qos_profile_t {
rmw_qos_history_policy_t history;
size_t depth;
rmw_qos_reliability_policy_t reliability;
rmw_qos_durability_policy_t durability;
rmw_time_t deadline;
rmw_time_t lifespan;
rmw_qos_liveliness_policy_t liveliness;
rmw_time_t liveliness_lease_duration;
bool avoid_ros_namespace_conventions;
} rmw_qos_profile_t;
The backend negotiates compatibility
(rmw_qos_profile_check_compatible) and surfaces mismatches as
runtime events.
nano-ros. Same field set, packed into 24 bytes:
typedef struct nros_rmw_qos_t {
uint8_t reliability;
uint8_t durability;
uint8_t history;
uint8_t liveliness_kind;
uint16_t depth;
uint16_t _reserved0;
uint32_t deadline_ms; /* 0 = infinite */
uint32_t lifespan_ms; /* 0 = infinite */
uint32_t liveliness_lease_ms; /* 0 = infinite */
bool avoid_ros_namespace_conventions;
uint8_t _reserved1[3];
} nros_rmw_qos_t;
Standard profile constants
(NROS_RMW_QOS_PROFILE_DEFAULT, _SENSOR_DATA,
_SERVICES_DEFAULT, _SYSTEM_DEFAULT, _PARAMETERS) match
upstream rmw_qos_profile_* field-for-field, so applications
porting from rclcpp / rclrs can pull the equivalent profile
constant unchanged.
Per-backend support, no silent downgrade
Each backend advertises which policies it can honour via
Session::supported_qos_policies(), returning a QosPolicyMask
bitfield. Policies a backend can’t enforce are explicit.
#![allow(unused)]
fn main() {
pub trait Session {
fn supported_qos_policies(&self) -> QosPolicyMask {
QosPolicyMask::CORE // reliability + durability VOLATILE + history + depth
}
}
}
The runtime validates the requested QoS against the backend’s mask
at entity-create time. Requesting a policy the backend doesn’t
support returns TransportError::IncompatibleQos
(NROS_RMW_RET_INCOMPATIBLE_QOS at the C boundary) synchronously.
There is no silent degradation — applications either get the
requested QoS or a hard error.
Apps that need cross-backend portability check the mask at startup:
#![allow(unused)]
fn main() {
if session.supported_qos_policies()
.contains(QosPolicyMask::DEADLINE)
{
pub.create_with_qos(...deadline_ms = 100, ...);
} else {
// app-side fallback: timeout monitoring in user code
}
}
Manual liveliness assertion
For LIVELINESS_MANUAL_BY_TOPIC and LIVELINESS_MANUAL_BY_NODE,
publishers call assert_liveliness() explicitly:
#![allow(unused)]
fn main() {
pub.assert_liveliness()?; // refresh this publisher's lease
}
C side: nros_publisher_assert_liveliness(&pub). C++ side:
pub.assert_liveliness(). No-op for AUTOMATIC and NONE kinds.
Differences from upstream’s matching
Upstream surfaces QoS mismatches via runtime events
(RMW_EVENT_REQUESTED_INCOMPATIBLE_QOS). nano-ros surfaces them
synchronously at create time as IncompatibleQos. The mismatch
is a configuration error visible at startup; the runtime path
doesn’t need to handle it.
Two related choices:
- No profile matching between publisher and subscriber. Each endpoint requests the QoS it wants from its backend; the backend enforces locally. Cross-endpoint compatibility is the wire protocol’s concern — DDS endpoints negotiate via DDS Discovery, zenoh endpoints communicate intent via the topic-key encoding, uORB endpoints share an in-process queue. nano-ros’s executor doesn’t run a profile-matching pass.
- Wire metadata per backend. Lifespan needs per-sample timestamps; liveliness needs a keepalive mechanism. Each backend uses its native attachment / sample-info mechanism (Zenoh attachments, DDS RTPS sample-info, XRCE session pings) — no cross-backend metadata header.
Per-backend QoS coverage
The mask actually advertised by each backend’s
Session::supported_qos_policies():
| Backend | Reliability + Durability + History/Depth | Deadline | Lifespan | Liveliness Automatic | Liveliness Manual | Liveliness Lease | avoid_ros_namespace_conventions |
|---|---|---|---|---|---|---|---|
| Cyclone DDS | ✅ Native | ✅ Native | ✅ Native | ✅ Native | ✅ Native | ✅ Native | ✅ honoured |
| XRCE-DDS | ✅ Native (binary uxrQoS_t) | ✅ Shim-side clock check (sub: RequestedDeadlineMissed; pub: OfferedDeadlineMissed) + agent-side via FastDDS XML profile | ✅ Agent-side via FastDDS XML profile | ✅ Native | ✅ Configured via XML | ✅ Configured via XML | ✅ honoured |
| zenoh-pico | ✅ Shim-emulated | ✅ Clock-based check (sub + pub) | ✅ Subscriber-side filter using attachment timestamp | ✅ Trivial via session keepalive | ✅ Shim-side keepalive timer at pub publish path | ✅ Honoured | n/a (no /rt/ prefix) |
| uORB | ✅ CORE only (intra-process, no wire) | ❌ No rate concept | ❌ No expiry concept | ❌ No wire-level liveliness | ❌ | ❌ | n/a |
Key takeaways:
- The default QoS profile (RELIABLE + VOLATILE + KEEP_LAST(10) + AUTOMATIC) works on every backend.
- Apps that need extended QoS but want to stay backend-portable: check
supported_qos_policies()at startup and degrade gracefully. - For full DDS QoS: Cyclone DDS is the native DDS path; XRCE-DDS auto-routes through FastDDS XML profile when extended policies are set. Zenoh-pico fills the gap with shim-side emulation. uORB is intra-process only.
See Status events for how the deadline / liveliness / message-lost policies translate into runtime events, and User guide → Configuration for code examples.
8. Status events: callback-on-entity, Tier-1 subset
Upstream. Event APIs surface DDS-shaped notifications via a waitset-take pattern:
rmw_ret_t rmw_subscription_event_init(
rmw_event_t * event, const rmw_subscription_t * sub,
rmw_event_type_t type);
/* Add event_handle to a waitset alongside subscriptions. */
rmw_ret_t rmw_wait(...);
/* Poll fired events. */
rmw_ret_t rmw_take_event(
const rmw_event_t * event_handle,
void * event_info, bool * taken);
Eleven event types covering liveliness, deadline, QoS-incompatibility, match, message-lost, type-incompatibility — all dispatched through the waitset.
nano-ros. Callback-on-entity for a Tier-1 subset (liveliness
changes, deadline misses, message lost). Skips the waitset. Skips
Tier-2 (MATCHED) and Tier-3 (QOS_INCOMPATIBLE,
INCOMPATIBLE_TYPE) — see “What’s skipped” below.
#![allow(unused)]
fn main() {
sub.on_liveliness_changed(|status| {
if status.alive_count == 0 { trigger_failover(); }
})?;
sub.on_requested_deadline_missed(
Duration::from_millis(15),
|status| metric_inc(&LATE_SAMPLE_COUNT, status.total_count_change),
)?;
sub.on_message_lost(|status| log::warn!("dropped {}", status.total_count_change))?;
}
C side mirrors with nros_subscription_set_*_callback functions.
What lands
| Event | Producer | Subscriber callback / Publisher callback |
|---|---|---|
| Liveliness changed | sub | on_liveliness_changed(LivelinessChangedStatus) |
| Liveliness lost | pub | on_liveliness_lost(DeadlineMissedStatus) |
| Requested deadline missed | sub | on_requested_deadline_missed(deadline, DeadlineMissedStatus) |
| Offered deadline missed | pub | on_offered_deadline_missed(deadline, DeadlineMissedStatus) |
| Message lost | sub | on_message_lost(MessageLostStatus) |
What’s skipped
MATCHED— embedded apps usually have static topology; rarely load-bearing. Add the kind if a discovery-tracking app shows up; additive.QOS_INCOMPATIBLE/INCOMPATIBLE_TYPE— these surface at create time, not as runtime events. The existingnros_rmw_ret_tcodes (NROS_RMW_RET_INCOMPATIBLE_QOS) carry the diagnostic synchronously fromcreate_publisher/create_subscriber. No event needed.
Dispatch — callback-on-entity, not waitset-take
Events fire from inside the existing drive_io callback-dispatch
path. The backend’s RX worker detects an event in the same place it
detects messages; runs the registered callback; loops. No separate
waitset, no per-call take.
This reuses the message-callback dispatch model; events count
against the max_callbacks cap from Section 4 the same way message
callbacks do.
Why callback-on-entity instead of waitset-take. The waitset-take pattern requires a waitset abstraction nano-ros deliberately doesn’t have (see Section 4). Replacing it with per-entity callbacks reuses existing machinery, matches the message-callback ergonomics users already know, and keeps the bounded-storage property — each registered event-callback is a fixed-size struct embedded in the entity, no per-call allocation.
The trade-off: users can’t bulk-poll all events at once. For the Tier-1 events this isn’t load-bearing — events are rare, callbacks are cheap.
Backend coverage
Coverage is uneven and surfaces through Subscriber::supports_event
(Rust) / register_*_event returning NROS_RMW_RET_UNSUPPORTED
(C). Apps must handle “not supported” — not every backend will
generate every event:
| Backend | Liveliness | Deadline | Message lost |
|---|---|---|---|
| Cyclone DDS | 🟡 Not wired through nano-ros events yet | 🟡 Not wired yet | 🟡 Not wired yet |
| XRCE-DDS | ❌ XRCE protocol carries no session→client liveliness callback | 🟡 Sub: shim-side clock check on try_recv_raw; pub: shim-side check on publish_raw. LivelinessChanged / LivelinessLost not feasible. | ❌ topic_callback carries no per-sample sequence |
| zenoh-pico | ✅ Sub: per-publisher count via zpico_liveliness_get_count + wildcard keyexpr query; pub: shim-side keepalive timer fires LivelinessLost from publish_raw when MANUAL_BY_* lease expires. | ✅ Clock-based check at sub + pub, rate-limited to ≤ 1 fire per deadline period | ✅ Sequence-gap detection from RMW attachment |
| uORB | ❌ No wire-level liveliness | ❌ No rate concept | ✅ Native: RustSubscriptionCallback publish-counter delta on host mock + real PX4 |
assert_liveliness() (manual): not all backends expose manual
liveliness through nano-ros yet. Unsupported backends’ default is
Ok(()) (no-op) since they don’t honour MANUAL_BY_TOPIC /
MANUAL_BY_NODE liveliness kinds.
9. Loaned messages first-class
Upstream. Optional, often unimplemented:
rmw_ret_t rmw_borrow_loaned_message(
const rmw_publisher_t * publisher,
const rosidl_message_type_support_t * type_support,
void ** ros_message);
rmw_ret_t rmw_return_loaned_message_from_publisher(
const rmw_publisher_t * publisher,
void * loaned_message);
A backend that doesn’t implement these returns
RMW_RET_UNSUPPORTED and the client falls back to a copying path.
nano-ros. Lending is a separate vtable surface. When the backend
supports zero-copy publish, it implements loan_publish_* /
commit_publish_*; when it supports zero-copy receive, it implements
loan_recv_* / release_recv_*. The runtime checks the function
pointers for non-NULL once at session open and takes the lending
path for the lifetime of the session.
Why. Zero-copy isn’t optional on embedded — every avoidable copy is a copy of a CDR-encoded sensor frame from a 64 KB heap. Promoting the lending surface to first-class makes it visible at compile time (missing pointer = compile error against the lending build) and lets the runtime pre-arrange arena slots once instead of probing on every publish.
10. Error returns
Upstream. Single error type for everything:
typedef int32_t rmw_ret_t;
#define RMW_RET_OK 0
#define RMW_RET_ERROR 1
#define RMW_RET_TIMEOUT 2
#define RMW_RET_UNSUPPORTED 3
/* ... */
Pointer-returning calls indicate failure with NULL and set a
thread-local error string via rmw_set_error_string.
nano-ros. Same named-constant style (<nros/rmw_ret.h>),
different sign convention, no thread-local error string:
typedef int32_t nros_rmw_ret_t;
#define NROS_RMW_RET_OK 0
#define NROS_RMW_RET_ERROR -1
#define NROS_RMW_RET_TIMEOUT -2
#define NROS_RMW_RET_BAD_ALLOC -3
#define NROS_RMW_RET_INVALID_ARGUMENT -4
#define NROS_RMW_RET_UNSUPPORTED -5
#define NROS_RMW_RET_INCOMPATIBLE_QOS -6
#define NROS_RMW_RET_TOPIC_NAME_INVALID -7
#define NROS_RMW_RET_NODE_NAME_NON_EXISTENT -8
#define NROS_RMW_RET_LOAN_NOT_SUPPORTED -9
#define NROS_RMW_RET_NO_DATA -10
#define NROS_RMW_RET_WOULD_BLOCK -11
#define NROS_RMW_RET_BUFFER_TOO_SMALL -12
#define NROS_RMW_RET_MESSAGE_TOO_LARGE -13
Two return-shape conventions, picked by call shape:
| Returns | Success | Failure |
|---|---|---|
nros_rmw_ret_t + entity-struct out-param (open, create_publisher, create_subscriber, …) | NROS_RMW_RET_OK, out->backend_data non-NULL | negative named constant |
nros_rmw_ret_t (close, drive_io, publish_raw, send_reply, …) | NROS_RMW_RET_OK | negative named constant |
int32_t byte count (try_recv_raw, try_recv_request, call_raw) | >= 0 (bytes received) | negative nros_rmw_ret_t |
Differences from upstream.
- Negative for error. Upstream uses positive integer codes
(
RMW_RET_ERROR = 1); nano-ros uses negative so the byte-count convention can be unified into the sameint32_treturn. - No thread-local error string. No
rmw_set_error_string, normw_get_error_string. The thread-local heap allocation that pattern needs is unaffordable on embedded targets. Backends log verbose diagnostics at the failure site through the platform’sprintkequivalent — never buffered, never thread-local. - Smaller code-set. 13 codes total (vs upstream’s ~25). Phase
set started from upstream’s and dropped codes that don’t apply
(e.g., DDS event codes,
RMW_RET_NODE_INVALID). Adding a code is a<nros/rmw_ret.h>header change only.
Why same style. Named constants make switch statements
possible at call sites; bare negative ints don’t.
See also
- RMW API Design — deeper architectural rationale (heap, threading, dispatch model) shared across all the points above.
- Custom RMW Backend — step-by-step guide to writing a backend in C against this vtable.
<nros/rmw_vtable.h>Doxygen — full C reference for every function pointer above.packages/zpico/nros-rmw-zenoh— canonical reference port. Read the source for a worked example.