Production Readiness Checklist
A copy-out checklist for teams piloting nano-ros toward production deployment. Each box is a concrete validation step, not a marketing claim. The book documents the framework’s intent + plumbing; this page is what your team needs to confirm on your target before shipping.
Why a separate checklist? nano-ros is production-capable, but some acceptance items are hardware-gated (P99 latency on real Cortex-M3, multicast on real silicon, NuttX SCHED_SPORADIC under kernel config) and can’t be validated in CI. The checklist gives you the steps to close those gaps for your deployment.
1. Real-time metrics (hardware-validated)
The book’s quoted poll-WCET / P99-latency numbers come from QEMU. DWT cycle counters are best-effort under emulation. For production claims, re-measure on your actual silicon.
- End-to-end P99 latency (publisher → executor callback) on your target MCU at its production clock + load. Target: ≤ design budget. Tooling: `wake-latency-cortex-m3` bench at `packages/testing/nros-bench/wake-latency-cortex-m3/`.
- Worst-case stack depth per task. Tool: `cargo-call-stack`, `cargo-stack-sizes`, or the ARM stack-analyzer for C/C++.
- Heap fragmentation pattern over 24 h at nominal load. Spot-check with `mallinfo` or your RTOS’s heap-stats API.
- Wake latency (transport-rx interrupt → first user callback dispatched). Required if you use the async / poll-blocked spin path.
- Spin-loop budget overrun rate under sustained pub load. `Executor::spin_once(timeout)` returns the overrun count.
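To compare a measured P99 against the design budget, the captured latency samples have to be reduced to a percentile. The following is a minimal host-side sketch, assuming the samples were exported from the bench as unsigned integers in a single unit (for example microseconds). The function names are hypothetical and not part of nano-ros; the nearest-rank method is one common percentile definition, chosen here for simplicity.

```rust
/// Nearest-rank percentile over raw latency samples (all in the same unit).
/// Returns `None` for an empty slice or a percentile outside 1..=100.
pub fn percentile_nearest_rank(samples: &[u32], pct: u32) -> Option<u32> {
    if samples.is_empty() || pct == 0 || pct > 100 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // Nearest-rank: 1-based rank = ceil(pct / 100 * n).
    let n = sorted.len();
    let rank = (pct as usize * n + 99) / 100;
    Some(sorted[rank - 1])
}

/// True when the measured P99 is at or below the design budget.
/// An empty sample set never passes.
pub fn p99_within_budget(samples: &[u32], budget: u32) -> bool {
    matches!(percentile_nearest_rank(samples, 99), Some(p99) if p99 <= budget)
}
```

A P99 is only meaningful with enough samples: with fewer than 100 measurements the nearest-rank P99 is simply the maximum.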
2. Platform-specific validation
Per-RTOS items that the book documents as “tested in CI” cover reference boards only; your actual board / kernel-config combination may differ.
- Multicast / IGMP if using DDS (RTPS). Confirm SPDP discovery actually fires on your RTOS + driver. Untested on FreeRTOS + ThreadX as of writing.
- Clock wraparound + extension correctness on long-running deployments. The platform’s `nros_platform_time_now_ms` must handle u32 wrap (49.7 days) and u64 extend.
- Allocator behavior under memory pressure. Boot-time alloc OK on most RTOSes; mid-run alloc only on `std` POSIX. Confirm your hot paths don’t allocate.
- Network packet loss recovery. Drop 5% of packets in your lab and confirm the talker / listener recovers.
- Critical-section regions are short. The platform’s `nros_platform_critical_section_*` ABI is the IRQ-disable surface; long critical sections starve other ISRs.
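The clock item above can be exercised on a host before it is trusted on hardware. A 32-bit millisecond counter wraps after 2^32 ms, which is about 49.7 days. The sketch below shows one common way to extend such a counter to 64 bits and to compute wrap-safe intervals. `ClockExtender` and `elapsed_ms` are hypothetical names for illustration; the actual signature and behavior of `nros_platform_time_now_ms` are not specified here.

```rust
/// Extends a wrapping 32-bit millisecond counter to 64 bits.
/// Assumption: `extend` is called at least once per wrap period (~49.7 days)
/// and from a single context; it is not interrupt- or thread-safe as written.
pub struct ClockExtender {
    last_raw: u32,
    wraps: u64, // number of wraparounds observed so far
}

impl ClockExtender {
    pub const fn new() -> Self {
        Self { last_raw: 0, wraps: 0 }
    }

    /// Feed the latest raw 32-bit reading, get a monotonic 64-bit value back.
    pub fn extend(&mut self, raw_ms: u32) -> u64 {
        if raw_ms < self.last_raw {
            // The raw counter went backwards, so it wrapped once.
            self.wraps += 1;
        }
        self.last_raw = raw_ms;
        (self.wraps << 32) | raw_ms as u64
    }
}

/// Wrap-safe interval between two raw 32-bit readings, valid as long as
/// the true interval is shorter than one wrap period.
pub fn elapsed_ms(earlier: u32, later: u32) -> u32 {
    later.wrapping_sub(earlier)
}
```

Feeding the extender synthetic readings on either side of `u32::MAX` is a cheap way to test wrap handling without waiting 49.7 days on real hardware.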
3. RMW backend certification
- Backend version pinned to a tested tuple. Zenoh-pico 1.7.2 (matches `rmw_zenoh_cpp`). Cyclone DDS 0.10.5 (matches `ros-humble-cyclonedds`). XRCE-DDS Micro-Client at its workspace pin.
- All required QoS policies supported by your backend. The Choosing an RMW Backend capability matrix lists per-backend coverage (Zenoh: 4/7; XRCE: 4/7; Cyclone DDS: 7/7).
- Discovery stability over your network topology. Zenoh-pico in client mode needs zenohd reachable; loss of router = lost routing but local node lives. XRCE needs Agent uptime. Cyclone DDS discovers via multicast SPDP.
- Bridge stability if multi-backend. Confirm no memory bloat over 72 h with two registered RMWs running.
- Cyclone DDS limitations checked. Status events and some stock-ROS interop slices are still in progress; embedded RTOS ports remain gated on a hosted Cyclone runtime.
4. Safety + formal verification
- `just verify-kani` clean against your build. 160 bounded harnesses; non-trivial coverage of CDR + scheduling + RMW glue.
- `just verify-verus` clean. 102 deductive proofs.
- CRC32 attached if using `safety-e2e` feature. The 37-byte attachment is transparent to stock ROS 2 (ignored gracefully) and detected by other nano-ros nodes.
- Timeout bounds on every blocking call. `spin_once` timeout, `Promise::wait_for(timeout)`, `recv_timeout`. No `WAIT_FOREVER`.
- Parameter store capacity ≥ declared parameter count (`param-services` feature gate).
- Stack overflow detection enabled by your platform (FreeRTOS `configCHECK_FOR_STACK_OVERFLOW`, Zephyr stack sentinels, NuttX `CONFIG_DEBUG_STACK`).
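When checking the CRC32 item, it can help to recompute a checksum independently of the framework, for example in a capture-analysis tool. The sketch below implements the widely used CRC-32 variant (reflected polynomial 0xEDB88320, as in zlib and Ethernet). Which CRC-32 variant nano-ros uses, which bytes it covers, and how the 37-byte attachment is laid out are not stated on this page, so treat the variant and the helper names as assumptions to verify against the Safety Protocol chapter.

```rust
/// Bitwise CRC-32 with reflected polynomial 0xEDB88320, initial value
/// 0xFFFFFFFF and final inversion (the zlib / Ethernet variant).
/// Assumption: this may or may not be the variant used by `safety-e2e`.
pub fn crc32_ieee(data: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All-ones mask when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Compares a recomputed checksum against the one carried with the message.
pub fn payload_intact(payload: &[u8], expected_crc: u32) -> bool {
    crc32_ieee(payload) == expected_crc
}
```

The standard check value for this variant is `0xCBF43926` over the ASCII string `123456789`, which makes a convenient self-test before comparing against captured traffic.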
5. Interop testing
- Publish from nano-ros, subscribe with stock ROS 2. `RMW_IMPLEMENTATION=rmw_zenoh_cpp` for Zenoh, `rmw_cyclonedds_cpp` for Cyclone, `rmw_fastrtps_cpp` for DDS (interop tier).
- Message type compatibility for any custom `.msg` you’ve added. Round-trip a sample message through ROS 2’s `rosbag2` to confirm wire-level parity.
- QoS profile matching. Mismatched reliability / durability / history kill discovery silently on DDS / RTPS.
- Lifecycle callbacks fire on node startup / shutdown if you’ve opted into `lifecycle-services`.
- Cross-RTOS interop: if your fleet mixes RTOSes (e.g. Zephyr sensor + FreeRTOS actuator + POSIX coordinator), confirm all three sides see each other.
6. Failure recovery
- Agent / router restart: kill `zenohd` (Zenoh) or `MicroXRCEAgent` (XRCE) mid-run. Confirm reconnection. For DDS / Cyclone this is N/A (no central process).
- Network partition → reconnection. Block the talker’s egress with `iptables` for 30 s, then unblock. Verify the listener resumes within your design SLA.
- Heap exhaustion path: graceful degradation OR clean crash + restart? If hosted-RTOS + watchdog, restart is usually correct. If bare-metal, you probably have no restart story; confirm your design assumes this.
- Stack overflow detection triggers a panic / fault rather than silent corruption.
- Power loss mid-write (if persisting state). Not nano-ros’s concern, but mention it in your design review.
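The partition and restart checks above are easier to judge with a number than by eye. One way to do that, sketched here under assumptions, is to log a receive timestamp per message on the listener and compute how long delivery stayed down beyond the injected outage. The function names and the millisecond timestamp log are hypothetical; nano-ros does not provide them.

```rust
/// Largest gap between consecutive receive timestamps, in milliseconds.
/// Returns `None` when fewer than two messages were received.
pub fn max_gap_ms(rx_times_ms: &[u64]) -> Option<u64> {
    rx_times_ms
        .windows(2)
        .map(|pair| pair[1].saturating_sub(pair[0]))
        .max()
}

/// True when delivery resumed within `sla_ms` after an injected outage of
/// `outage_ms`. Assumes the outage produced the largest gap in the log.
pub fn recovered_within_sla(rx_times_ms: &[u64], outage_ms: u64, sla_ms: u64) -> bool {
    match max_gap_ms(rx_times_ms) {
        Some(gap) => gap.saturating_sub(outage_ms) <= sla_ms,
        None => false, // no evidence of recovery at all
    }
}
```

For a 30 s block, a log whose largest gap is 30.5 s means the listener needed roughly 0.5 s (plus up to one publish period) to resume.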
7. Operational concerns
- Bootloader + OTA strategy. Out of scope for nano-ros but mandatory for fleet deployments — name it explicitly in your project plan.
- Log / diagnostics exfiltration. `nros-log` provides the logging surface; pick a sink (UART, RTT, semihosting, or ROS 2 `/rosout` over the wire).
- Time synchronization (NTP, PTP, RTC). nano-ros doesn’t ship a time-sync layer; your fleet design must.
- Watchdog coverage: the executor’s `spin_period` reports overruns, but it doesn’t pet a hardware watchdog. Wire one manually.
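Wiring the watchdog manually usually means tying the pet to evidence that the spin loop is healthy, so that a stuck or persistently overrunning loop lets the watchdog expire. The sketch below shows one possible policy. The `Watchdog` trait, `SpinSupervisor`, and the boolean `overran` input are all hypothetical; how an overrun is actually reported by the executor should be taken from the nano-ros API documentation.

```rust
/// Hypothetical abstraction over a hardware watchdog; not a nano-ros API.
pub trait Watchdog {
    fn pet(&mut self);
}

/// Pets the watchdog only while consecutive overruns stay within a limit.
pub struct SpinSupervisor {
    consecutive_overruns: u32,
    max_consecutive_overruns: u32,
}

impl SpinSupervisor {
    pub const fn new(max_consecutive_overruns: u32) -> Self {
        Self { consecutive_overruns: 0, max_consecutive_overruns }
    }

    /// Call once per spin iteration. Returns whether the loop is considered
    /// healthy; the watchdog is petted only in that case.
    pub fn on_spin<W: Watchdog>(&mut self, overran: bool, wdt: &mut W) -> bool {
        if overran {
            self.consecutive_overruns = self.consecutive_overruns.saturating_add(1);
        } else {
            self.consecutive_overruns = 0;
        }
        let healthy = self.consecutive_overruns <= self.max_consecutive_overruns;
        if healthy {
            wdt.pet();
        }
        healthy
    }
}

/// Host-side stand-in that counts pets, for unit-testing the policy.
#[derive(Default)]
pub struct CountingWatchdog {
    pub pets: u32,
}

impl Watchdog for CountingWatchdog {
    fn pet(&mut self) {
        self.pets += 1;
    }
}
```

On a target, the trait would be implemented over the MCU’s watchdog peripheral, and the watchdog timeout should be longer than the spin period multiplied by the tolerated overrun count.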
8. License + governance
- License: MIT OR Apache 2.0 (dual). Both permissive, no GPL copyleft, OK for proprietary firmware. Confirm your legal team is comfortable.
- Third-party dependencies: zenoh-pico (Eclipse), Cyclone DDS (Eclipse), Micro-XRCE-DDS-Client (Apache-2). Vendored dependencies carry license files in each `third-party/*/LICENSE`.
- Patent grant: Apache 2.0 carries an explicit patent grant; MIT does not. Most adopters rely on the Apache half.
- Support model: nano-ros has no commercial support entity as of writing. Plan accordingly — either staff in-house expertise or contract a consultancy.
- Roadmap visibility: track the project’s roadmap directory in the upstream repo. Items are numbered and dated.
9. Maintainability pledge
The single-maintainer signal in §8 is a real adoption risk. Add explicit mitigations to your project plan rather than treating “open-source” as the answer:
- Maintenance horizon: record the date you adopted and the upstream commit cadence (e.g. `git log --since=… --oneline | wc -l` over the last 90 days). Re-check quarterly.
- Escalation path: identify the primary maintainer contact (from `Cargo.toml` `authors` + GitHub commit history). For CVEs, use a public GitHub security advisory.
- Governance model: nano-ros is BDFL-style (benevolent-dictator-for-life) with no formal RFC / vote process. Major design decisions are documented in the roadmap directory; align your expectations accordingly.
- Fork mitigation: the dual MIT / Apache-2.0 license lets your team fork and maintain independently if upstream goes dormant. Budget for that possibility without assuming it will happen.
- Distro tracking: nano-ros targets ROS 2 Humble today. Iron / Jazzy support depends on the type-hash work tracked in the upstream roadmap (no public ETA yet). If your product must ship on Iron / Jazzy at launch, plan to contribute the type-hash port or wait for upstream.
Scoring rubric
For each section above, count [x] boxes as your readiness score.
Suggested gates:
| Score per section | Status |
|---|---|
| 8/8 | Production-ready for that axis |
| 5–7/8 | Pilot deployment OK; close gaps before scale |
| 3–4/8 | Lab / prototype only |
| < 3/8 | Block on these items first |
Sum across all 9 sections (§1–9). Below ~50 / 70 you have foundational work to do; above ~62 / 70 you’re at production quality on every axis where nano-ros can be validated externally.
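If you track the checklist in a script or CI job, the gates can be encoded directly. The sketch below follows the table as written, with per-section scores counted out of 8; the type and function names are hypothetical.

```rust
/// Readiness status for one checklist section, per the gates table.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Status {
    ProductionReady, // 8/8
    Pilot,           // 5-7/8
    LabOnly,         // 3-4/8
    Blocked,         // < 3/8
}

/// Maps a section's count of checked boxes to its status.
pub fn section_status(checked: u32) -> Status {
    match checked {
        0..=2 => Status::Blocked,
        3..=4 => Status::LabOnly,
        5..=7 => Status::Pilot,
        _ => Status::ProductionReady,
    }
}

/// Sum of checked boxes across all sections.
pub fn total_score(sections: &[u32]) -> u32 {
    sections.iter().sum()
}
```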
See also
- Real-Time Analysis — RT scheduling background + response-time formulas.
- Formal Verification — Kani + Verus harnesses.
- Safety Protocol — E2E CRC + sequence tracking.
- Choosing an RMW Backend — backend capability matrix.
- Supported Boards — per-board status + caveats.