Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Production Readiness Checklist

A copy-out checklist for teams piloting nano-ros toward production deployment. Each box is a concrete validation step, not a marketing claim. The book documents the framework’s intent + plumbing; this page is what your team needs to confirm on your target before shipping.

Why a separate checklist? nano-ros is production-capable, but some acceptance items are hardware-gated (P99 latency on real Cortex-M3, multicast on real silicon, NuttX SCHED_SPORADIC under kernel config) and can’t be validated in CI. The checklist gives you the steps to close those gaps for your deployment.

1. Real-time metrics (hardware-validated)

The book’s quoted poll-WCET / P99-latency numbers come from QEMU. DWT cycle counters are best-effort under emulation. For production claims, re-measure on your actual silicon.

  • End-to-end P99 latency (publisher → executor callback) on your target MCU at its production clock + load. Target: ≤ design budget. Tooling: wake-latency-cortex-m3 bench at packages/testing/nros-bench/wake-latency-cortex-m3/.
  • Worst-case stack depth per task. Tool: cargo-call-stack, cargo-stack-sizes, or the ARM stack-analyzer for C/C++.
  • Heap fragmentation pattern over 24 h at nominal load. Spot-check with mallinfo or your RTOS’s heap-stats API.
  • Wake latency (transport-rx interrupt → first user callback dispatched). Required if you use the async / poll- blocked spin path.
  • Spin-loop budget overrun rate under sustained pub load. Executor::spin_once(timeout) returns the overrun count.

2. Platform-specific validation

Per-RTOS gaps that the book documents as “tested in CI” cover reference boards — your actual board / kernel-config combination may differ.

  • Multicast / IGMP if using DDS (RTPS). Confirm SPDP discovery actually fires on your RTOS + driver. Untested on FreeRTOS + ThreadX as of writing.
  • Clock wraparound + extension correctness on long-running deployments. The platform’s nros_platform_time_now_ms must handle u32 wrap (49.7 days) and u64 extend.
  • Allocator behavior under memory pressure. Boot-time alloc OK on most RTOSes; mid-run alloc only on std POSIX. Confirm your hot paths don’t allocate.
  • Network packet loss recovery. Drop 5% of packets in your lab and confirm the talker / listener recovers.
  • Critical-section regions are short. The platform’s nros_platform_critical_section_* ABI is the IRQ-disable surface; long critical sections starve other ISRs.

3. RMW backend certification

  • Backend version pinned to a tested tuple. Zenoh-pico 1.7.2 (matches rmw_zenoh_cpp). Cyclone DDS 0.10.5 (matches ros-humble-cyclonedds). XRCE-DDS Micro-Client at its workspace pin.
  • All required QoS policies supported by your backend. The Choosing an RMW Backend capability matrix lists per-backend coverage (Zenoh: 4/7; XRCE: 4/7; Cyclone DDS: 7/7).
  • Discovery stability over your network topology. Zenoh-pico in client mode needs zenohd reachable; loss of router = lost routing but local node lives. XRCE needs Agent uptime. Cyclone DDS discovers via multicast SPDP.
  • Bridge stability if multi-backend. Confirm no memory bloat over 72 h with two registered RMWs running.
  • Cyclone DDS limitations checked. Status events and some stock-ROS interop slices are still in progress; embedded RTOS ports remain gated on a hosted Cyclone runtime.

4. Safety + formal verification

  • just verify-kani clean against your build. 160 bounded harnesses; non-trivial coverage of CDR + scheduling + RMW glue.
  • just verify-verus clean. 102 deductive proofs.
  • CRC32 attached if using safety-e2e feature. The 37-byte attachment is transparent to stock ROS 2 (ignored gracefully) and detected by other nano-ros nodes.
  • Timeout bounds on every blocking call. spin_once timeout, Promise::wait_for(timeout), recv_timeout. No WAIT_FOREVER.
  • Parameter store capacity ≥ declared parameter count ([param-services] feature gate).
  • Stack overflow detection enabled by your platform (FreeRTOS configCHECK_FOR_STACK_OVERFLOW, Zephyr stack sentinels, NuttX CONFIG_DEBUG_STACK).

5. Interop testing

  • Publish from nano-ros, subscribe with stock ROS 2. RMW_IMPLEMENTATION=rmw_zenoh_cpp for Zenoh, rmw_cyclonedds_cpp for Cyclone, rmw_fastrtps_cpp for DDS (interop tier).
  • Message type compatibility for any custom .msg you’ve added. Round-trip a sample message through ROS 2’s rosbag2 to confirm wire-level parity.
  • QoS profile matching. Mismatched reliability / durability / history kill discovery silently on DDS / RTPS.
  • Lifecycle callbacks fire on node startup / shutdown if you’ve opted into lifecycle-services.
  • Cross-RTOS interop: if your fleet mixes RTOSes (e.g. Zephyr sensor + FreeRTOS actuator + POSIX coordinator), confirm all three sides see each other.

6. Failure recovery

  • Agent / router restart: kill zenohd (Zenoh) or MicroXRCEAgent (XRCE) mid-run. Confirm reconnection. For DDS / Cyclone this is N/A (no central process).
  • Network partition → reconnection. Block the talker’s egress with iptables for 30 s, then unblock. Verify the listener resumes within your design SLA.
  • Heap exhaustion path: graceful degradation OR clean crash + restart? If hosted-RTOS + watchdog, restart is usually correct. If bare-metal, you probably have no restart story — confirm your design assumes this.
  • Stack overflow detection triggers a panic / fault rather than silent corruption.
  • Power loss mid-write (if persisting state). Not nano-ros’s concern, but mention it in your design review.

7. Operational concerns

  • Bootloader + OTA strategy. Out of scope for nano-ros but mandatory for fleet deployments — name it explicitly in your project plan.
  • Log / diagnostics exfiltration. nros-log provides the logging surface; pick a sink (UART, RTT, semihosting, or ROS 2 /rosout over the wire).
  • Time synchronization (NTP, PTP, RTC). nano-ros doesn’t ship a time-sync layer; your fleet design must.
  • Watchdog coverage: the executor’s spin_period reports overruns, but it doesn’t pet a hardware watchdog. Wire one manually.

8. License + governance

  • License: MIT OR Apache 2.0 (dual). Both permissive, no GPL copyleft, OK for proprietary firmware. Confirm your legal team is comfortable.
  • Third-party dependencies: zenoh-pico (Eclipse), Cyclone DDS (Eclipse), Micro-XRCE-DDS-Client (Apache-2). Vendored dependencies carry license files in each third-party/*/LICENSE.
  • Patent grant: Apache 2.0 carries an explicit patent grant; MIT does not. Most adopters rely on the Apache half.
  • Support model: nano-ros has no commercial support entity as of writing. Plan accordingly — either staff in-house expertise or contract a consultancy.
  • Roadmap visibility: track the project’s roadmap directory in the upstream repo. Items are numbered and dated.

9. Maintainability pledge

The single-maintainer signal in §8 is a real adoption risk. Add explicit mitigations to your project plan rather than treating “open-source” as the answer:

  • Maintenance horizon: record the date you adopted and the upstream commit cadence (e.g. git log --since=… --oneline | wc -l over the last 90 days). Re-check quarterly.
  • Escalation path: identify the primary maintainer contact (from Cargo.toml authors + GitHub commit history). For CVEs, use a public GitHub security advisory.
  • Governance model: nano-ros is BDFL-style (benevolent-dictator-for-life) with no formal RFC / vote process. Major design decisions are documented in the roadmap directory; align your expectations accordingly.
  • Fork mitigation: the dual MIT / Apache-2.0 license lets your team fork and maintain independently if upstream goes dormant. Budget for that possibility, not the assumption that it will happen.
  • Distro tracking: nano-ros targets ROS 2 Humble today. Iron / Jazzy support depends on the type-hash work tracked in the upstream roadmap (no public ETA yet). If your product must ship on Iron / Jazzy at launch, plan to contribute the type-hash port or wait for upstream.

Scoring rubric

For each section above, count [x] boxes as your readiness score. Suggested gates:

Score per sectionStatus
8/8Production-ready for that axis
5–7/8Pilot deployment OK; close gaps before scale
3–4/8Lab / prototype only
< 3/8Block on these items first

Sum across all 9 sections (§1–9). Below ~50 / 70 you have foundational work to do; above ~62 / 70 you’re at production quality on every axis where nano-ros can be validated externally.

See also