DD001: Body Physics Engine (Sibernetic) Architecture¶
- Status: Accepted
- Author: Andrey Palyanov, Sergey Khayrulin, OpenWorm Core Team
- Date: 2026-02-14
- Supersedes: None
- Related: DD002 (Neural Circuit), DD003 (Muscle Model), DD004 (Mechanical Cell Identity)
TL;DR¶
PCISPH SPH framework (Sibernetic) simulating the worm as ~100K particles — liquid (pseudocoelom), elastic (body wall), boundary (environment). Muscle forces from DD003 calcium drive body deformation and locomotion. Success: kinematic validation within ±15%, density deviation <1%.
Architecturally differentiable. The native Metal substrate (src/metal_diff/) implements every forward kernel with a paired analytic backward kernel, validated against finite-difference. End-to-end reverse-mode differentiation through multi-step XPBD integration produces gradients on initial conditions and all physical parameters (rest density, spring stiffness, viscosity, density-constraint compliance, floor restitution, membrane mechanics). This is a structural property of the substrate — not an optional layer — and is the design requirement that the in-progress CUDA port inherits.
Quick Action Reference¶
| Question | Answer |
|---|---|
| Phase | Phase 0 |
| Layer | Core Architecture — see Phase Roadmap |
| What does this produce? | Particle position time series (~100K SPH particles), WCON trajectory files, rendered body frames, gradients on physical parameters via reverse-mode AD |
| Success metric | DD010 Tier 3: kinematic metrics within ±15%; density deviation <1% for liquid particles; every gradient kernel within ±5% rel-err of finite-difference |
| Differentiability | Native Metal substrate (src/metal_diff/) is end-to-end differentiable. Multi-step xpbd_full_bwd produces gradients on (x_init, v_init, ρ_rest, spring_K, viscosity, α_density, floor_y, restitution). See Differentiability below. |
| Validation methodology | Every physics change in Sibernetic must follow the 8-phase predict → reference → inspect → refine → implement → SGD-tune → render → compare workflow. Mind-of-a-Worm enforces this on every PR via the Validation Methodology checklist. |
| Repository | openworm/sibernetic — issues labeled dd001 |
| Config toggle | body.enabled: true / body.backend: opencl in openworm.yml |
| Build & test | docker compose run quick-test (no NaN/segfault, *.wcon exists), docker compose run validate (Tier 3) |
| Visualize | DD012 body/positions/ layer — SPH particles colored by type (liquid=blue, elastic=green, boundary=gray) |
| CI gate | Tier 3 kinematic validation + physical stability (no particle escape) blocks merge |
| --- |
Goal & Success Criteria¶
| Criterion | Target | DD010 Tier |
|---|---|---|
| Primary: Kinematic metrics | Within ±15% of Schafer lab baseline | Tier 3 (blocking) |
| Secondary: Physical stability | No particle escape, no NaN, no divergence over 10 s simulation | Tier 3 (blocking) |
| Tertiary: Density deviation | <1% for liquid particles after PCISPH convergence | Tier 3 (blocking) |
Before: No fluid-structure interaction — rigid body or mass-spring models that cannot capture pseudocoelomic pressure or hydrostatic skeleton mechanics.
After: ~100K SPH particles (liquid + elastic + boundary) with PCISPH pressure solver, enabling coupled fluid-solid locomotion driven by muscle forces from DD003.
Deliverables¶
| Artifact | Path (relative to openworm/sibernetic) |
Format | Example |
|---|---|---|---|
| Particle position time series | output/ (per-run) |
Binary state dump or WCON trajectory | output.dat, output.wcon |
| WCON trajectory files | output/*.wcon |
WCON JSON | Worm centroid + posture over time |
| Rendered frames / video | output/frames/ |
PNG (per-frame) | frame_00001.png |
| Sibernetic binary (C++/OpenCL) | build/Sibernetic |
Compiled executable | ./Sibernetic -config ... |
| Native Metal substrate | src/metal_diff/ |
C++/Objective-C++/Metal shaders | src/metal_diff/sib_metal, src/metal_diff/shaders.metal |
| Native CUDA substrate | src/cuda/ |
C++/CUDA | src/cuda/sib_cuda (PR #229, in review) |
| Particle positions (viewer) | OME-Zarr: body/positions/, shape (n_timesteps, n_particles, 3) |
OME-Zarr | Per-particle (x, y, z) over all output timesteps |
| Particle types (viewer) | OME-Zarr: body/types/, shape (n_particles,) |
OME-Zarr | Enum: liquid/elastic/boundary |
| Surface mesh (viewer) | OME-Zarr: geometry/body_surface/ (per-frame OBJ or vertices+faces arrays) |
OME-Zarr | Reconstructed smooth body surface per timestep |
| Stability checker | scripts/check_stability.py |
Python script | Verify no NaN, no particle escape, no divergence [TO BE CREATED] |
| Incompressibility validator | scripts/validate_incompressibility.py |
Python script | Verify density deviation <1% [TO BE CREATED] |
| Backend parity test suite | scripts/backend_parity_test.py |
Python script | Cross-backend kinematic comparison [TO BE CREATED] |
Repository & Issues¶
| Item | Value |
|---|---|
| Repository | openworm/sibernetic |
| Issue label | dd001 |
| Milestone | Body Physics Engine |
| Branch convention | dd001/description (e.g., dd001/pcisph-stability-fix) |
| Example PR title | DD001: Fix density deviation in PCISPH pressure solver |
How to Build & Test¶
Build and run instructions live with the code, in the Sibernetic README. That's the canonical source — it tracks the actual build system as it evolves (Docker stages, CMake flags, supported platforms, GPU SDK setup, demo configurations) and is updated more frequently than this design document.
What this design document does specify is the acceptance criteria for whether a build is correct (below), and the validation workflow any contributor must follow when changing the physics (see Validation Methodology).
Acceptance Criteria (Green-Light Definitions)¶
A working build must satisfy, on at least one of the Compute Backends:
| Gate | What it proves | How it's run |
|---|---|---|
| quick-test | Simulation completes without NaN, segfault, SIGKILL; output/*.wcon exists |
docker compose run quick-test (or the equivalent native path documented in the Sibernetic README) |
| validate | Density deviation < 1% for liquid particles; kinematic metrics within ±15% of baseline; no particle escape | docker compose run validate |
| parity (native substrates) | Per-demo kinematic match against OpenCL reference within ±5% — see Cross-Backend Parity Requirements | python3 tests/test_demo*_backend_parity.py |
| differentiability (native substrates) | Every paired backward kernel within ±5% relative error of finite-difference | python3 src/metal_diff/test_*.py (and equivalent for src/cuda/) |
Validation Tooling Status¶
Two stability/incompressibility scripts referenced in older validation documents may not yet exist in the repo. If absent, they should be created against openworm/sibernetic with label dd001:
| Script | Status | Tracking |
|---|---|---|
scripts/check_stability.py |
Verify in repo | openworm/sibernetic — label dd001 |
scripts/validate_incompressibility.py |
Verify in repo | openworm/sibernetic — label dd001 |
How to Visualize¶
DD012 viewer layer: body/positions/ — SPH particles colored by type.
| Viewer Feature | Specification |
|---|---|
| Layer | body/positions/ |
| Color mode | Particles colored by type: liquid=blue, elastic=green, boundary=gray |
| Type data | OME-Zarr: body/types/, shape (n_particles,) — enum mapping each particle to its type |
| Surface mesh | OME-Zarr: geometry/body_surface/ — reconstructed smooth body surface per timestep |
| What you should SEE | A worm-shaped particle cloud deforming over time. Blue liquid particles filling the interior (pseudocoelom), green elastic particles forming the body wall, gray boundary particles defining the environment. Muscle activation should produce visible bending. Surface mesh should show smooth locomotion. |
| Comparison view | Side-by-side: particle view (raw SPH) vs. surface mesh (reconstructed body shape) |
Technical Approach¶
Smoothed Particle Hydrodynamics (SPH) as the Body Physics Framework¶
OpenWorm uses Predictive-Corrective Incompressible SPH (PCISPH) to simulate the worm as a deformable body embedded in a viscous fluid. SPH represents continuous fields (density, velocity, pressure) as weighted sums over discrete particles, making it naturally suited for coupled fluid-solid systems.
Three Particle Types¶
| Type | Represents | Count | Dynamics | Color (Viz) |
|---|---|---|---|---|
| Liquid | Pseudocoelom fluid | ~50,000 | Navier-Stokes via SPH | Blue |
| Elastic | Body wall structure | ~40,000 | Spring-bonded network | Green |
| Boundary | Environmental surfaces | ~10,000 | Fixed or moving constraints | Gray |
Total particle count: ~100,000 for an adult hermaphrodite.
Physical Parameters¶
| Parameter | Value | Units | Biological Basis |
|---|---|---|---|
| Rest density (ρ₀) | 1000 | kg/m³ | Aqueous tissue |
| Viscosity (µ) | 4e-6 | Pa·s | Low Reynolds number (Re ≤ 0.05) |
| Smoothing radius (h) | 3.34 | Particle units | Determines neighbor interaction range |
| Timestep (dt) | 2.0e-5 | s | Stability for explicit integration |
| Simulation length | ~311 | Particles | Adult body length (~1 mm) |
| Adult worm mass | 3.25e-9 | kg | Mapped to total particle mass |
| Elasticity coefficient | 4 × 1.5e-4 / mass | -- | Spring stiffness for elastic bonds |
| Max muscle contraction force | 2.7 × 10⁻⁹ | N | Experimental range: (1.4–9.6) × 10⁻⁹ N |
| Number of muscle units | 96 | -- | 95 body-wall muscles, 24 per quadrant × 4 |
Validated Kinematic Outputs (Palyanov et al. 2018)¶
| Metric | Simulated | Experimental | Source |
|---|---|---|---|
| Crawling velocity | 0.13–0.15 mm/s | 0.1–0.3 mm/s | Table 1 |
| Crawling frequency | 0.36–0.37 Hz | 0.3–0.8 Hz | Table 1 |
| Crawling wavelength | 0.62–0.72 mm | 0.65 mm | Table 1 |
| Swimming velocity | 0.26–0.41 mm/s | 0.29 ± 0.03 mm/s | Table 1 |
| Swimming frequency | 1.76–1.83 Hz | 1.74 ± 0.16 Hz | Table 1 |
| Swimming wavelength | 0.43–0.47 mm | 0.46 mm | Table 1 |
| Freq–wavelength slope | 0.59 | 0.64 (Boyle et al.) | Figure 5 |
| Freq–wavelength intercept | 0.48 | 0.42 (Boyle et al.) | Figure 5 |
Validated behaviors: Forward crawling, swimming, omega turns, body shortening, two-frequency undulation (head vs. body).
SPH Kernel Functions¶
Following Müller, Charypar & Gross (2003):
Density estimation (Wpoly6):
ρᵢ = Σⱼ mⱼ * Wpoly6(rᵢⱼ, h)
Wpoly6(r, h) = (315 / (64πh⁹)) * (h² - r²)³ if r < h, else 0
Pressure gradient (∇Wspiky):
∇pᵢ = Σⱼ mⱼ * ((pᵢ + pⱼ) / (2ρⱼ)) * ∇Wspiky(rᵢⱼ, h)
∇Wspiky(r, h) = -(45 / (πh⁶)) * (h - r)² * (r / |r|) if r < h, else 0
Viscous diffusion (∇²Wviscosity):
Fᵢ^viscosity = µ * Σⱼ mⱼ * ((vⱼ - vᵢ) / ρⱼ) * ∇²Wviscosity(rᵢⱼ, h)
∇²Wviscosity(r, h) = (45 / (πh⁶)) * (h - r) if r < h, else 0
Elastic Bonds (Body Wall Structure)¶
Elastic particles are connected by spring-like bonds with rest lengths:
F_elastic = k * (|rᵢⱼ| - L₀) * (rᵢⱼ / |rᵢⱼ|)
- k = elasticity coefficient (see table above)
- L₀ = rest length (initial inter-particle distance)
Bonds are created during initialization based on spatial proximity. Particles within the smoothing radius are bonded if they are both elastic-type.
Muscle Actuation (Force Injection)¶
Muscle cell mapping (Palyanov et al. 2018): The elastic shell is mapped into 4 longitudinal muscle bundles (VR, VL, DR, DL), and each bundle is subdivided into 24 areas representing individual muscle cells with geometries based on WormAtlas microphotographs. This gives 95 body-wall muscles (96 independently activable units). Muscle naming follows the DL side convention and is mirrored for DR, VR, and VL quadrants.
Muscle forces from the calcium-force coupling (DD003) are injected by modulating elastic bond stiffness:
k_muscle(t) = k_baseline * (1 + activation(t) * muscle_strength_multiplier)
Where activation(t) comes from the [Ca²⁺]ᵢ time series of each muscle cell. Bonds tagged as muscle bonds (MDR, MVR, MVL, MDL in 4 quadrants) receive this time-varying stiffness. Each of the 96 muscle units can be activated independently, enabling the full range of body postures observed in C. elegans.
Predictive-Corrective Pressure Solver (PCISPH — OpenCL Reference)¶
Standard SPH suffers from density fluctuations causing artificial pressure waves. PCISPH (Solenthaler & Pajarola 2009) adds a predictive-corrective iteration:
- Predict: Estimate particle positions at t + dt using current forces
- Correct: Iteratively adjust pressure forces to maintain incompressibility (ρ ≈ ρ₀)
- Update: Advance to the next timestep once density error < threshold
This stabilizes the simulation at the cost of ~3-7 iterations per timestep. PCISPH is the approach used by the OpenCL reference implementation (sphFluid.cl) and is the validated kinematic baseline against which all newer substrates are measured.
Extended Position-Based Dynamics (XPBD — Native Metal / CUDA Substrates)¶
The native Metal substrate (and the in-progress CUDA substrate) uses eXtended Position-Based Dynamics (XPBD) (Macklin, Müller & Chentanez 2016) as the constraint-projection framework. XPBD generalizes Position-Based Dynamics by introducing a per-constraint compliance parameter α, which is the inverse stiffness: small α ⇒ stiff (hard) constraint, large α ⇒ compliant (soft) constraint. Each constraint contributes a Lagrange-multiplier update that, when iterated K times per timestep, drives the simulation toward the constrained manifold.
The Sibernetic native substrate expresses each physical phenomenon as an XPBD constraint and projects them sequentially per timestep:
- Predict:
predict_positionsadvances positions using gravity, external forces, prior velocity (one shader pass) - Constrain: K projection passes, each solving one constraint kind in turn:
- Density constraint (
solve_density_constraint) — particles within smoothing radiushproject toward target densityρ_restwith complianceα_density. This is the XPBD equivalent of PCISPH's iterative pressure correction. - Distance constraints (
solve_distance_constraints_seq) — elastic bonds (springs + anchors) project toward rest length. - Floor / box-clamp constraints (
solve_floor_constraint,clamp_to_box) — boundary collision with restitution. - Membrane constraints (
membrane_apply) — plane-projection through Ihmsen 2014 membrane mechanism (M10 kernels).
- Density constraint (
- Update velocities:
update_velocitiesderives new velocities from position deltas (one shader pass)
For phenomena that don't fit cleanly as constraints — viscous pair forces, surface tension, SPH pressure-gradient buoyancy needed for water-on-worm dynamics — additional force kernels (pair_forces_grid, pair_forces_pressure_grid) are interleaved within the same XPBD loop, contributing impulses that the constraint projection then resolves against.
Why XPBD for the Native Substrate¶
The XPBD choice is what makes the substrate differentiable end-to-end:
-
Each constraint has a closed-form Jacobian. Every projection step's gradient
∂x_new / ∂x_oldcan be derived analytically. The substrate exploits this: every forward constraint kernel (solve_density_constraint,solve_distance_constraints_seq, etc.) has a paired backward kernel that walks the Jacobian, FD-validated to ±5% relative error. PCISPH's iterative pressure-force loop produces gradients in principle too, but autodiff through the iteration is memory-expensive and analytic backwards through iterative implicit methods are harder to derive. XPBD's per-constraint structure is the cleanest match for analytic backward-mode AD. See Differentiability. -
Per-step state is bounded and saveable. Each XPBD constraint kernel reads a fixed set of inputs and writes a fixed set of outputs. The substrate persists per-step state (positions, velocities, density, ∇C, denominator helpers, per-kernel auxiliaries) so
xpbd_full_bwdcan walk K constraint iterations in reverse. The state's bounded shape per kernel makes this practical at the worm-scale particle counts (~100K particles × multiple kernels × tens of timesteps). -
Unconditional stability (PBD heritage). XPBD doesn't have the CFL-like stiffness constraint that PCISPH inherits from explicit pressure-force integration. Stiffer constraints don't require smaller timesteps — they require lower compliance α. This means kinematic parameter tuning via SGD can vary stiffness across orders of magnitude without restabilizing the integration loop.
Cross-Approach Parity¶
PCISPH and XPBD are mathematically distinct, but the validation methodology requires the native (XPBD) substrate to reproduce the OpenCL (PCISPH) reference's kinematic outputs — the per-demo trajectory parity gates ensure equivalent behavior at the observable level even though the inner numerical scheme differs. Where pure XPBD with constraint projection alone produced wrong kinematics (e.g., a worm sinking because there was no upward SPH pressure-gradient force from surrounding liquid), the substrate added the missing force kernel (pair_forces_pressure_grid, ported from sphFluid.cl) rather than abandoning the XPBD framework. The substrate is thus a hybrid: XPBD as the constraint-projection backbone with SPH force kernels interleaved where projection alone is insufficient. This hybrid is what reaches forward parity on the five validated demos; see Cross-Backend Parity Requirements.
The choice is also a Position-Based Dynamics revisitation. PBD itself was rejected for Sibernetic's original architecture as too inaccurate (see Alternatives Considered §4). XPBD's per-constraint compliance restores the physical accuracy that vanilla PBD lacked while keeping PBD's stability and differentiability properties — making it a fit for biophysical validation that vanilla PBD never was.
Alternatives Considered¶
1. Finite Element Method (FEM)¶
Advantages: Higher accuracy for solid mechanics, well-established in biomechanics.
Rejected because:
- Mesh generation for complex morphology is labor-intensive
- Large deformations (bending during crawling) cause mesh distortion
- Fluid-structure coupling in FEM+CFD is computationally expensive
- SPH is meshless, naturally handles topology changes, and couples fluid-solid seamlessly
When to reconsider: If future work requires extremely accurate stress/strain fields (e.g., modeling cuticle fracture, cell rupture). Not needed for behavior simulation.
Update (2026-02): Zhao et al. (2024) demonstrated that modern Projective Dynamics FEM solvers (Bouaziz et al. 2014) address the mesh distortion and speed concerns listed above. Their implementation used a tetrahedral mesh of 984 vertices and 3,341 tetrahedrons with 96 muscle actuators and achieved real-time simulation at 30 frames per second — orders of magnitude faster than the current Sibernetic SPH approach (~100K particles). The key insight is that a Projective Dynamics solver uses a local-global optimization that is unconditionally stable under large deformations, eliminating the traditional FEM mesh distortion problem.
However, Zhao et al.'s FEM approach uses simplified surface hydrodynamics (thrust and drag forces on the body surface) rather than solving full fluid dynamics. This is a valid approximation at the low Reynolds number of C. elegans locomotion (Re ~ 0.01) but sacrifices the internal pseudocoelomic fluid pressure dynamics that Sibernetic's SPH naturally captures.
OpenWorm's position: Sibernetic SPH remains the biophysically richer model and the default backend. A Projective Dynamics FEM backend should be added as a fast alternative for rapid iteration, CI testing, and parameter sweeps — similar in philosophy to DD013's learned surrogate but using first-principles physics rather than machine learning. The BAAIWorm repository (github.com/Jessie940611/BAAIWorm, Apache 2.0) contains a C++/CUDA FEM implementation that could serve as a starting point, though its CUDA/OptiX dependencies would need evaluation for compatibility.
Configuration: body.backend: "fem-projective" alongside existing opencl, taichi-metal, taichi-cuda, pytorch.
Reference: Bouaziz S et al. (2014). "Projective Dynamics: Fusing constraint projections for fast simulation." ACM Trans Graphics 33:154.
2. Mass-Spring System (Simple Elastic Network)¶
Advantages: Extremely fast, simple to implement.
Rejected because:
- No fluid mechanics (pseudocoelom is critical for worm locomotion)
- Difficult to enforce incompressibility
- No pressure dynamics
This was tried in early OpenWorm prototypes and abandoned because crawling requires fluid pressure to antagonize muscle contraction.
3. Lattice Boltzmann Method (LBM)¶
Advantages: Good for fluid flow, naturally parallelizes.
Rejected because:
- Fixed Cartesian grid poorly represents complex worm morphology
- Deformable solid modeling in LBM is cumbersome
- SPH is more natural for moving, deforming boundaries
4. Position-Based Dynamics (PBD)¶
Advantages: Unconditionally stable, used in real-time graphics/games.
Rejected because:
- Sacrifices physical accuracy for stability (violates momentum conservation in general)
- Not suited for quantitative biophysical validation
- OpenWorm prioritizes biological accuracy over simulation speed
5. Boyle-Berri-Cohen 2D Rod-Spring Model¶
Advantages: Extremely fast (~seconds on single CPU), published and validated (Boyle, Berri & Cohen 2012), already implemented in three OpenWorm repos (CE_locomotion, Worm2D, CelegansNeuromechanicalGaitModulation), ~150 state variables, produces WCON output directly, has c302 integration and proprioception support.
Not a replacement for Sibernetic because:
- 2D only — no dorsal-ventral body mechanics, omega turns, or body roll
- No fluid dynamics — uses drag coefficients (Resistive Force Theory), not solved Navier-Stokes
- No internal body volume — cannot support DD004 (cell identity), DD012.2 (mesh deformation), or Phase 3+ organs
- No fluid-structure interaction — DD015 touch mechanotransduction requires 3D particle strain
Role in OpenWorm: Fast validation screening tool (scripts/boyle_berri_cohen_trajectory.py in c302 repo). Takes c302 muscle calcium output, runs the 2D body model, produces WCON in seconds. Enables rapid iteration on neural circuit parameters and CI quick-test gates. Also useful for generating DD013 surrogate training data. Sibernetic SPH remains the mission-critical body physics engine.
Quality Criteria¶
A contribution to Sibernetic MUST:
-
Preserve Physical Stability: Simulations must not explode (density < 0, particles escaping to infinity, NaN values). Run at least 10 seconds of simulation time without divergence.
-
Maintain Incompressibility: Density deviation from ρ₀ should be < 1% for liquid particles after PCISPH convergence.
-
Pass Unit Tests: Sibernetic repository contains unit tests for kernel functions, neighbor search, and pressure solver. All must pass.
-
Validate Against Known Cases:
- Drop test: Sphere of liquid particles under gravity should settle and spread
- Elastic deformation: A suspended elastic body under gravity should sag
- Muscle contraction: Activating one muscle quadrant should bend the body
-
GPU Backend Compatibility: Changes to core SPH algorithms must work across OpenCL (original C++ reference), Native Metal (Apple Silicon), Native CUDA (NVIDIA), and PyTorch (CPU reference). Test on at least two backends.
-
Cross-Backend Parity: Core SPH algorithms must produce kinematic outputs within ±5% across all stable backends on the same configuration. The parity test suite (see Backend Stabilization Roadmap) must pass before any backend is marked Production.
-
Paired Backward Per Forward (Native Substrates): Every new forward kernel added to the native Metal or native CUDA substrate MUST ship with a paired analytic backward kernel, validated against finite-difference to within ±5% relative error. This is the architectural contract of the substrate; see Differentiability. New kernels without a paired backward break the end-to-end differentiability guarantee and must not land.
-
Validation Methodology Followed: Every physics change MUST follow the 8-phase predict → reference → inspect → refine → implement → SGD-tune → render → compare workflow documented in Validation Methodology. PRs that ship a Metal/CUDA implementation without (a) a benchmark config, (b) an OpenCL reference trajectory, (c) a Metal/CUDA trajectory dump, (d) a side-by-side comparison movie, and (e) visual + quantitative parity evidence are not ready to land. Hand-swept parameters are not acceptable — parameter tuning must use the SGD harness with a saved convergence history. This applies to every kernel-level PR (new physics, parameter changes, optimization PRs that affect numerical output, etc.).
Validation Methodology¶
Every physics change in Sibernetic — new kernel, parameter retuning, optimization PR that perturbs numerical output, backend port — must follow the 8-phase workflow below. Mind-of-a-Worm uses the checklist at the end of this section as a binding PR gate. Reviewers (human or AI) MUST verify each item before approving.
The methodology emerged from the native-Metal port (consolidated on the ow-native-gpu-0.1.0 branch) and is now baked into the repo's tooling. It is the substrate-correctness story: hand-derived backward kernels per Differentiability prove the math is right; this workflow proves the physics is right.
Why This Is Horizontal¶
Earlier OpenCL-only Sibernetic relied on "build the kernel, eyeball the demo, ship it." That worked when there was one reference implementation. With three substrates now (OpenCL gold-standard reference, native Metal, native CUDA in scaffolding) and a differentiable pipeline that can be SGD-tuned, the cost of an incorrect kernel landing is now multi-platform divergence that's expensive to retroactively diagnose. The 8-phase workflow front-loads the cost of being right: predict before measuring, measure before implementing, tune via gradient descent not manual sweeps, prove parity visually frame-by-frame.
This applies horizontally to every PR. There is no "small physics change" exemption — even a parameter default tweak goes through the workflow because the trajectory is what matters, not the change-set size.
The 8-Phase Workflow¶
Phase 1 — Predict Correct Behavior (Before Running Anything)¶
Read the benchmark configuration file (typically under configuration/<demo> or configuration/test/<demo>). Extract the physics setup: simulation box bounds, particle positions and types, velocities, spring bond topology (rest length, connection type), membranes if present, boundary anchor locations.
For analytically tractable scenarios (single elastic + anchor, simple oscillators, cube under gravity), calculate expected equilibrium positions, oscillation periods, settling times, displacements by hand using the documented formulas (e.g., for elastic-on-anchor: T = 2π / √(K_eff / m_eff) where K_eff = elasticityCoefficient × 0.25 × sim_scale for non-worm-body elastic pairs — the 0.25 factor is a frequent gotcha).
Write down the prediction with units BEFORE touching OpenCL or Metal. This written prediction becomes the SGD target in Phase 5 and the parity benchmark in Phase 7. The act of writing the prediction first surfaces assumptions and identifies where intuition is wrong.
Phase 2 — Run OpenCL Gold-Standard Reference¶
Run the benchmark config on the OpenCL backend (the validated ±15%-against-Schafer-lab reference). This produces a reference trajectory: per-particle position time series, typically saved as <demo>_opencl_position.txt or equivalent.
The OpenCL trajectory is the ground truth for what the Metal (and future CUDA) implementations must match. Commit it to the repo alongside the new physics change so reviewers can re-run parity checks.
Phase 3 — Inspect & Refine Prediction Against OpenCL¶
Parse the OpenCL trajectory and measure observed behavior: actual oscillation period (zero-crossing analysis), amplitude, equilibrium, drift, damping, settling time. Compare against the Phase 1 prediction.
zero_crossings = np.where(np.diff(np.sign(y_centered)) > 0)[0]
observed_period_ms = np.diff(zero_crossings).mean() * frame_dt_ms
Surprises are common and informative: - Period off by 2× → forgot the 0.25 factor on non-worm-body elastic pairs - Particle drift when stationary → ε guards differ between paths - Faster fall than expected → buoyancy from SPH pressure gradients - Unexpected damping → viscous coupling through neighbor particles
Update the written prediction with the measured values. These updated targets are what Phase 5 SGD will optimize against. A correct Metal port must reproduce what OpenCL actually does, not what we thought it would do.
Phase 4 — Implement on Metal (or CUDA)¶
Build the native substrate (./build.sh in src/metal_diff/; analogous for src/cuda/ once it lands). Implement the new kernel or change. Run the benchmark via dump_metal_trajectory.py (or the equivalent CUDA dumper) with the same duration as the OpenCL reference:
python3 src/metal_diff/dump_metal_trajectory.py \
--scenario <demo_name> \
--steps 2500 --chunk 5 \
--rho-rest 1.0 \
--spring-k <initial-guess> --anchor-k <initial-guess> \
--out /tmp/<demo>_metal.txt
Sanity-check immediately: trajectory range should match the OpenCL prediction within a few percent. Massive divergence (>50%) means there's a kernel-level physics error to fix before parameter tuning is meaningful. Per Quality Criteria #7, the new kernel must ship with a paired analytic backward and FD test.
Phase 5 — SGD-Tune Parameters (Mandatory; Hand-Sweeps Forbidden)¶
Use the SGD harness (src/metal_diff/sgd_*.py; sgd_one_sprig.py for single-spring scenarios, sgd_true.py for multi-parameter tuning, sgd_worm.py for worm-scale, plus demo-specific scripts). Each iteration:
- Run the simulation with current parameters via
dump_metal_trajectory.py - Measure trajectory features (period, amplitude, final position, kinematic metrics)
- Compute scalar loss against the Phase 3 measured targets
- Compute gradients — finite-difference over the CLI for measurement-based losses; analytic backward kernels via
xpbd_full_bwdfor learned parameters - Update in log-space for stiffness-like params, linear-space (
--linear-trainable) for bounded params likefloor_y
Save the convergence history to /tmp/sgd_<demo>_history.json and commit it alongside the parity artifacts. This is the evidence that parameters were not hand-fitted.
If SGD plateaus on a loss that's worse than OpenCL by a wide margin, the problem is missing physics (a kernel), not bad parameter tuning. Port the missing OpenCL kernel — do not ship with caveats. (Example: worm sinking because no buoyancy → port the SPH pressure-force kernel. Example: water exploding → port the PCISPH iterative density correction.)
Phase 6 — Render Comparison Movies¶
Render OpenCL and Metal trajectories as separate panels using the per-scenario renderer (render_one_sprig.py, render_worm.py, etc. under src/metal_diff/), then stack horizontally with ffmpeg:
python3 src/metal_diff/render_one_sprig.py \
/tmp/<demo>_opencl_position.txt /tmp/<demo>_opencl_panel.mp4 \
--title "OpenCL <demo>"
python3 src/metal_diff/render_one_sprig.py \
/tmp/<demo>_metal.txt /tmp/<demo>_metal_panel.mp4 \
--title "Native Metal <demo>" --max-frames 250
ffmpeg -y -i /tmp/<demo>_opencl_panel.mp4 -i /tmp/<demo>_metal_panel.mp4 \
-filter_complex "[0:v]pad=920:900:0:0:color=white[left];[left][1:v]hstack[v]" \
-map "[v]" -c:v libx264 -crf 18 -pix_fmt yuv420p \
docs/<demo>_opencl_vs_metal.mp4
The renderer auto-detects trajectory format (OpenCL and Metal frame structures differ — OpenCL includes boundary particles only in frame 0; Metal includes them in every frame). Hard-code identical camera bounds and distance in both panels so visual differences reflect physics, not viewport.
Phase 7 — Examine Frame-by-Frame for Visual Parity¶
Extract sample frames at multiple time points from the side-by-side MP4:
ffmpeg -y -i docs/<demo>_opencl_vs_metal.mp4 \
-vf "select=between(n\,0\,0)+between(n\,15\,15)+between(n\,100\,100)+between(n\,200\,200)" \
-vsync vfr /tmp/check_frames/sxs_%d.png
Verify at each sample: - Frame 0 — correct initial positions on both sides - Mid-trajectory — particle positions, deformation patterns, flow lines match - Late frames — equilibrium / settling matches - No render bugs (particle missing, mis-labeled, anchor-spring topology wrong)
Quantitative spot-check: at each sample frame, log the y-coordinate (or equivalent scalar metric — wave amplitude, sheet curvature, worm midline) for both sides. Δ < 5% of the OpenCL scale is the parity threshold.
If visually rendered values don't match the trajectory file's numbers, it's a parser bug (re-check the auto-detect logic in the render script). If trajectory numbers diverge between OpenCL and Metal despite SGD convergence, it's a kernel-physics mismatch — return to Phase 4.
Phase 8 — Commit, Document, Publish¶
One commit per benchmark / per substrate port, containing:
- New / modified kernel code + paired backward + FD test
- The benchmark config file (if new)
- The OpenCL reference trajectory
- The Metal trajectory
- The side-by-side MP4 under
docs/ - The SGD convergence history (
/tmp/sgd_<demo>_history.jsoncopied to a tracked location) - Spot-check frame images (at minimum 3-4 sampled timepoints) showing labeled y-values
- A commit message summarizing: parity at sample frames, tuned parameters, substrate work landed
Cross-link the PR to the relevant dd001 GitHub label and the demo's parity artifact in docs/.
Worked Example: one_sprig_test¶
Scenario: Single elastic particle suspended by a vertical anchor spring; no other forces. Tests anchor-spring kernel correctness in isolation.
| Phase | Activity | Result |
|---|---|---|
| 1. Predict | Config: anchor at (x, 32.565, z), elastic at (x, 16.91, z), rest_length = 15.655. Equilibrium y = 16.91. K = elasticityCoefficient × 0.25 × sim_scale = 307,500 (Metal units). Predicted T ≈ 0.36 ms. | Written prediction filed |
| 2. Reference | Ran OpenCL gold standard, saved /tmp/one_sprig_opencl_position.txt |
Reference trajectory captured |
| 3. Inspect & refine | Zero-crossing analysis: actual period = 0.725 ms (1379 Hz); half-amplitude = 1.694 units. Surprise: period is 2× expected — likely the 0.25 factor. | Targets updated: T=0.725 ms, A=1.694 units |
| 4. Implement Metal | Built Metal substrate, ran dump_metal_trajectory.py --anchor-k 555. Initial: y oscillates at 0.36 ms (2× too fast). |
Kernel works; needs parameter tuning |
| 5. SGD-tune | sgd_one_sprig.py with loss L = (period_err/0.725)² + 0.25·(amplitude_err/1.694)². Init K=300, lr=0.5 log-space. Converged in 7 iterations to K=555. |
History saved; T, A match within <1% |
| 6. Render | render_one_sprig.py for both panels; ffmpeg hstack to docs/one_sprig_opencl_vs_metal.mp4 |
Side-by-side MP4 produced |
| 7. Frame check | Samples at t=0, 10, 50, 100 ms: y = 18.606 ± 0.001 across both panels; x, z constant; spring topology correct | Visual parity confirmed |
| 8. Commit | Squashed commit containing: kernel cleanup (M11), SGD history, MP4, frame samples, commit-message parity table | PR ready for review |
Mind-of-a-Worm PR Review Checklist¶
MoaW (and any human reviewer) must verify each item before approving any PR that touches Sibernetic physics:
| # | Check | Pass condition |
|---|---|---|
| 1 | Written prediction filed | PR includes a comment, commit-message section, or doc file showing the expected behavior (period, amplitude, equilibrium, settling time, etc.) derived from the config before the implementation was written. |
| 2 | OpenCL reference trajectory committed | An <demo>_opencl_position.txt (or equivalent) artifact exists in the repo or PR diff, with metadata identifying which OpenCL build produced it. |
| 3 | Metal/CUDA trajectory committed | An equivalent Metal/CUDA trajectory dump exists, same scenario, same duration. |
| 4 | Side-by-side comparison MP4 in docs/ |
docs/<demo>_opencl_vs_metal.mp4 (or _vs_cuda) exists, OpenCL on left, native on right, identical camera. |
| 5 | Visual parity verified at sample frames | PR includes at least 3-4 extracted frame images at distinct time points, with labeled y-values (or scalar metric) showing Δ < 5% of OpenCL scale. |
| 6 | SGD convergence history | If any parameter was tuned, the PR includes the SGD harness output (loss-per-iteration JSON or similar) demonstrating systematic optimization, not hand-sweeping. |
| 7 | Paired backward kernel + FD test | Per Quality Criteria #7, any new forward kernel has a paired analytic backward and a test_*.py FD validator passing at <5% rel-err. |
| 8 | Tuned parameter values documented | Final values for rho_rest, spring_k, visc_pair_coef, alpha_dens, etc., recorded in the commit message and/or the demo's README. |
| 9 | Substrate work itemized | The commit message lists what substrate-level changes landed (new kernels, harness extensions, validator tests, render-script updates). |
| 10 | Phase 1 prediction reconciled against Phase 3 measurement | If the OpenCL measurement surprised the predictor, the PR documents the gap and the root cause (e.g., "predicted 2× faster because forgot the 0.25 factor"). This is a learning artifact for future contributors. |
A PR missing items 1, 4, 5, or 6 is not ready — it's hand-fitted or unverified. A PR missing items 7-10 is incomplete reporting (can be fixed in-PR; not a block). A PR missing items 2 or 3 is unreviewable — the reference and the candidate must both be present for parity to be checked.
Common Gotchas (Distilled from the Native-Metal Port)¶
These are the recurring failure modes during the port that MoaW should flag if present in new PRs:
- 0.25 factor on non-worm-body elastic springs.
sphFluid.clapplieselasticityCoefficient × 0.25only to non-worm-body elastic pairs. Forgetting this in Metal makes the oscillation period 2× too short. The Phase 1 prediction must account for it. rho_rest = 0causes NaN. The density solver divides byrho_rest. Always use1.0for vacuum scenarios where SPH coupling is irrelevant; use the config'srho0explicitly for liquid scenarios.- CLI argv off-by-one cascades.
dump_metal_trajectory.pyuses positional arguments. Skipping a slot doesn't skip the C++ side's argv index — you must emit placeholder'0'strings. When adding a new flag, verify subsequent indices are still aligned. - Hand-sweeping parameters instead of SGD. If a PR runs 3-5 variants of
dump_metal_trajectory.pywith guessed values, stop and use the harness. SGD converges in 5-10 iterations; hand-sweeping takes 20+ and produces less defensible results. - Physics gap vs parameter gap. If SGD plateaus on a loss that's clearly worse than OpenCL (worm sinking with no buoyancy, water exploding with no PCISPH), the issue is a missing kernel, not bad tuning. Port the missing OpenCL kernel — do not ship with caveats.
- Bounded parameters in log-space blow up. Linear parameters like
floor_ycan't use log-space updates (floor_y = 2 × exp(0.5 × 10) = 296). Use--linear-trainablewith--linear-lr 0.05-0.15and bounded step sizes. - Oscillation around optimum on linear params. A limit cycle (parameter bouncing between two values every iteration) means the step size is too coarse. The harness auto-halves on sign-flip + L-increase; if you still see it, drop
--linear-lrfurther. - Frame-format mismatch between backends. OpenCL and Metal trajectories differ in frame structure (boundary particles in frame 0 vs every frame). The render script auto-detects by divisibility check. Symptoms of a broken parser: elastic particle visually missing, or labeled with a boundary-particle y-value.
- Frame count mismatch desyncs the side-by-side. Use
--max-frameson the longer trajectory to cap it to the shorter one, or both panels will end at different simulation times. - Convergence threshold too tight. Chasing
L < 1e-6wastes finite-difference evaluations.L < 1e-4is tight enough for frame-by-frame visual parity; stop there.
Validation Tooling Reference¶
Operational scripts that implement this workflow (all in src/metal_diff/):
dump_metal_trajectory.py— runs a scenario on the Metal substrate; emits trajectory text file matching the OpenCL output formatrender_one_sprig.py— renders a single trajectory as an MP4 panel; auto-detects OpenCL vs Metal frame structurerender_worm.py— same for worm-scale demos with iso/closeup/cross views, zoom controls, edge rendering, liquid opacitysgd_one_sprig.py— single-parameter SGD harness for anchor-spring scenariossgd_true.py— multi-parameter SGD harness using analytic gradients viaxpbd_full_bwd(the canonical reference; TBPTT + gradient clipping)sgd_worm.py— worm-scale parameter tuningsgd_demo2_membrane.py/sgd_demo2_permeability.py— membrane-specific tuning harnessestests/test_demo1_backend_parity.py— automated parity gate for the cube-drop demo (runs both backends, compares trajectories, reports per-metric pass/fail)test_*.pyinsrc/metal_diff/— 19 finite-difference validators for individual kernel backwards
The matching cuda substrate (when it lands per src/cuda/README.md) must include the same workflow tools — dump_cuda_trajectory.py, sgd_* scripts, and FD validators.
Boundaries (Explicitly Out of Scope)¶
This Design Document Does NOT Cover:¶
-
Per-cell mechanical identity beyond muscles: Sibernetic already maps 95 body-wall muscles into 96 independently activable units (24 per quadrant × 4 quadrants: VR, VL, DR, DL), with geometries based on WormAtlas microphotographs (Palyanov et al. 2018, Section 2b). However, non-muscle cells (hypodermis, seam cells, neurons) are still represented as bulk elastic/liquid without cell boundaries. See DD004 (Mechanical Cell Identity) for the proposal to extend per-particle cell IDs to all tissue types.
-
Cuticle fine structure: The cuticle has three layers (basal, medial, cortical) with distinct mechanical properties. Current model uses homogeneous elastic particles. Phase 4 work.
-
Environmental complexity beyond liquid/gel: Sibernetic already supports both liquid and agar gel environments — gel is modeled as elastic matter cubes in a 3D grid (Palyanov et al. 2018, Section 2c). Realistic soil mechanics, bacterial food, and geometric obstacles remain out of scope.
-
Thermodynamics: No temperature, no heat diffusion, no thermal expansion. C. elegans is studied at 20°C but temperature effects are not modeled.
-
Growth and molting: Body size changes during development. Current model assumes fixed adult size. Phase 6 work.
Context & Background¶
C. elegans is a soft-bodied organism. Locomotion emerges from the interaction between:
- Muscle contractile forces (internal)
- Body wall elasticity (structural)
- Pseudocoelomic fluid pressure (hydrostatic skeleton)
- Environmental resistance (viscous drag, surface contact)
Classical rigid-body physics engines (used in robotics) cannot capture this fluid-structure interaction. Finite element methods (FEM) are accurate but computationally expensive and poorly suited for large deformations and topological changes.
Implementation References¶
Repository¶
https://github.com/openworm/Sibernetic
Key files:
src/owPhysicsFluidSimulator.cpp— Main SPH loop (OpenCL reference)src/owWorldSimulation.cpp— Particle initialization, boundary conditionssrc/sphFluid.cl— OpenCL kernel for GPU acceleration (the canonical reference; ~64KB)src/metal_diff/— Native Metal substrate (Apple Silicon):sib_metalbinary,shaders.metal, per-op test suitesrc/cuda/— Native CUDA substrate (NVIDIA): scaffolding awaiting PR #229taichi_backend/— Historical Taichi prototyping path (superseded by the native substrates above)
Configuration Format¶
Sibernetic reads .ini files specifying:
- Particle counts (liquid, elastic, boundary)
- Initial positions (from pre-generated
.obj3D mesh or procedural) - Muscle activation input file
- Simulation duration, timestep, output frequency
Example:
[simulation]
timestep = 0.00002
duration = 10.0
output_interval = 0.01
[particles]
liquid_count = 50000
elastic_count = 40000
boundary_count = 10000
[physics]
rest_density = 1000.0
viscosity = 4e-6
elasticity = 0.0006
Compute Backends¶
| Backend | Language | Hardware | Speed | Differentiable | Result Quality | Status |
|---|---|---|---|---|---|---|
| OpenCL | C++ | CPU/GPU (Linux, Intel Mac) | Baseline | ❌ forward only | Gold standard — validated ±15% | Production (but losing driver support) |
| Native Metal | C++/Metal shaders | Apple Silicon GPU | First-pass ~2.7 ms/step on M-series | ✅ end-to-end — 19 paired backward kernels, multi-step xpbd_full_bwd, FD-validated, 4 demos SGD-tuned |
Forward parity in progress; 5+ demos working (cube drop, membrane, worm_alone, worm_swim) | Experimental → Stable — consolidated on ow-native-gpu-0.1.0 branch; PR #230 in flight |
| Native CUDA | C++/CUDA | NVIDIA GPU | Target ~5x OpenCL | 🟡 structural mandate — src/cuda/README.md requires mirroring Metal's paired-backward architecture |
Scaffolding stage | Scaffolding — PR #229 (sib_cuda) in review |
| PyTorch | Python | CPU | Slow | (autodiff possible but not on critical path) | Does not yet match OpenCL | Stable (doesn't crash; 76+ tests; useful as correctness reference) |
| Taichi Metal / CUDA | Python/Taichi | Apple Silicon / NVIDIA | n/a | n/a | n/a | Superseded — earlier prototyping path; supplanted by the native ports above |
Recommendation: OpenCL remains the only backend producing validated simulation results today. However, OpenCL driver support is shrinking across platforms (Apple removed OpenCL on Apple Silicon; AMD/NVIDIA deprioritizing). The path forward is native Metal and CUDA substrates — hand-written GPU kernels that target each platform's first-class API directly, rather than Python/Taichi abstractions that introduced their own divergence from the OpenCL reference. See Backend Stabilization Roadmap below.
Backend Stabilization Roadmap¶
Why OpenCL Must Be Replaced¶
The OpenCL backend is the only backend producing validated simulation results (kinematic metrics within ±15% of Schafer lab baseline). However, OpenCL driver support is shrinking across all major platforms:
- Apple dropped OpenCL entirely (deprecated since macOS 10.14, removed from Apple Silicon native support)
- AMD is deprioritizing OpenCL in favor of ROCm/HIP
- NVIDIA is deprioritizing OpenCL in favor of CUDA
- Docker — OpenCL runs CPU-only in Docker (GPU passthrough broken, issue #320)
The core challenge: the only backend producing good simulation results is losing the platforms it runs on. The goal is to achieve OpenCL-equivalent result quality on native Metal (Apple Silicon) and native CUDA (NVIDIA) substrates before OpenCL becomes unusable.
Why Native Ports, Not Taichi¶
An earlier prototyping path used Taichi as an intermediate language to target Metal and CUDA from a single Python codebase. Taichi simplified bring-up but introduced a coordinate-space divergence from the OpenCL reference (elastic forces ~287× too weak — see historical note in Cross-Backend Parity Requirements below) that proved harder to debug than implementing the GPU kernels natively.
The current strategy is hand-written kernels per platform: a Metal shader implementation for Apple Silicon (src/metal_diff/), a CUDA kernel implementation for NVIDIA (src/cuda/). Each substrate mirrors the OpenCL sphFluid.cl reference line-by-line, giving direct algorithmic correspondence without an intermediate abstraction layer. This is the strategy consolidated on the ow-native-gpu-0.1.0 branch (see PR #230) and is the active direction for the project.
The Result Quality Gap¶
The native Metal substrate is the most mature of the modernization paths and reaches forward parity on five demos so far:
- demo1 (cube drop) — forward parity within ±5%; cross-backend parity test passes 5/5 metrics
- demo2 (membrane permeability) — membrane mechanism ported (M10 kernels) with FD-validated backward; sheet-scale parameter tuning in progress
- one_sprig_test — active muscle contraction extension to anchor-spring kernel; OpenCL reference + Metal port both validated
- worm_alone_half_resolution — gravity drop + cylindrical-diameter retention matched to OpenCL
- worm_swim_half_resolution — basic swim locomotion validated; SPH pressure force kernel implemented (replaces XPBD-only density solver) for accurate water-on-worm dynamics
Cross-backend parity status:
| Backend | Demo1 | Demo2 | one_sprig | worm_alone | worm_swim |
|---|---|---|---|---|---|
| OpenCL | reference | reference | reference | reference | reference |
| Native Metal | ✅ ±5% | 🟡 in tuning | ✅ ±5% | ✅ visual parity | 🟡 swim parity in progress |
| Native CUDA | scaffolding | scaffolding | scaffolding | scaffolding | scaffolding |
| PyTorch | results don't yet match | — | — | — | — |
Root cause analysis approach: The OpenCL C++ kernels (sphFluid.cl, ~64KB) are the reference implementation. Each native substrate (Metal, CUDA) implements the kernels per-platform with direct OpenCL-line correspondence rather than via an intermediate abstraction. The historical Taichi divergence (elastic forces ~287× too weak from a coordinate-space mismatch — sim_scale division applied inconsistently between SPH and elastic force paths) is not present in the native ports because the OpenCL reference's coordinate handling is preserved verbatim.
Cross-Backend Parity Requirements¶
Replacement backends must produce kinematic outputs matching OpenCL within ±5% on the same configuration. The parity test suite consists of:
| Test | What It Measures | Pass Criterion |
|---|---|---|
| Drop test | Liquid sphere under gravity settles and spreads | Position/velocity metrics within ±5% of OpenCL |
| Elastic deformation | Suspended elastic body sags under gravity | Mean displacement within ±5% of OpenCL |
| Muscle contraction | Single quadrant activation bends body | Curvature within ±5% of OpenCL |
| Full worm crawling | 15ms coupled simulation | WCON trajectory metrics within ±5% of OpenCL |
Each test produces numeric metrics; the parity suite compares against OpenCL baseline values stored in a reference file.
Backend Graduation Criteria¶
| Level | Requirements |
|---|---|
| Experimental | Compiles, runs, produces output without crashing |
| Stable | Runs 10s without divergence, passes parity tests within ±5% of OpenCL |
| Production | Stable + integrated in Docker + CI-gated + performance benchmarked |
Current status:
| Backend | Level | Blocking Issue |
|---|---|---|
| OpenCL | Production | Losing platform support |
| Native Metal | Experimental → Stable | Forward parity demonstrated on 5+ demos; demo2 sheet-scale tuning + worm_swim parity in progress. Differentiable end-to-end: 19 paired backward kernels (FD-validated), multi-step xpbd_full_bwd, 4 demos already SGD-tuned to OpenCL reference. See Differentiability section. |
| Native CUDA | Scaffolding | Awaiting kernel implementation (PR #229 introduces the substrate skeleton) |
| PyTorch | Experimental | Stable but results don't match OpenCL; positioned as a CPU correctness reference |
| Taichi Metal / CUDA | Superseded | Earlier prototyping path; the native ports above are the replacement direction |
Stabilization Sequence¶
- Create validation scripts:
scripts/check_stability.pyandscripts/validate_incompressibility.py - Create cross-backend parity test suite:
scripts/backend_parity_test.py— compare against OpenCL baseline - Land the native Metal substrate consolidation: merge PR #230 (
ow-native-gpu-0.1.0 → 0.9.9) — this represents the strategic shift from Taichi prototyping to hand-written platform-native kernels - Complete OpenCL ↔ Native Metal parity per demo: demo1 ✅, demo2 (in tuning), worm_alone ✅, worm_swim (in progress), one_sprig ✅. Each demo gets a parity gate in CI.
- Land the native CUDA substrate: review and merge PR #229; bring CUDA up to demo1 parity
- Run parity tests: Graduate backends that pass to Stable
- Add to Dockerfile and CI: Add graduated backends to Docker image, add backend-specific CI gates per platform (macOS for Metal, Linux+NVIDIA for CUDA)
- Performance benchmark: Update recommendation for which backend to use on which platform
Community Contributions to Modernization¶
Substantial portions of the native-port modernization have been driven by community contributions. Specific contributor acknowledgments are tracked in GitHub issues on openworm/sibernetic rather than in this design document — see the contributor-acknowledgment label for the current list. The framing of "OpenCL is losing platform support, native Metal + CUDA is the modernization path" emerged directly from community-filed issues such as #226 (Port sphFluid.cl to Metal for ARM64 Mac).
Differentiability¶
The native Metal substrate is architecturally differentiable. This is not a bolt-on or a future framework: every physical kernel was implemented from day one with a paired analytic backward kernel, and the multi-step integration loop walks K iterations in reverse to produce gradients on initial conditions and physical parameters. The same architecture is the design requirement for the in-progress CUDA port.
Why Differentiability Is a Substrate Property, Not a Framework Layer¶
The native substrate derives analytic backwards alongside each forward kernel rather than relying on an autodiff layer over a separate reimplementation. This produces three structural advantages:
- No re-implementation drift. A separate autodiff reimplementation in PyTorch or JAX would have to track the reference's behavior over time as physics changes; an in-substrate backward is the same code path as the forward.
- GPU-native gradients. Backward kernels run on the same Metal command queue as the forwards — no host round-trip, no autodiff graph overhead.
- Parameter gradients arrive for free. Once each forward kernel has an analytic backward, every physical parameter (stiffness, viscosity, rest density, compliance) is differentiable end-to-end without explicit
requires_gradplumbing.
The CUDA scaffolding at src/cuda/README.md makes this explicit: the CUDA substrate must mirror src/metal_diff/ file-for-file, including paired backward kernels per forward kernel. Differentiability is the architectural contract the substrate exposes — not an optional feature flag.
The ML-augmentation framework that consumes this substrate (surrogate models, learned sensory transduction) is the scope of DD013.
What's Differentiable Today¶
| Forward kernel | Backward kernel | Status | FD validation |
|---|---|---|---|
dist_active_static / _active_active (pairwise distance) |
*_backward |
Implemented | <1% rel-err |
wpoly6_inplace (Müller poly6 SPH kernel) |
wpoly6_inplace_backward |
Implemented | <1% rel-err |
rowsum_density (SPH density via threadgroup reduction) |
rowsum_density_backward |
Implemented | <1% rel-err |
density_grad_combined (fused density + ∇W + denom) |
density_constraint_grad_backward |
Implemented | <5% rel-err |
solve_density_constraint (XPBD density correction) |
solve_density_constraint_backward |
Implemented | <5% rel-err |
predict_positions (XPBD predictive step) |
predict_positions_backward |
Implemented | end-to-end via xpbd_full_bwd |
solve_distance_constraints_seq (iterative constraint projection) |
*_backward |
Implemented | end-to-end via xpbd_full_bwd |
pair_forces_grid (viscosity + surface tension) |
pair_forces_grid_backward |
Implemented | <1% rel-err |
apply_ext_accel (external acceleration) |
apply_ext_accel_backward |
Implemented | end-to-end via xpbd_full_bwd |
spring_bonds_force (Hooke spring bonds) |
spring_bonds_force_backward |
Implemented | <1% rel-err |
spring_anchor_force (boundary anchor springs) |
spring_anchor_force_backward |
Implemented | end-to-end via xpbd_full_bwd |
solve_floor_constraint (floor collision + restitution) |
solve_floor_constraint_backward |
Implemented | end-to-end via xpbd_full_bwd |
update_velocities (verlet velocity update) |
update_velocities_backward |
Implemented | end-to-end via xpbd_full_bwd |
membrane_clear / _accumulate / _apply (Ihmsen 2014 M10) |
analytic backward through plane projection | Implemented | <7e-4 rel-err (test_xpbd_full_membrane.py, K ∈ {1,2,3,5}) |
19 forward kernels, 19 paired backward kernels. All analytic, hand-derived from the physics. All validated against finite-difference at ε = 1e-3 to within 1–5% relative error (membrane: <7e-4).
Multi-Step Reverse-Mode AD: xpbd_full_fwd / xpbd_full_bwd¶
The substrate exposes an end-to-end forward + backward pair (ops_xpbd_full.mm):
run_xpbd_full_fwd(argc, argv)— runs K XPBD constraint-projection steps, persisting per-step state (positions, velocities, density, ∇C, denominator helpers, per-kernel auxiliaries) to disk for the backward walk.run_xpbd_full_bwd(argc, argv)— consumes the saved state + a ∂L/∂x_final seed gradient, walks K steps in reverse, and outputs:∂L/∂x_init,∂L/∂v_init(initial-condition gradients)∂L/∂ρ_rest,∂L/∂spring_K,∂L/∂visc_coef,∂L/∂α_density,∂L/∂floor_y,∂L/∂restitution(scalar parameter gradients)- Per-particle parameter gradients where applicable
Per-step gradient clipping (env: BWD_CLIP_NORM) keeps long sequences numerically stable; TBPTT support lets training scripts trade memory for trajectory length.
Gradient-Tuned Demos (SGD Already Applied)¶
Four of the five working demos have been tuned via gradient descent against OpenCL reference trajectories:
| Demo | Parameters tuned | SGD script | Status |
|---|---|---|---|
| demo1 (cube drop) | ρ_rest, spring_K, visc_coef |
sgd_true.py (TBPTT + grad clipping) |
tuned to OpenCL parity (commit bf6b333 series) |
| demo2 (membrane permeability) | membrane elasticity, permeability | sgd_demo2_membrane.py, sgd_demo2_permeability.py |
tuned to OpenCL parity |
| one_sprig_test | spring_K (single anchor spring) |
sgd_one_sprig.py |
reference oscillator validated |
| worm_alone_half_resolution | spring_K, visc on worm elastic particles |
sgd_worm.py |
tuned to OpenCL trajectory |
| worm_swim_half_resolution | (in progress) | — | forward parity demonstrated; SGD pending |
SGD harness lives at src/metal_diff/sgd_*.py (8 scripts total, see repo). The pattern is reusable for any new demo or parameter sweep.
Implications for Downstream Subsystems¶
Because the substrate exposes a differentiable contract, downstream subsystems that couple to body physics can now ask gradient-based questions of it:
- Neural ↔ body coupling (DD002, DD003) — muscle activation timing, calcium-to-force scaling, and per-muscle-unit strength can be jointly optimized against kinematic targets. The muscle force injection path (Sibernetic's modulation of elastic bond stiffness via
k_muscle(t) = k_baseline × (1 + activation(t) × strength_multiplier)) is differentiable throughxpbd_full_bwd; activation traces can be backpropagated to neural circuit parameters when those are also expressed in a differentiable form. - Validation framework (DD010) — kinematic-metric mismatches against Schafer-lab baselines (speed, wavelength, frequency, gait) can directly drive parameter updates rather than requiring manual sweeps. Tier 3 validation becomes a loss function the substrate can be optimized against.
- Simulation stack (DD011) — the
body.backend: metal-nativeconfiguration should expose a differentiable interface alongside the forward-only interface, so other subsystems can opt into gradient-based parameter fitting viaopenworm.yml. - Hybrid ML framework (DD013) — now narrows to its remaining scope: neural surrogates for SPH (1000× speedup target) and learned sensory transduction. The "differentiable simulation backend" component of DD013 is delivered here, in DD001.
What's Not (Yet) Differentiable¶
- OpenCL backend — the reference implementation remains forward-only. Validated as the kinematic ground truth; not a differentiation target.
- CUDA substrate — scaffolding stage; PR #229 introduces the substrate skeleton. The architectural contract (paired backward per forward) is mandated by the CUDA README but not yet implemented.
- PyTorch backend — forward-only CPU reference; could in principle be made differentiable via autodiff, but is not on the critical path (the native substrates are the production target).
How to Use the Differentiable Interface¶
# Forward pass: run K steps, save state
./sib_metal xpbd_full_fwd <args> --save-state /tmp/sim_state.bin
# Backward pass: seed gradient + retrieve parameter gradients
./sib_metal xpbd_full_bwd /tmp/sim_state.bin /tmp/grad_seed.bin \
--out-grads /tmp/param_grads.bin
For a worked example with SGD harness, see src/metal_diff/sgd_true.py (the canonical reference implementation that tunes (ρ_rest, spring_K, visc_coef, α_density, floor_y) against an OpenCL target trajectory using TBPTT + gradient clipping).
References¶
-
Müller M, Charypar D, Gross M (2003). "Particle-based fluid simulation for interactive applications." Proc. ACM SIGGRAPH/Eurographics Symp. Computer Animation, pp. 154-159. SPH kernel functions.
-
Solenthaler B, Pajarola R (2009). "Predictive-Corrective Incompressible SPH." ACM Trans. Graphics 28(3):40. PCISPH pressure solver.
-
Boyle JH, Cohen N (2008). "Caenorhabditis elegans body wall muscles are simple actuators." Biosystems 94:170-181. Muscle-to-physics coupling validation.
-
Palyanov A, Khayrulin S, Larson SD (2018). "Three-dimensional simulation of the Caenorhabditis elegans body and muscle cells in liquid and gel environments for behavioural analysis." Phil. Trans. R. Soc. B 373:20170376. Primary Sibernetic publication. Describes muscle cell mapping (96 units), validated kinematics (Table 1), force–velocity/force–length relationships, omega turns, and agar gel simulation.
-
Zhao M et al. (2024). Nat Comp Sci 4:978-990. Demonstrated real-time FEM body simulation of C. elegans at 30 FPS.
-
Bouaziz S et al. (2014). ACM Trans Graphics 33:154. Projective Dynamics FEM solver.
Integration Contract¶
Inputs / Outputs¶
Inputs (What This Subsystem Consumes)
| Input | Source DD | Variable | Format | Units | Timestep |
|---|---|---|---|---|---|
| Muscle activation coefficients | DD003 (via sibernetic_c302.py) |
Per-muscle activation [0, 1] | Written to muscle activation file by coupling script | dimensionless | dt_coupling (0.005 ms from neural side) |
| Particle initialization geometry | DD004 (when cell_identity enabled) | Per-particle position, type, cell_id | Binary or CSV particle file | µm (positions), enum (type) | One-time at sim start |
Outputs (What This Subsystem Produces)
| Output | Consumer DD | Variable | Format | Units |
|---|---|---|---|---|
| Particle position time series | DD010 (Tier 3 validation) | All particle positions per output frame | Binary state dump or WCON trajectory | µm |
| Rendered frames / video | DD011 (output pipeline) | Visual frames of worm body | PNG or direct framebuffer | pixels |
| Body deformation state | DD007 (pharynx, if Option B) | Anterior attachment point displacement | Shared memory or file | µm |
| Particle positions (for viewer) | DD012 (visualization) | Per-particle (x, y, z) over all output timesteps | OME-Zarr: body/positions/, shape (n_timesteps, n_particles, 3) |
µm |
| Particle types (for viewer) | DD012 (visualization) | Per-particle type (liquid/elastic/boundary) | OME-Zarr: body/types/, shape (n_particles,) |
enum |
| Surface mesh (for viewer) | DD012 (visualization) | Reconstructed smooth body surface per timestep | OME-Zarr: geometry/body_surface/ (per-frame OBJ or vertices+faces arrays) |
µm |
| Parameter gradients (native substrates only) | DD002 (joint neural↔body fitting), DD003 (muscle parameter fitting), DD010 (gradient-based validation loop) | ∂L/∂(x_init, v_init, ρ_rest, spring_K, viscosity, α_density, floor_y, restitution) via xpbd_full_bwd |
Binary float32 buffers per parameter | mixed (per-particle position grads, scalar parameter grads) |
Repository & Packaging¶
| Item | Value |
|---|---|
| Repository | openworm/sibernetic |
| Docker stage | body in multi-stage Dockerfile |
versions.lock key |
sibernetic |
| Build dependencies | OpenCL SDK (AMD or Intel), CMake, C++ compiler |
| Backend note | Taichi backends require pip install taichi (not currently in Dockerfile) |
Configuration¶
body:
enabled: true
engine: sibernetic
backend: opencl # opencl, metal-native, cuda-native, pytorch
configuration: "worm_crawl_half_resolution"
particle_count: 100000
cell_identity: muscle # "muscle" = 96 muscle units mapped (default). "all" = Phase 4 (DD004): extend to all tissue types. "false" = bulk elastic only.
timestep: 0.00002 # seconds
| Key | Default | Valid Range | Description |
|---|---|---|---|
body.enabled |
true |
true/false |
Enable body physics simulation |
body.engine |
sibernetic |
sibernetic |
Physics engine selection |
body.backend |
opencl |
opencl, metal-native, cuda-native, pytorch |
Compute backend. metal-native targets Apple Silicon via hand-written Metal shaders; cuda-native targets NVIDIA via hand-written CUDA kernels. The earlier taichi-metal / taichi-cuda options are superseded by the native ports. |
body.configuration |
"worm_crawl_half_resolution" |
String | Simulation configuration name |
body.particle_count |
100000 |
Integer | Total particle count |
body.cell_identity |
muscle |
false/muscle/all |
muscle = 96 muscle units mapped (existing). all = extend to all tissue types (DD004, Phase 4). false = bulk elastic only. |
body.timestep |
0.00002 |
Float (seconds) | Simulation timestep |
How to Test (Contributor Workflow)¶
# Per-PR quick test (must pass before submission)
docker compose run quick-test
# Checks: simulation completes without NaN, segfault, or SIGKILL
# Checks: output/*.wcon exists
# Full validation (must pass before merge to main)
docker compose run validate
# Checks:
# - Density deviation < 1% for liquid particles
# - Tier 3 kinematic metrics within ±15% of baseline
# - No particle escape (all positions within bounding box)
Per-PR checklist:
- [ ] Build succeeds (
cmake .. && make -j8) - [ ]
quick-testpasses (no NaN, no segfault, *.wcon exists) - [ ]
validatepasses (Tier 3 kinematics + density constraint) - [ ] Tested on at least two backends if core SPH algorithms changed
- [ ] No particle escape (all positions within bounding box)
How to Visualize (DD012 Connection)¶
| OME-Zarr Group | Viewer Layer | Color Mapping |
|---|---|---|
body/positions/ (n_timesteps, n_particles, 3) |
Body particles | Liquid=blue, elastic=green, boundary=gray |
body/types/ (n_particles,) |
Particle type overlay | Type enum → color mapping |
geometry/body_surface/ (per-frame) |
Reconstructed mesh | Smooth surface rendering |
Backend-Config Translation¶
Sibernetic internally reads .ini configuration files. The master_openworm.py orchestrator (DD011) is responsible for translating openworm.yml body section to Sibernetic .ini format at runtime:
# master_openworm.py (pseudocode)
def write_sibernetic_config(openworm_config):
ini = configparser.ConfigParser()
ini['simulation']['timestep'] = str(openworm_config['body']['timestep'])
ini['particles']['total'] = str(openworm_config['body']['particle_count'])
# ... etc
ini.write(open('sibernetic_config.ini', 'w'))
Coupling Dependencies¶
| I Depend On | DD | What Breaks If They Change |
|---|---|---|
| Muscle activation format | DD003 | If activation value range, file format, or muscle count changes, Sibernetic reads wrong data |
sibernetic_c302.py script |
DD002/DD003 | This script bridges neural→body; changes to it affect coupling timing |
| Cell boundary data | DD004 | If particle initialization changes (cell-tagged particles), body geometry changes |
| Depends On Me | DD | What Breaks If I Change |
|---|---|---|
| Movement validation | DD010 | If particle output format changes, validation toolbox can't read trajectory data |
| Mechanical cell identity | DD004 | If particle struct changes (adding/removing fields), DD004 initialization must match |
| Pharynx attachment | DD007 | If body geometry changes at anterior, pharynx coupling point shifts |
- Approved by: OpenWorm Steering
- Implementation Status: Complete (OpenCL production but losing platform support; native Metal substrate experimental→stable with 5+ demos working and end-to-end differentiable — 19 paired backward kernels, 4 demos already SGD-tuned; native CUDA substrate in scaffolding with paired-backward architecture mandated; PyTorch experimental as CPU correctness reference. See Backend Stabilization Roadmap and Differentiability)
-
Next Actions:
-
Create stability validation scripts (
scripts/check_stability.py,scripts/validate_incompressibility.py) - Create cross-backend parity test suite against OpenCL baseline (
scripts/backend_parity_test.py) - Land the native-gpu branch consolidation — merge PR #230 (
ow-native-gpu-0.1.0 → 0.9.9) - Complete OpenCL ↔ Native Metal parity on remaining demos (demo2 sheet-scale, worm_swim swim parity)
- Bring native CUDA substrate up to demo1 parity (review and merge PR #229, then iterate) — including paired backward kernels per the architectural contract
- Expose differentiable interface in
openworm.yml(DD011) so downstream subsystems can opt into gradient-based parameter fitting - Joint neural ↔ body parameter fitting prototype (DD002, DD003) — use SGD to tune muscle activation scaling end-to-end against kinematic targets
- Graduate backends that pass parity tests; add to Dockerfile and CI per-platform
- Extend per-particle cell IDs to all tissue types (DD004)
- Add cell-type-specific mechanical properties