OpenWorm Modeling Phase Roadmap

Version: 1.0
Created: 2026-02-19
Purpose: Master timeline for Design Document implementation with visible milestones

Mission, Vision, and Principles

Mission (from openworm.org)

"OpenWorm is an open source project dedicated to creating the world's first virtual organism in a computer, a C. elegans nematode."

Vision

"Building the first digital life form. Open source."

Core Principles

Physical Realism: "Worms are soft and squishy. So our model has to be too. We are building in the physics of muscles, soft tissues and fluids. Because it matters."
Multi-Scale Integration: From ion channels (angstroms) to neurons (micrometers) to organs (hundreds of micrometers) to organism behavior (millimeters, seconds to minutes).
Experimental Validation: Every model must be validated against real C. elegans data — electrophysiology, calcium imaging, and behavior. We don't build plausible models; we build validated models.
Open Science: All code, data, and Design Documents are open source. No proprietary IP, no paywalls, no secrets.
Causal Interpretability: We can trace why behavior emerges through the mechanistic causal chain. Every parameter has physical meaning. Black-box ML is used only at boundaries (DD017), never replacing the mechanistic core.

How This Roadmap Serves the Mission

The Mission says: "world's first virtual organism."
This Roadmap delivers: 959-cell whole-organism simulation (Phase 4) with neurons, muscles, pharynx, intestine, hypodermis, reproductive system — validated against experimental data at three scales (DD010 Tier 1-3).
The Vision says: "digital life form."
This Roadmap delivers: Not just anatomy (static 3D models exist), but dynamic temporal behavior — pumping, defecating, egg-laying, locomoting, responding to touch — emergent from coupled biophysical subsystems.
The Principles say: "soft and squishy... physics matters."
This Roadmap delivers: SPH body physics (DD003), calcium-force muscle coupling (DD002), fluid-structure interaction, mechanical cell identity (DD004). The worm deforms realistically because the physics is realistic.

Overview

OpenWorm's path from 302 generic neurons to 959 specialized cells is organized into 8 implementation phases over ~18-24 months. Each phase has:

Clear scope (which DDs belong to it)
Visible milestone (what we can announce when complete)
Timeline estimate (weeks/months)
Success criteria (how we know it's done)
Blocking dependencies (what must be done first)

The North Star: By end of Phase 4, OpenWorm simulates a complete adult hermaphrodite C. elegans with 959 specialized cells, validated against experimental data at multiple scales (electrophysiology, neural recordings, behavior), with an interactive 3D viewer anyone can explore in a web browser.

Phase 0: Core Architecture (Functional, Stabilizing)

Status: ✅ Functional — core simulation runs; 83 stabilization issues tracked across DD001-DD003/DD020 (containerization, validation scripts, config system, dependency pinning — most addressed by Phase A1)

Phase Rationale: These DDs describe already-implemented subsystems — the code exists and works. c302 generates NeuroML networks, GenericMuscleCell has Ca²⁺→force coupling, Sibernetic runs ~100K SPH particles, and cect provides 30+ connectome datasets. They form the working foundation everything else builds on.

Scope:

DD	Title	Implementation Status
DD001	Neural Circuit Architecture	✅ c302 Levels A-D exist, generate NeuroML networks
DD002	Muscle Model Architecture	✅ GenericMuscleCell exists, Ca²⁺→force coupling works
DD003	Body Physics Architecture	✅ Sibernetic v1.0+ works (OpenCL only — PyTorch/Taichi do not yet match result quality; see DD003 Backend Stabilization Roadmap)
DD020	Connectome Data Access	✅ `cect` v0.2.7 exists (needs version pinning per DD020)

What Works Today:

302 identical neurons (Level C1 graded synapses)
95 body wall muscles with calcium-force coupling
~100K SPH particles (fluid-structure interaction)
Coupled simulation via sibernetic_c302.py
15ms simulations produce movement (validated against Schafer lab kinematics)
ConnectomeToolbox provides Cook2019, Witvliet, Randi, Ripoll-Sanchez data

What's Missing (addressed in Phase A1):

No config system (parameters hardcoded in master_openworm.py)
No docker-compose (raw shell scripts)
No dependency pinning (branch names, not commits)
No automated validation (Tier 3 toolbox is broken)
Video pipeline has memory leak (OOMs >2s simulations)
PyTorch/Taichi backends don't yet match OpenCL result quality; OpenCL losing platform support (DD003 Backend Stabilization Roadmap)
No fast trajectory screening tool for validation: a Boyle-Cohen 2D model (boyle_berri_cohen_trajectory.py) would enable quick-test kinematic validation without a full Sibernetic GPU build (see DD001 Issue 1)

Phase 0 → Phase A1 Gap Map:

What's Missing (Phase 0)	Addressed By	Specific Issues
No config system (params hardcoded)	DD013 Issue 1 (openworm.yml schema)	DD013 draft issues §Group 1
No docker-compose	DD013 Issues 2-4 (Dockerfile, compose, CI)	DD013 draft issues §Group 1
No dependency pinning	DD013 Issue 7 (versions.lock)	DD013 draft issues §Group 2
No automated validation	DD021 Issues 1-2 (toolbox revival)	DD021 draft issues §Group 1
Video pipeline memory leak	DD013 Issue 5 (video pipeline fix)	DD013 draft issues §Group 1
Backends don't match OpenCL	DD003 Issues 5-7 + DD013 Issues 39-42	DD003 draft issues §Group 2
Missing validation scripts (10 total)	DD001 Issues 1-3, DD002 Issues 1-2, DD003 Issues 1-3	See table below
cect not pinned/cached in Docker	DD020 Issues 1-5 (build integration)	DD020 draft issues §Group 1
Fast trajectory screening tool	DD001 Issue 1 (boyle_berri_cohen_trajectory.py)	DD001 draft issues §Group 1

Missing Validation Scripts Inventory:

Script	DD	Phase A1 Issue	Target Repo
`boyle_berri_cohen_trajectory.py`	DD001	DD001 Issue 1	openworm/c302
`extract_trajectory.py`	DD001	DD001 Issue 2	openworm/sibernetic
`compare_kinematics.py`	DD001	Moved to DD021 Issue 1	openworm/open-worm-analysis-toolbox
`check_regression.py`	DD001	Moved to DD021 Issue 2	openworm/open-worm-analysis-toolbox
`plot_muscle_activation.py`	DD002	DD002 Issue 1	openworm/c302
`validate_muscle_calcium.py`	DD002	DD002 Issue 2	openworm/c302
`check_stability.py`	DD003	DD003 Issue 1	openworm/sibernetic
`validate_incompressibility.py`	DD003	DD003 Issue 2	openworm/sibernetic
`backend_parity_test.py`	DD003	DD003 Issue 5	openworm/sibernetic

Issue Inventory: 83 issues in total, of which 71 belong to the 4 Phase 0 DDs (DD001: 9, DD002: 18, DD003: 21, DD020: 23) and 12 are relocated issues (DD005: 6, DD017: 3, DD027: 3). Of these, ~35 are Phase A1 infrastructure work, ~25 are Phase 1+, and ~23 can be addressed at any phase. See individual DD draft issue files for details.

Milestone: (Already achieved) "First Whole-Nervous-System Simulation"

What you run: python master_openworm.py — coupled c302 + Sibernetic simulation, 15ms of worm locomotion
What you see: ~100K SPH particles forming a worm shape that bends and propagates undulatory waves. Voltage traces for 302 neurons. Muscle activation patterns.
Validated against: Yemini et al. 2013 Schafer lab kinematic features — locomotion speed, body curvature, wave frequency within ±15% of wild-type N2
Published: Sarma et al. 2018, Gleeson et al. 2018
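
The ±15% acceptance band can be read as a per-feature relative-error check. A minimal Python sketch, assuming each kinematic feature is reduced to a single scalar; the function names and all numbers are illustrative and are not taken from the analysis toolbox or from Yemini et al. 2013:

```python
def within_tolerance(simulated, reference, tolerance=0.15):
    """True if simulated lies within +/- tolerance (as a fraction) of reference."""
    if reference == 0:
        raise ValueError("reference must be non-zero for a relative check")
    return abs(simulated - reference) <= tolerance * abs(reference)


def check_kinematics(simulated, reference, tolerance=0.15):
    """Map each reference metric name to a pass/fail boolean."""
    return {
        name: within_tolerance(simulated[name], ref_value, tolerance)
        for name, ref_value in reference.items()
    }


# Illustrative values only (arbitrary units), not experimental data.
reference = {"speed": 0.20, "curvature": 1.00, "wave_frequency": 0.50}
simulated = {"speed": 0.22, "curvature": 1.20, "wave_frequency": 0.48}

report = check_kinematics(simulated, reference)
print(report)
```

In this made-up example the curvature value is 20% above its reference, so it is the only metric that fails the band.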

Phase A1: Core Infrastructure (Weeks 1-2)

Status: ⚠️ Proposed — Must complete before modeling phases proceed

Phase Rationale: These five DDs have the highest downstream dependency count. DD013 is the orchestrator — literally nothing runs without it. DD008 provides the unified data layer that Phase 1 DDs (DD005, DD006) query. DD021 enables Tier 3 behavioral validation. DD024 provides the ground truth data. DD028 gives CI/validation visibility — can't manage what you can't measure. Together they answer: can a contributor build the stack, access data, and validate results?

Scope:

DD	Title	Owner	Effort	Blocking
DD013	Simulation Stack Architecture	Integration L4 (TBD)	~40 hours	CRITICAL — All phases need this
DD008	Data Integration Pipeline (OWMeta)	Data L4 (TBD)	~30 hours	Data layer — Phase 1+ datasets need unified access
DD021	Movement Toolbox Revival	Validation L4 (TBD)	~33 hours	Tier 3 validation
DD024	Validation Data Acquisition Pipeline	Validation L4 (TBD)	~18 hours	All validation tiers — data must exist before validation can run
DD028	Project Metrics Dashboard	Integration L4 (TBD)	~12 hours	CI/validation visibility

Key Deliverables:

openworm.yml config schema (DD013) — Single source of truth for simulation parameters
Multi-stage Docker build (DD013) — Subsystem caching, contributor override (--build-arg)
docker-compose.yml (DD013) — quick-test, simulation, validate, viewer, shell services
versions.lock (DD013) — Pin exact commits for c302, Sibernetic, cect, toolbox
Revived analysis toolbox (DD021) — Python 3.12 compatible, 5 metrics extractable, WCON parser works
Contributor workflow (DD013) — Fork subsystem → build with custom branch → quick-test → validate → PR
OWMeta data bundles (DD008) — Unified Python API for connectome, CeNGEN, cell positions, neuropeptide data; WBbt ID normalization across all datasets
Project metrics dashboard (DD028) — Static-site dashboard (GitHub Pages) with 4 panels: validation scores (Tier 2/3 trend), contributor activity (BadgeList + GitHub), CI health (pass/fail across repos), phase progress (DD status parsing). Auto-updates on every CI run via GitHub Actions.
Validation data repository (DD024) — openworm/validation-data repo with manifest.json mapping datasets to DDs and validation tiers, verify_validation_data.py script, digitized baseline datasets (Schafer kinematics WCON, Randi 2023 correlation matrix)
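
To illustrate what "pin exact commits" implies for versions.lock, the sketch below checks that every required subsystem is pinned to a full 40-character commit hash rather than a branch name. The `name=ref` line format, the lowercase subsystem keys, and the function names are assumptions made for this example; DD013 defines the real file format.

```python
import re

# A full Git commit hash: exactly 40 lowercase hex characters.
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")
REQUIRED = ("c302", "sibernetic", "cect", "toolbox")


def parse_lock(text):
    """Parse hypothetical 'name=ref' lines, ignoring blanks and # comments."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, _, ref = line.partition("=")
        pins[name.strip()] = ref.strip()
    return pins


def unpinned(pins, required=REQUIRED):
    """Return required subsystems that are missing or not pinned to a full hash."""
    return [name for name in required if not FULL_SHA.match(pins.get(name, ""))]


# Placeholder hashes, not real commits.
example = "\n".join([
    "# hypothetical versions.lock content",
    "c302=" + "a" * 40,
    "sibernetic=" + "b" * 40,
    "cect=main",  # a branch name, which should be rejected
])

problems = unpinned(parse_lock(example))
print(problems)
```

Here `cect` is flagged because it points at a branch, and `toolbox` is flagged because it has no entry at all.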

Milestone: 🎉 "Containerized Stack with Automated Validation" (Target: Week 2, mid March 2026)

What you run: docker compose run quick-test (completes in <5 min) — builds the full simulation stack, runs a short simulation, checks for crashes
What you see: Terminal output showing build → simulate → validate pipeline. JSON report with pass/fail on each metric. Video of worm locomotion (no more OOM at >2s).
Then try: docker compose run validate — runs Tier 2 (functional connectivity vs. Randi 2023) + Tier 3 (kinematics vs. Yemini 2013). Produces output/validation_report.json with per-metric scores.
Contributor workflow: Fork a subsystem → docker compose build --build-arg C302_BRANCH=my-branch quick-test → docker compose run quick-test → see if your change breaks anything → PR with CI green

Success Criteria:

✅ docker compose run quick-test completes in <5 minutes
✅ docker compose run validate runs Tier 2 functional connectivity + Tier 3 kinematics, produces JSON report
✅ versions.lock pins c302, Sibernetic, cect, toolbox to exact commits
✅ Video pipeline memory leak fixed (can run >2s without OOM)
✅ Analysis toolbox installs on Python 3.12, extracts 5 metrics from sample WCON file
✅ OWMeta installs, connect("openworm_data") returns 302 neurons with WBbt IDs (DD008)
✅ openworm/validation-data repo exists, manifest.json covers Tier 2 + Tier 3 datasets, verify_validation_data.py passes (DD024)
✅ Dashboard deployed to GitHub Pages, all 4 panels render, data auto-updates on CI run (DD028)

Datasets Needed: See DD024 for the complete inventory. Key Phase A1 datasets:

Schafer lab N2 baseline kinematics (WCON format) — Tier 3 validation baseline
Randi 2023 functional connectivity (302×302 correlation matrix) — Tier 2 validation
Validation data for digitization — Thomas 1990, Raizen 1994, O'Hagan 2005, Chalfie 1985

Blocking Dependencies:

Recruit Integration L4 Maintainer (owns DD013 implementation)
Recruit Data L4 Maintainer (owns DD008 OWMeta revival)
Recruit Validation L4 Maintainer (owns DD021 revival)

Phase A2: Governance & Derisking (Weeks 3-4)

Status: ⚠️ Proposed — Can proceed in parallel with Phase A1

Phase Rationale: These DDs enable governance at scale and derisk Phase 1 science. They don't block contributor workflow but make it sustainable. DD025 runs foundation model cross-validation in parallel with A1, providing priors that accelerate DD005 calibration. DD011/DD012/DD015 formalize roles, processes, and AI agent workflows. None of these block modeling work, but all of them enable the project to scale beyond a handful of contributors.

Scope:

DD	Title	Owner	Effort	Purpose
DD011	Contributor Progression Model	Founder	~8 hours	Governance — formalizes contributor advancement (L0-L5)
DD012	Design Document RFC Process	Founder	~8 hours	Governance — formalizes DD review/approval
DD015	AI-Native Contributor Model	Founder	~12 hours	Governance — AI agent registration and task pipeline
DD025	Foundation Model Channel Kinetics	ML/Structural Bio (TBD)	~20 hours	Derisks DD005 Phase 1 calibration

Key Deliverables:

AI contributor workflow (DD015) — Agent registration system, DD→GitHub issue decomposition, AI pre-review pipeline, human final-approval gates
Channel kinetics predictions (DD025) — Cross-validation of foundation model predictions against ~50-100 channels with known kinetics; channel_kinetics_predictions.csv ready for DD005 integration
DD RFC process and template (DD012) — Documented review lifecycle (Proposed → Accepted → Superseded), DD template with required sections (Integration Contract, Getting Started, Deliverables), GitHub PR-based review workflow
Contributor progression structure (DD011) — L0–L4 level definitions with explicit advancement criteria, badge taxonomy (6 categories), BadgeList API integration, GitHub team/permission configuration, onboarding checklist

Milestone: 🎉 "Governance Framework and Foundation Model Cross-Validation" (Target: Week 4, late March 2026)

What you see: Contributor levels documented and enforced via GitHub teams. DD review process operational. AI agents registered and generating issues from DD Integration Contracts. Foundation model cross-validation complete — channel_kinetics_predictions.csv ready for Phase 1 integration.
Key result: If DD005's naive expression→conductance mapping fails in Phase 1, structure-based predictions from DD025 are ready immediately as a fallback.

Success Criteria:

✅ DD025 cross-validation: predicted kinetics within 30% relative error of measured values for known channels (DD025)
✅ AI contributor registry repo exists, issue auto-generation from DD Integration Contracts demonstrated (DD015)
✅ DD template exists with all required sections; at least one DD has completed the RFC review lifecycle (DD012)
✅ L0–L4 levels documented, BadgeList group configured with 6 badge categories, GitHub teams match level permissions (DD011)

Datasets Needed:

DD025 inputs — Ion channel sequences (WormBase) + known kinetics (~50-100 channels)

Blocking Dependencies:

None — Phase A2 has no infrastructure dependencies and can proceed in parallel with Phase A1

Phase 1: Cell-Type Specialization (Months 1-3)

Status: ⚠️ Ready to Start (after Phase A1 complete)

Phase Rationale: Phase 1 is the first modeling phase — it specializes the 302 identical neurons into 128 biologically distinct classes. DD005 is here because it needs CeNGEN data via DD008 and validation tools from DD021 (both Phase A1), and because its scientific risk is highest — if expression→conductance mapping fails, better to discover it early. DD014/DD014.1 establish visual infrastructure because months of work with no visual feedback kills contributor engagement. DD010 Tier 2 activates the functional connectivity gate (r > 0.5) so Phase 2 doesn't build on a broken foundation. DD025 (Phase A2) integration feeds foundation model predictions into DD005 calibration while it's actively running.

Scope:

DD	Title	Dependencies	What Changes
DD005	Cell-Type Specialization Strategy	DD001, DD008/DD020 (CeNGEN)	Replace 302 identical neurons with 128 distinct neuron classes
DD014 (Phase 1)	Post-Hoc Trame Viewer	DD001-DD003, DD005	Evolve Worm3DViewer from Streamlit to Trame; OME-Zarr export
DD014.1	Visual Rendering Specification	DD014	Canonical color palette (37 materials), 14 reference mockups, material definitions for all 959 cells
DD010 (Tier 2)	Functional Connectivity Validation	DD005, DD008	Activate Tier 2 blocking gate (r > 0.5 vs. Randi 2023)
DD025 (integration)	Foundation Model → DD005 Priors	ML/Structural Bio (TBD)	~12 hours

Key Deliverables:

128 cell-type NeuroML files (cells/AVALCell.cell.nml, etc.) — CeNGEN expression → conductance densities. The current 4 generic channels (leak, K_slow, K_fast, Ca_boyle) borrowed from muscle electrophysiology (Boyle & Cohen 2008) expand to 14+ neuron-class-specific channels. Existing NeuroML2 models from Nicoletti et al. (2019) and NMODL files from BAAIWorm (Zhao et al. 2024) provide 31 channels as a head start. Channel assignment uses CeNGEN single-cell transcriptomics rather than functional grouping — more biologically grounded because two neurons in the same functional group may express different channel complements. See DD005 Draft Issues for the channel survey, adoption, and validation tasks.
Calibration parameters CSV (data/expression_to_conductance_calibration.csv) — Fit from ~20 neurons with electrophysiology
Differentiated c302 network (LEMS_c302_C1_Differentiated.xml) — Generated via python CElegans.py C1Differentiated
OME-Zarr export pipeline (DD014) — master_openworm.py Step 4b writes output/openworm.zarr/ with neural/, muscle/, body/ groups
Trame viewer (DD014) — Replaces Streamlit+stpyvista, supports time animation in browser
Tier 2 validation (DD010) — Automated correlation vs. Randi 2023; CI blocks PRs if r < 0.5
Visual rendering spec (DD014.1) — 37-material color palette, activity-state overlays, 14 reference mockups as acceptance tests
WormBrowser enhancement (DD014) — Click neuron/cell → links to WormAtlas + WormBase on browser.openworm.org (quick win for John White, ~8-16 hrs)

Milestone: 🎉 "Biologically Distinct Neurons" (Target: Month 3, June 2026)

What you run: docker compose run simulation then docker compose up viewer — open localhost:8501
What you see: 3D viewer with smooth worm body crawling. Toggle "Neurons" layer — 302 neurons appear, colored by class (128 distinct colors). Click AVAL — inspector panel shows its class-specific voltage and calcium traces, visibly different from ASER or AWCL. Toggle "color by class" mode to see the diversity.
Validated against: Randi et al. 2023 whole-brain calcium imaging — simulated functional connectivity correlation improves ≥20% over generic baseline (e.g., r=0.3 → r≥0.36). Side-by-side comparison plot: generic model vs. CeNGEN-parameterized model vs. experimental data.
Video: Time-lapse of simulation with neurons glowing by activity — command interneurons (AVA, AVB) show graded potentials, sensory neurons (ASEL, ASER) show distinct response profiles, motor neurons fire rhythmically driving visible muscle contractions.

Success Criteria:

✅ All 128 .cell.nml files generated without error
✅ jnml -validate passes for each cell type
✅ Tier 2 validation: correlation with Randi 2023 improves ≥20% vs. generic baseline
✅ Tier 3 validation: kinematic metrics remain within ±15% (no regression)
✅ Boyle-Cohen 2D fast trajectory tool screens kinematic impact of DD005 conductance parameter changes in <30 seconds (no GPU needed)
✅ Trame viewer launches via docker compose up viewer, shows time-animated worm at localhost:8501
✅ OME-Zarr export complete: neural/, muscle/, body/ groups all populated

Datasets Needed: See DD024 for the complete inventory. Key Phase 1 datasets:

CeNGEN L4 expression (128 classes × 20,500 genes) — DD005 conductance calibration
Electrophysiology training set (~20 neurons with measured conductances) — DD005 calibration regression
Ion channel gene list (~100 genes) — DD005 gene→channel mapping

Blocking Dependencies:

Phase A1 complete (DD013 Docker stack, DD021 toolbox working)
CeNGEN data downloaded and validated
Electrophysiology training set curated (20 neurons with measured conductances)

Phase 2: Slow Modulation + Closed-Loop Sensory (Months 4-6)

Status: ⚠️ Proposed (ready after Phase 1)

Phase Rationale: Phase 2 closes the sensory loop (body→neuron feedback) and adds the neuropeptide modulation layer. DD006 requires specialized neurons from Phase 1 because the 31,479 peptide-receptor interactions are cell-type-specific — modulation on generic neurons would be meaningless. DD019 needs cell-type-specific MEC-4 channels from DD005. Both must exist before Phase 3: DD018 requires serotonergic modulation, and emergent behaviors (chemotaxis, thermotaxis) require sensory input. DD022 and DD023 can proceed in parallel with DD006/DD019. DD001 Level D Stage 1 starts with 5 representative multicompartmental neurons to validate the approach before committing to all 302.

Scope:

DD	Title	Dependencies	What Changes
DD006	Neuropeptidergic Connectome Integration	DD001, DD005	Add 31,479 peptide-receptor interactions as slow modulation layer
DD019	Closed-Loop Touch Response	DD001, DD003, DD005	MEC-4 mechanotransduction + bidirectional coupling + tap withdrawal
DD022	Environmental Modeling & Stimulus Delivery	DD003, DD019	Agar substrate, chemical/thermal gradients, chemotaxis (CI >0.5) + thermotaxis
DD023	Proprioceptive Feedback & Motor Coordination	DD001, DD003, DD019	Stretch receptors on B-class motor neurons, wavelength stability ±10%
DD026	Reservoir Computing Validation	DD001, DD002, DD005, DD020	Tests RC framing: 5 properties × 4 partitions, falsifiable predictions (pure analysis, no sim changes)
DD014 (Phase 2)	Interactive Dynamic Viewer	DD014 Phase 1, DD006, DD019	Layer toggle, pharynx/intestine (future), neuropeptide volumetric clouds, validation overlay
DD027	Multicompartmental Neurons (Proof of Concept)	DD001, DD005	5 representative neurons with 14 ion channel classes, EM morphologies, fitted to electrophysiology. Includes spatially resolved synapse placement using Witvliet 2021 EM centroid distances. Stage 1 proof-of-concept; Stage 2 (all 302 neurons) is Phase 4-5.

Key Deliverables:

NeuroML peptide extensions (lems/PeptideReleaseDynamics.xml, lems/PeptideReceptorDynamics.xml)
Neuropeptidergic adjacency CSV (31,479 rows: source, target, peptide, receptor, distance, modulation_type)
MEC-4 channel model (channel_models/mec4_chan.channel.nml) — Strain-gated DEG/ENaC channel
Cuticle strain readout (sibernetic/coupling/strain_readout.py) — SPH particles → local strain per touch neuron
Bidirectional coupling (sibernetic_c302_closedloop.py) — Extends existing forward coupling with body→sensory reverse path
Tap stimulus (sibernetic/stimuli/tap_stimulus.py) — Boundary particle displacement, configurable position
Agar substrate + chemical/thermal gradients (DD022) — Steady-state NaCl gradient field, thermal gradient, substrate boundary particles
Stretch receptor channel model (DD023) — Curvature-gated channels on B-class motor neurons (DB, VB), body curvature readout from SPH
Viewer enhancements (DD014) — Neuropeptide volumetric layer, strain heatmap, reversal event markers, gradient field visualization
RC validation report (DD026) — rc_validation_report.json with 5 RC properties × 4 neuron partitions, falsifiable predictions tested
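
The neuropeptidergic adjacency CSV lists six columns (source, target, peptide, receptor, distance, modulation_type). A minimal loader sketch follows, assuming a header row with exactly those names and a numeric distance column. The rows, the `excitatory`/`inhibitory` labels, and the function names are placeholders invented for the example.

```python
import csv
import io
from collections import defaultdict

COLUMNS = ("source", "target", "peptide", "receptor", "distance", "modulation_type")


def load_interactions(handle):
    """Read interaction rows, checking the header and parsing distance as float."""
    reader = csv.DictReader(handle)
    missing = [c for c in COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for row in reader:
        row["distance"] = float(row["distance"])
        rows.append(row)
    return rows


def group_by_target(rows):
    """Index interactions by receiving cell, the lookup a modulation step needs."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["target"]].append(row)
    return dict(grouped)


# Placeholder rows, not entries from the Ripoll-Sanchez 2023 dataset.
sample = io.StringIO(
    "source,target,peptide,receptor,distance,modulation_type\n"
    "CELL_A,CELL_B,PEPTIDE_1,RECEPTOR_1,12.5,excitatory\n"
    "CELL_C,CELL_B,PEPTIDE_2,RECEPTOR_2,40.0,inhibitory\n"
    "CELL_A,CELL_C,PEPTIDE_1,RECEPTOR_1,8.0,excitatory\n"
)

rows = load_interactions(sample)
by_target = group_by_target(rows)
print(len(rows), sorted(by_target))
```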

Milestone: 🎉 "The Worm Can Feel and Modulate" (Target: Month 6, September 2026)

What you run: docker compose run simulation --config closedloop_touch then open the viewer
Demo 1 (tap withdrawal): Worm crawls forward. At t=5s, anterior tap stimulus fires. Watch: touch receptor neurons (ALM, AVM) activate → command interneurons (AVA, AVD) depolarize → motor neurons reverse → worm reverses direction in under 1 second, travels backward ≥1 body length, then resumes forward crawling. Compare: --config openloop (same tap, no reversal, because without body→sensory feedback the worm cannot feel it).
Demo 2 (neuropeptide knockout): Run with neuropeptides.flp_knockout: true. Watch the locomotion pattern change: speed, reversal frequency, and body wave amplitude all shift. Compare side-by-side with wild-type. Matches Li et al. 1999 / Rogers et al. 2003 FLP loss-of-function phenotypes.
Validated against: Chalfie et al. 1985 tap withdrawal (reversal latency, distance, direction discrimination); Wicks et al. 1996 anterior-vs-posterior direction selectivity; ≥3 peptide knockout phenotypes within 30% of experimental measurements.
In the viewer: Neuropeptide volumetric clouds visible as colored mist waxing/waning on seconds timescale. Cuticle strain heatmap shows where the body is being compressed. Reversal events marked on the time scrubber.

Success Criteria:

✅ Peptide-enabled simulation completes without crash (Tier 3 kinematics not degraded)
✅ ≥3 peptide knockout phenotypes reproduced within 30% error (DD006 validation)
✅ Tap stimulus → reversal onset <1 s, distance ≥1 body length (DD019 Tier 3)
✅ Anterior touch → backward, posterior touch → forward (direction discrimination, DD019)
✅ Closed-loop stable for 30s without NaN/divergence (DD019 quick-test)
✅ Chemotaxis: CI (chemotaxis index) >0.5 on simulated NaCl gradient (DD022)
✅ Thermotaxis: worm navigates to cultivation temperature ±2°C (DD022)
✅ Wavelength stability: ±10% with proprioception enabled (improved from ±15%), >30% degradation when stretch receptors disabled (DD023)
✅ Viewer shows: neuropeptide volumetric clouds, cuticle strain heatmap, reversal event markers, gradient fields
✅ RC validation: All 5 predictions tested across all 4 partitions; results documented in rc_validation_report.json (DD026)
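
The tap-withdrawal criterion (reversal onset <1 s, distance ≥1 body length) can be evaluated from a signed velocity trace. The sketch below assumes velocity is sampled along the body axis in body lengths per second, positive for forward motion; the function name and the synthetic trace are illustrative, not DD019 code.

```python
def reversal_metrics(times, velocities, tap_time):
    """Return (latency, backward_distance) for the first reversal after the tap.

    Latency is None if the worm never moves backward after the tap.
    Distance is integrated only over the first contiguous backward episode.
    """
    onset = None
    distance = 0.0
    for i in range(1, len(times)):
        if times[i] <= tap_time:
            continue
        v = velocities[i]
        if v < 0:
            if onset is None:
                onset = times[i]
            distance += -v * (times[i] - times[i - 1])
        elif onset is not None:
            break  # forward motion resumed, so the reversal has ended
    latency = None if onset is None else onset - tap_time
    return latency, distance


def demo_velocity(t):
    """Synthetic trace: forward crawl with a backward episode from 5.5 s to 8.0 s."""
    return -0.5 if 5.5 <= t < 8.0 else 0.2


times = [i * 0.25 for i in range(40)]
velocities = [demo_velocity(t) for t in times]

latency, distance = reversal_metrics(times, velocities, tap_time=5.0)
passes = latency is not None and latency < 1.0 and distance >= 1.0
print(latency, distance, passes)
```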

Datasets Needed: See DD024 for the complete inventory. Key Phase 2 datasets:

Ripoll-Sanchez 2023 neuropeptide connectome (31,479 interactions) — DD006 extrasynaptic wiring
Touch neuron electrophysiology (MEC-4 kinetics) — DD019 channel model validation
Tap withdrawal behavioral data (Chalfie 1985, Wicks 1996) — DD019 Tier 3 validation
BAAIWorm NMODL + SWC data — DD027 multicompartmental neurons
Chemotaxis and thermotaxis behavioral data — DD022 Tier 3 validation

Blocking Dependencies:

Phase 1 complete (specialized neurons are substrate for peptide modulation)
Ripoll-Sanchez data downloaded and ingested into ConnectomeToolbox/OWMeta
3D cell positions extracted (for peptide distance-dependent attenuation)

Phase 3: Organ Systems + Hybrid ML (Months 7-12)

Status: ⚠️ Proposed (ready after Phase 2)

Phase Rationale: Phase 3 adds three semi-autonomous organ subsystems and the ML acceleration framework. DD007 (pharynx) and DD018 (egg-laying) need specialized parameters from DD005, and DD018 specifically requires DD006's serotonin modulation. DD009 (intestine) couples to neural circuits via DVB/AVL neurons. All three organ DDs can be implemented in parallel by different contributors. DD017 (hybrid ML) waits until Phase 3 because: (1) the differentiable backend needs stable ODE equations — porting during Phase 1-2 equation changes wastes effort, (2) the SPH surrogate needs 500+ training runs that go stale if body dynamics change, and (3) learned sensory models are only appropriate after the mechanistic approach (Phase 2) has been tried — using ML before mechanism contradicts OpenWorm's interpretability commitment.

Scope:

DD	Title	Dependencies	What Adds
DD007	Pharyngeal System Architecture	DD001, DD002, DD005	63-cell semi-autonomous organ (including 20 neurons + 20 muscles), 3-4 Hz pumping
DD009	Intestinal Oscillator Model	DD001, DD004 (optional)	20-cell IP3/Ca oscillator, defecation motor program (50s period)
DD018	Egg-Laying System Architecture	DD001, DD002, DD005, DD006	28-cell reproductive circuit (HSN serotonergic, VC cholinergic, 16 sex muscles), two-state pattern
DD017	Hybrid Mechanistic-ML Framework	DD001-DD005, DD010	Differentiable backend (auto parameter fit), SPH surrogate (1000× speedup), learned sensory (Component 3 extracted to DD025)

Key Deliverables:

Pharyngeal network (LEMS_c302_pharynx.xml) — 20 neurons + 20 muscles, pumping oscillator module
Intestinal network (LEMS_IntestineOscillator.xml) — 20 cells with IP3R, gap-junction-coupled
Egg-laying network (LEMS_c302_EggLaying.xml) — HSN, VC, vulval/uterine muscles, serotonin/ACh/tyramine synapses
Differentiable worm (openworm-ml/differentiable/) — PyTorch ODE solver, full DD001+DD002+DD009 chain
SPH surrogate (openworm-ml/surrogate/) — FNO trained on 500+ SPH runs (supplemented by 2D rod-spring model trajectories at orders of magnitude higher throughput), <5% trajectory error, 1000× faster
Auto-fitted parameters — Gradient descent on DD010 validation loss, per-neuron-class conductances
Per-synapse weight optimization (DD017 Component 1) — Replaces the uniform baseline g_syn = 0.09 nS with per-synapse conductances fitted via gradient descent against whole-brain functional connectivity data, following Zhao et al. (2024). Extended with neurotransmitter identity constraints from Wang et al. (2024) and full 302-neuron optimization. Config: neural.synapse_optimization: true/false. See DD017 Draft Issues.

Milestone: 🎉 "From 302 Neurons to 433 Cells — Multi-Organ Simulation" (Target: Month 12, March 2027)

What you run: docker compose run simulation --config full_organism (runs for ~20 simulated minutes to capture egg-laying cycle). Then open viewer.
What you see — 3 organs running simultaneously:
- Pharynx (toggle layer ON): 63 pharyngeal cells at the head pump rhythmically at 3-4 Hz. Corpus contracts → isthmus peristalsis → terminal bulb grinds. Pharyngeal neurons (MC, M3) fire in sync with the pump cycle. Validated against Raizen & Avery 1994 electropharyngeogram recordings.
- Intestine (toggle layer ON): 20 intestinal cells show a calcium wave propagating posterior-to-anterior every ~50 seconds. Cells color from blue→red as [Ca2+] rises. Every wave triggers a visible defecation motor program: posterior body contraction, then anterior body contraction, then expulsion. Validated against Thomas 1990 (50 ± 10s cycle period).
- Egg-laying (toggle layer ON): HSN neurons fire serotonergic bursts → vulval muscles contract → eggs deposited. Two-state pattern: ~20 min inactive, ~2 min active bout (3-5 eggs). Validated against Collins et al. 2016 calcium imaging.
Demo — ML surrogate: docker compose run surrogate — same muscle activation input, full SPH takes hours, surrogate completes in <1 minute. Overlay both trajectories: <5% difference. Enables rapid parameter sweeps that were previously impossible.
All the while: Body locomotion continues in background — worm crawls, pharynx pumps, intestine oscillates, eggs are laid. Multiple timescales visible simultaneously (ms for neurons, seconds for pumping, minutes for defecation, tens of minutes for egg-laying).

Success Criteria:

✅ Pharyngeal pumping frequency: 3-4 Hz (DD007 Tier 3)
✅ Intestinal defecation period: 50 ± 10 s (DD009 Tier 3)
✅ Egg-laying two-state pattern: inactive ~20 min, active ~2 min, 3-5 eggs/bout (DD018 Tier 3)
✅ Differentiable backend matches NEURON/jNML reference within ±5% (DD017 validation)
✅ SPH surrogate achieves <5% trajectory error, ≥100× speedup as the minimum bar (1000× is the target) (DD017 validation)
✅ Auto-fitted parameters equal or improve DD010 scores vs. hand-tuned baseline
✅ Body locomotion still within ±15% (no regression from adding organs)
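
The rhythm criteria above (pumping frequency, defecation period) reduce to estimating an oscillation period from a time series. A minimal sketch using upward threshold crossings, with a synthetic pulse train standing in for an intestinal calcium trace; the function names, the threshold, and the signal are assumptions for illustration.

```python
def upward_crossings(times, signal, threshold):
    """Times at which the signal rises from <= threshold to > threshold."""
    return [
        times[i]
        for i in range(1, len(signal))
        if signal[i - 1] <= threshold < signal[i]
    ]


def mean_period(times, signal, threshold):
    """Average interval between successive upward crossings, or None if fewer than two."""
    crossings = upward_crossings(times, signal, threshold)
    if len(crossings) < 2:
        return None
    return (crossings[-1] - crossings[0]) / (len(crossings) - 1)


def in_band(value, low, high):
    """Inclusive range check, e.g. 40-60 s for a 50 +/- 10 s criterion."""
    return value is not None and low <= value <= high


# Synthetic trace: a 5 s pulse every 50 s, sampled once per second for 300 s.
times = list(range(300))
calcium = [1.0 if t % 50 < 5 else 0.0 for t in times]

period = mean_period(times, calcium, threshold=0.5)
print(period, in_band(period, 40, 60))
```

The same helper applies to pumping by converting the period to a frequency (1 / period) and checking it against the 3-4 Hz band.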

Datasets Needed: See DD024 for the complete inventory. Key Phase 3 datasets:

Raizen 1994 EPG recordings — DD007 pharyngeal validation
Thomas 1990 defecation data — DD009 Tier 3 validation (50s period)
Collins 2016 egg-laying calcium imaging — DD018 validation
SPH simulation training set (500+ runs, ~2,500 GPU-hours) — DD017 surrogate training

Blocking Dependencies:

Phase 1 complete (specialized neurons); Phase 2 DD006 complete before DD018 (serotonergic modulation)
Organ-specific validation data curated (pharynx EPG, defecation period, egg-laying patterns)
GPU cluster access for SPH surrogate training (500+ long runs)

Phase 4: Mechanical Cell Identity + High-Fidelity Visualization (Months 13-18)

Status: ⚠️ Proposed (ready after Phase 3)

Phase Rationale: Phase 4 completes the organism: all 959 somatic cells with cell-type mechanics and a public web viewer. DD004 is here because per-cell mechanical properties (elasticity, adhesion) should be informed by organ system behavior — setting intestine elasticity before implementing the intestine means guessing. DD014.2 needs both DD004 cell boundaries and stable SPH body dynamics. DD014 Phase 3 (Three.js + WebGPU public deployment) requires all content stable — static site deployment to wormsim.openworm.org is the capstone milestone. DD004 and DD014.2 can proceed in parallel.

Scope:

DD	Title	Dependencies	What It Adds
DD004	Mechanical Cell Identity	DD003, DD008, DD007/DD009 (cell positions)	Per-particle cell IDs (959 somatic cells), cell-type-specific elasticity/adhesion
DD014.2	Anatomical Mesh Deformation Pipeline	DD003, DD014	GPU skinning + cage-based MVC + PBD collision for ~1.6M Virtual Worm vertices
DD014 (Phase 3)	Public Experience Viewer	DD014 Phase 2, DD014.2	Three.js + WebGPU, molecular scale, static site deployment, "Digital Organism In Your Browser"

Key Deliverables:

Tagged particle file (extended SPH_Particle_v2 struct: 44 bytes with cell_id, elasticity_mult, adhesion)
Cell-to-particle mapping (data/cell_to_particle_map.json) — 959 somatic cells → particle indices
Cell boundary meshes (data/cell_boundaries/*.obj) — Per-cell 3D volumes from Witvliet 2021 EM
Deformed Virtual Worm meshes (DD014.2) — 688 anatomical meshes follow SPH body shape in real-time
Three.js viewer (DD014 Phase 3) — Client-side, no server, molecular scale with gene expression pipeline visible
Static site deployment — wormsim.openworm.org (GitHub Pages or CDN)
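
To make the tagged particle record concrete, here is a minimal sketch of one way a 44-byte record could be laid out. The source specifies only the total size and the three new fields (cell_id, elasticity_mult, adhesion). The position and velocity fields, their 4-float width, the field order, and the little-endian packing are all assumptions made for illustration, as are the helper names.

```python
import struct

# Hypothetical layout (assumption): position xyzw + velocity xyzw as float32,
# followed by cell_id (uint32), elasticity_mult (float32), adhesion (float32).
# "<" selects little-endian with no padding, so the size is 16 + 16 + 4 + 4 + 4.
PARTICLE_FORMAT = "<4f4fIff"
PARTICLE_SIZE = struct.calcsize(PARTICLE_FORMAT)


def pack_particle(position, velocity, cell_id, elasticity_mult, adhesion):
    """Serialize one particle into a fixed-size byte record."""
    return struct.pack(
        PARTICLE_FORMAT, *position, *velocity, cell_id, elasticity_mult, adhesion
    )


def unpack_particle(buf):
    """Deserialize one record back into a dictionary of fields."""
    fields = struct.unpack(PARTICLE_FORMAT, buf)
    return {
        "position": fields[0:4],
        "velocity": fields[4:8],
        "cell_id": fields[8],
        "elasticity_mult": fields[9],
        "adhesion": fields[10],
    }


if __name__ == "__main__":
    record = pack_particle((1.0, 2.0, 3.0, 0.0), (0.0, 0.0, 0.0, 0.0), 42, 1.5, 0.25)
    print(PARTICLE_SIZE, unpack_particle(record)["cell_id"])
```

A fixed-size record like this keeps the particle file seekable by index, which is what a cell-to-particle mapping of particle indices would rely on.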

Milestone: 🎉 "WormSim 2.0 — 959-Cell Digital Organism In Your Browser" (Target: Month 18, September 2027)

What you run: Open wormsim.openworm.org in any browser. No Docker, no installation, no server.
What you see — 3 scales of exploration:
- Organism scale (default): Smooth, translucent C. elegans crawling across the screen. Anatomical meshes (688 Virtual Worm pieces) deform with the SPH body in real-time. Pharynx pumps at the head, defecation contractions visible every ~50s.
- Tissue/Cell scale (zoom in): Click any of 959 individually labeled cells. Neurons glow by voltage. Muscles flash by contraction. Intestinal cells show calcium waves. Inspector panel shows cell identity (WBbt ID), real-time traces, and links to WormBase.
- Molecular scale (zoom further): See ion channels opening/closing on a neuron's membrane. Calcium flowing through IP3 receptors in intestinal cells. Gene transcription → mRNA export → ribosomal translation → vesicle trafficking → channel insertion (per DD014.1 Mockups 13-14).
Validated against: All previous tiers still passing — kinematics (Yemini 2013 ±15%), functional connectivity (Randi 2023), organ rhythms (pharynx, intestine, egg-laying). Cell-type-specific elasticity produces realistic body mechanics: intestine soft (0.8x), cuticle stiff (5-10x), muscles intermediate (1.5x).
Performance: 60fps on a 2020-era laptop. All 688 meshes deform in <4ms per frame. Progressive OME-Zarr loading — start viewing immediately while more data streams in background.

Success Criteria:

✅ All 959 somatic cells mapped to ≥1 SPH particle each
✅ Cell-type-specific elasticity: intestine (0.8x baseline), cuticle (5-10x), muscles (1.5x), hypodermis (0.5x)
✅ Tier 3 validation: kinematic metrics within ±15% with cell_identity: true enabled
✅ Mesh deformation: All 688 Virtual Worm meshes deform with SPH body, no interpenetration, <4ms per frame (well inside the ~16.7ms frame budget at 60fps)
✅ Three.js viewer: 60fps on 2020-era laptop, all 3 scales working, static deployment (no server required)
✅ Molecular scale: Gene transcription → mRNA export → ribosomal translation → vesicle trafficking → channel insertion visible (DD014.1 Mockups 13-14)
✅ WormBrowser feature parity: layer peeling, search by cell name, click-to-identify, and static hosting all matched (DD014 checklist)
✅ browser.openworm.org redirects to wormsim.openworm.org
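
Several of these criteria reduce to simple checks on data. The sketch below illustrates three of them: cell coverage, elasticity multipliers, and the deformation time budget. It assumes the cell-to-particle map is a JSON object of cell name to list of particle indices, which the source does not specify. All function and constant names are hypothetical.

```python
FRAME_BUDGET_MS = 1000.0 / 60.0  # about 16.7 ms available per frame at 60 fps
DEFORM_BUDGET_MS = 4.0           # share allotted to mesh deformation

# Multiplier ranges relative to baseline, from the success criteria above.
ELASTICITY_RANGES = {
    "intestine": (0.8, 0.8),
    "cuticle": (5.0, 10.0),
    "muscle": (1.5, 1.5),
    "hypodermis": (0.5, 0.5),
}


def unmapped_cells(cell_to_particles, expected_cells):
    """Return expected cells that have no particle assigned (missing or empty)."""
    return sorted(c for c in expected_cells if not cell_to_particles.get(c))


def elasticity_ok(cell_type, multiplier):
    """Check a multiplier against the target range for its cell type."""
    low, high = ELASTICITY_RANGES[cell_type]
    return low <= multiplier <= high


def deformation_fits(deform_ms):
    """Check the per-frame deformation time against its 4 ms budget."""
    return deform_ms < DEFORM_BUDGET_MS


if __name__ == "__main__":
    # Toy mapping with an assumed schema; indices are placeholders.
    mapping = {"AVAL": [0, 1, 2], "AVAR": [3], "PLML": []}
    print(unmapped_cells(mapping, ["AVAL", "AVAR", "PLML", "PLMR"]))
    print(round(FRAME_BUDGET_MS, 1), deformation_fits(3.2))
```

Treating an empty particle list the same as a missing key matches the "≥1 SPH particle each" wording of the first criterion.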

Datasets Needed: See DD024 for the complete inventory. Key Phase 4 datasets:

Witvliet 2021 cell boundary meshes — DD004 particle tagging (needs EM conversion)
Virtual Worm Blender meshes (688 meshes, ~1.6M vertices) — DD014.2 deformation
Cell-type mechanical properties — DD004 elasticity/adhesion parameters

Blocking Dependencies:

Phase 3 complete (organ systems implemented)
Witvliet EM data converted to cell boundary meshes
Virtual Worm meshes exported from Blender to individual OBJ files
GPU access for mesh deformation compute shaders (WebGPU or local testing)

Phase 5: Intracellular Signaling Cascades (Months 19-24+)¶

Status: 📝 Not Yet Specified — Placeholder for future work

Anticipated Scope:

Detailed GPCR cascades (Gq/Gs/Gi → PLC/adenylyl cyclase → IP3/cAMP → PKA/PKC)
Second messenger dynamics (IP3, DAG, cAMP, cGMP)
Channel phosphorylation and trafficking
Non-neuronal peptide signaling (intestine, hypodermis, gonad)
Cross-tissue signaling (endocrine, paracrine)

Why Deferred: Current phenomenological models (DD006 conductance modulation, DD009 simplified IP3R) capture functional effects without full biochemical detail. Phase 5 adds mechanistic depth when validation requires it.

Milestone (Projected): "Molecular-Level Intracellular Dynamics"

Foundation Model Opportunities

Phase 5's biggest blocker is the lack of C. elegans-specific biochemical rate constants — most GPCR cascade kinetics come from mammalian systems. Protein foundation models can bridge this gap by predicting binding affinities, conformational dynamics, and pathway structure from sequence rather than requiring per-species experimental measurement:

Model	Developer	Phase 5 Application
Boltz-2	MIT/Recursion	Predict GPCR-G protein complex structures AND second messenger (IP3, cAMP, DAG) binding affinities to target proteins (PKA, PKC, IP3R). Single-GPU, approaching FEP+ accuracy
AlphaFold 3	DeepMind	Model full signaling complex assemblies: GPCR→Gα→effector (PLC, adenylyl cyclase) with bound ligands, ions, and lipids
BioEmu-1	Microsoft	Simulate GPCR activation (inactive→active conformational transition), channel phosphorylation-induced gating changes, and kinase catalytic dynamics at 100,000x MD speed — extract rate constants from conformational landscapes
NatureLM	Microsoft	Unified protein + small molecule model (46.7B params). Predict cross-domain interactions: neuropeptide → GPCR → second messenger → kinase. Estimate binding affinities and ADMET properties for signaling molecules
SubCell	CZI/Human Protein Atlas	Subcellular protein localization from microscopy. Map where signaling proteins localize within cells (ER vs. plasma membrane vs. cytoplasm), constraining spatial compartmentalization of cascades. Trained on human cells — would need adaptation for C. elegans
OntoProtein	Zhejiang University	GO-informed protein-protein interaction prediction. Infer kinase-substrate relationships (which PKA/PKC isoforms phosphorylate which ion channels) from Gene Ontology structure, filling gaps where direct experimental data is unavailable
scPRINT	Institut Pasteur	Gene network inference from 50M cells. Identify co-regulation patterns between GPCR pathway components in CeNGEN data (e.g., which receptor, G protein, and effector genes are co-expressed in each neuron class), suggesting functional pathway wiring

Key insight: BioEmu-1 + Boltz-2 together could predict most of the "Biochemical rate constants" listed below from protein structure alone — GPCR activation rates from conformational dynamics, second messenger binding K_d from complex prediction, and phosphorylation effects on channel gating from before/after conformational ensembles. This reduces Phase 5's dependency on scarce C. elegans-specific experimental kinetics.

Precedent: Whole-Cell Computational Modeling (Karr et al. 2012)

Phase 5's goal — mechanistic intracellular signaling from GPCR activation to channel phosphorylation — is conceptually related to the first whole-cell computational model, built for Mycoplasma genitalium by Karr, Sanghvi, Macklin et al. (2012) in Markus Covert's lab at Stanford. That model demonstrated that a complete intracellular simulation is achievable:

28 submodels covering DNA replication, transcription, translation, metabolism, protein complexation, and cell division — each using the formalism best suited to its biology (FBA for metabolism, stochastic/Gillespie for transcription, ODE for metabolite dynamics, Boolean logic for gene regulation, Markov chains for RNA degradation)
16 shared cell state variables (chromosome, transcripts, RNA, polypeptides, protein monomers/complexes, metabolites, ribosomes, RNA polymerase, geometry, mass, time, etc.) integrated at 1-second timesteps with sequential random-order execution
1,900+ parameters curated manually from 900+ publications, with cross-species transfer from mammalian/bacterial data filling gaps
Validation: 79% accuracy on gene essentiality predictions; correctly predicted phenotypes for 72% of tested single-gene knockouts

Karr's 2014 Stanford dissertation identified three key limitations that constrained the approach:

Parameter curation bottleneck: ~6 person-years to manually extract 1,900 rate constants from 900 papers — and M. genitalium has only 525 genes (the smallest free-living genome). C. elegans has ~20,000 genes.
Cross-species parameter transfer: ~30% of parameters were borrowed from other organisms (E. coli, yeast, mammalian) due to missing M. genitalium measurements. Accuracy of these transfers was unknown.
Scaling challenge: The hybrid multi-formalism approach worked for a 525-gene minimal cell but was not demonstrated for organisms with thousands of genes and complex multicellular signaling.

How Foundation Models Transform the Whole-Cell Approach for C. elegans

The Karr/Covert limitations that seemed intractable in 2012 are now addressable with the foundation models listed above:

Karr 2012 Limitation	Foundation Model Solution	Improvement
Parameter curation (6 person-years, 900 papers)	BioEmu-1 predicts conformational dynamics → rate constants from structure alone; Boltz-2 predicts binding affinities from complex structures	Months → days for ~50 key GPCR cascade parameters
Cross-species transfer (borrowing mammalian rates)	ESM Cambrian / OntoProtein predict C. elegans-specific kinetics from worm protein sequences, no need to borrow from mammals	Species-specific predictions from sequence
Pathway inference (manually reading papers)	scPRINT (pretrained on 50M cells) infers gene regulatory networks from CeNGEN expression data; NatureLM predicts protein-small molecule interactions across domains	Automated pathway wiring from expression data
Scaling to 20,000 genes	Only ~200-300 signaling genes are relevant to Phase 5 GPCR cascades (not whole-genome); DD005 CeNGEN data identifies which genes are expressed per cell type	Scoped to signaling genes, not whole genome
Subcellular compartmentalization (not modeled in Karr)	SubCell predicts protein localization (ER vs. membrane vs. cytoplasm), constraining spatial cascade models	Compartment assignment from microscopy

The key architectural lesson from Karr et al. is the hybrid multi-formalism approach: use ODEs where kinetics are smooth, stochastic simulation where copy numbers are low, and FBA where metabolic flux balance is the right abstraction. Phase 5 should adopt this principle — GPCR cascades are well-suited to ODEs, but stochastic events (neuropeptide vesicle release, channel insertion) may require Gillespie-style simulation.
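
The contrast between the two formalisms can be illustrated with a toy channel-insertion process. This is a minimal sketch under stated assumptions, not a Phase 5 model: channels are inserted at a constant rate and removed at a rate proportional to their count, and the rate values and function names are hypothetical. The stochastic version uses Gillespie's direct method. The deterministic version is the closed-form solution of the matching ODE dn/dt = k_in - k_out * n.

```python
import math
import random


def gillespie_channel_count(k_in, k_out, n0, t_end, rng):
    """Stochastic trajectory of channel count (Gillespie direct method)."""
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        a_insert = k_in        # propensity of one insertion event
        a_remove = k_out * n   # propensity of one removal event
        a_total = a_insert + a_remove
        if a_total == 0.0:
            break
        t += rng.expovariate(a_total)  # waiting time to the next event
        if t > t_end:
            break
        # Choose which event fires, weighted by propensity.
        if rng.random() * a_total < a_insert:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


def ode_channel_count(k_in, k_out, n0, t):
    """Deterministic mean: solution of dn/dt = k_in - k_out * n."""
    n_ss = k_in / k_out
    return n_ss + (n0 - n_ss) * math.exp(-k_out * t)


if __name__ == "__main__":
    rng = random.Random(0)
    times, counts = gillespie_channel_count(5.0, 0.1, 0, 100.0, rng)
    print("stochastic final count:", counts[-1])
    print("ODE value at t=100:", ode_channel_count(5.0, 0.1, 0, 100.0))
```

At high copy numbers the stochastic trajectory fluctuates closely around the ODE curve. At low copy numbers the fluctuations are comparable to the mean, which is the regime where the ODE abstraction stops being adequate.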

Why Phase 5 is tractable even though whole-cell eukaryotic modeling isn't solved yet. As of 2026, no one has built a Karr-level "every gene product accounted for" whole-cell model of a eukaryotic cell. The Covert lab scaled from M. genitalium (525 genes) to E. coli (4,288 genes), still prokaryotic but 8x the gene count, and in 2023 extended it to whole-colony simulations with single-cell heterogeneity. For eukaryotes, the yeast WM_S288C model integrates 15 cellular states and 26 processes across ~6,447 genes, but is described as "an important first step" rather than complete; the data integration challenge alone is still being solved (YCMDB database, 2024). The eukaryotic gap comes from compartmentalization (nucleus, ER, Golgi, mitochondria), complex gene regulation (chromatin, splicing), and intracellular signaling: exactly the biology Phase 5 targets.

But Phase 5 does not need to be a whole-cell model. It only needs the GPCR→second messenger→channel phosphorylation cascades for the ~200-300 signaling genes expressed in C. elegans neurons (DD005 CeNGEN data scopes this precisely). That is a far more tractable problem than modeling all 20,000 genes, and the foundation models above make it feasible without the 6-person-year parameter curation that even the 525-gene M. genitalium model required.

Datasets Needed: See DD024 "Projected Datasets (Phases 5-7)" for inventory. Key needs: biochemical rate constants, proteomics, subcellular calcium imaging, GPCR-G protein coupling specificity — many may be predictable via BioEmu-1 + Boltz-2 foundation models.

Phase 6: Developmental Modeling (Year 2+)¶

Status: 📝 Not Yet Specified — Placeholder for multi-stage simulation

Existing Work: DevoWorm

The DevoWorm project (devoworm.weebly.com, github.com/devoworm) has been building toward developmental modeling since 2014 as an OpenWorm sub-project. DevoWorm's three research areas map directly onto Phase 6 needs:

Developmental Dynamics: Quantitative embryogenesis datasets, differentiation trees, and embryogenetic connectome analysis — directly applicable to modeling neuron birth order, cell lineage, and stage-specific neural topology
Cybernetics and Digital Morphogenesis: Cellular automata (Morphozoic) and Cellular Potts (CompuCell3D) models of embryogenesis — candidate frameworks for body morphogenesis simulation (L1 ~240 µm → adult ~1000 µm)
Reproduction and Developmental Plasticity: Larval development and life-history data — validation targets for stage-specific behavioral differences

Phase 6 should build on DevoWorm's datasets and models rather than starting from scratch. Key integration points:

DevoWorm's embryogenetic connectome analysis provides the developmental graph connecting cell lineage to neural circuit formation
DevoWorm's differentiation trees complement DD005's CeNGEN-based cell-type approach by adding temporal dynamics (when each neuron class differentiates)
DevoWorm's CompuCell3D models could inform the body size scaling mechanics in DD003/DD004

Anticipated Scope:

Witvliet developmental connectome series (L1 → L2 → L3 → L4 → adult, 8 stages)
Neuron birth and death (programmed cell death, 131 cells die during development)
Body size scaling (L1 ~240 µm → adult ~1000 µm)
Stage-specific validation (L1 locomotion differs from adult)
CeNGEN L1 expression integration
Integration of DevoWorm embryogenetic connectome and differentiation tree data

Milestone (Projected): "Worm That Grows"

Announcement: "Simulate C. elegans development from L1 larva to adult, watching neurons born and the body grow."

Datasets Needed: See DD024 "Projected Datasets (Phases 5-7)" for inventory. Key resources: Witvliet developmental connectome series (in cect), CeNGEN L1 expression, Packer 2019 embryonic scRNA-seq, DevoWorm embryogenetic connectome and differentiation trees.

Phase 7: Male-Specific Modeling (Year 3+)¶

Status: 📝 Not Yet Specified (placeholder for male simulation alongside the hermaphrodite)

Anticipated Scope:

385-neuron male connectome (Cook2019MaleReader)
91 male-specific neurons (385 total minus the 294 neurons shared with the hermaphrodite; includes ray neurons, HOB, spicule motor neurons)
Male tail anatomy (fan, rays, spicules) in DD003/DD004
Mating circuit and copulation behavior

Milestone (Projected): "Both Sexes Simulated"

Datasets Needed: See DD024 "Projected Datasets (Phases 5-7)" for inventory. Key resources: Cook 2019 male connectome (in cect), male behavioral/mating data, male-specific tail anatomy.

Complete Dataset Inventory¶

For the canonical inventory of all datasets across all phases, see DD024: Validation Data Acquisition Pipeline. DD024 catalogs:

Tier 1-4 validation datasets — Electrophysiology, functional connectivity, behavioral kinematics, causal/interventional data
Connectome & molecular datasets — Connectome data available via cect (DD020), expression data, cell ontologies
Implementation & reference datasets — Model inputs (BAAIWorm, Virtual Worm, ion channel sequences), reference implementations (CE_locomotion), training data (SPH simulation runs)
Projected datasets (Phases 5-7) — Biochemical kinetics, developmental data, male-specific anatomy

Each dataset is tagged with its phase, consumer DD, acquisition method, and status. For connectome-specific datasets in detail, see also DD020.
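
The tagging scheme described above could be represented as a simple record type. This is a sketch only: DD024's actual schema is not shown here, so the class name, field names, and the "unspecified" placeholder values are assumptions. The dataset names, phases, and consumer DDs are taken from the phase sections of this roadmap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """One inventory entry with the four tags named in the text (hypothetical schema)."""
    name: str
    phase: str
    consumer_dd: str
    acquisition_method: str
    status: str


# Acquisition method and status are placeholders; the roadmap does not state them.
INVENTORY = [
    DatasetRecord("Raizen 1994 EPG recordings", "Phase 3", "DD007", "unspecified", "unspecified"),
    DatasetRecord("Thomas 1990 defecation data", "Phase 3", "DD009", "unspecified", "unspecified"),
    DatasetRecord("Witvliet 2021 cell boundary meshes", "Phase 4", "DD004", "unspecified", "unspecified"),
]


def datasets_for_phase(inventory, phase):
    """Return dataset names needed by the given phase, in inventory order."""
    return [record.name for record in inventory if record.phase == phase]


if __name__ == "__main__":
    print(datasets_for_phase(INVENTORY, "Phase 3"))
```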

Dependency Summary (What Blocks What)¶

Critical Path (must be done in order):

Phase A1 ([DD013](DD013_Simulation_Stack_Architecture.md), [DD008](DD008_Data_Integration_Pipeline.md), [DD021](DD021_Movement_Analysis_Toolbox_and_WCON_Policy.md), [DD024](DD024_Validation_Data_Acquisition_Pipeline.md), [DD028](DD028_Project_Metrics_Dashboard.md)) → Phase 1 ([DD005](DD005_Cell_Type_Differentiation_Strategy.md), [DD014](DD014_Dynamic_Visualization_Architecture.md)/[DD014.1](DD014.1_Visual_Rendering_Specification.md)) → Phase 2 ([DD006](DD006_Neuropeptidergic_Connectome_Integration.md), [DD019](DD019_Closed_Loop_Touch_Response.md), [DD022](DD022_Environmental_Modeling_and_Stimulus_Delivery.md), [DD023](DD023_Proprioceptive_Feedback_and_Motor_Coordination.md)) → Phase 3 ([DD007](DD007_Pharyngeal_System_Architecture.md), [DD009](DD009_Intestinal_Oscillator_Model.md), [DD018](DD018_Egg_Laying_System_Architecture.md), [DD017](DD017_Hybrid_Mechanistic_ML_Framework.md))

Parallelizable:

Phase A2 (DD011, DD012, DD015, DD025) can proceed in parallel with Phase A1 (no infrastructure dependencies)
Phase 1 DD014/DD014.1 (viewer + rendering spec) can proceed alongside DD005 (cell-type specialization)
Phase 2 DD022 (environment) and DD023 (proprioception) can proceed in parallel with DD006 and DD019
Phase 3 organ DDs (DD007, DD009, DD018) can be implemented in any order or in parallel
Phase 4 DD004 (cell identity) and DD014.2 (mesh deformation) can proceed in either order
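
The phase-level ordering can be expressed as a dependency graph and sorted into waves of work that may run concurrently. This sketch uses Python's standard `graphlib` module (Python 3.9+). The edges encode the critical path above plus the Phase 4 blocking dependency on Phase 3; the dictionary and function names are hypothetical, and intra-phase DD dependencies are not modeled.

```python
from graphlib import TopologicalSorter

# Each phase maps to the set of phases that must finish before it starts.
PHASE_DEPS = {
    "Phase A1": set(),
    "Phase A2": set(),          # no infrastructure dependencies
    "Phase 1": {"Phase A1"},
    "Phase 2": {"Phase 1"},
    "Phase 3": {"Phase 2"},
    "Phase 4": {"Phase 3"},
}


def parallel_waves(deps):
    """Group nodes into successive waves; nodes in one wave can run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for index, wave in enumerate(parallel_waves(PHASE_DEPS), start=1):
        print(index, wave)
```

The same structure could be extended to individual DDs, which would surface the within-phase parallelism listed above.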

What Blocks Everything:

Integration Maintainer recruitment — Without this, DD013 doesn't get implemented
Data Maintainer recruitment — Without this, DD008 OWMeta doesn't get revived and Phase 1+ datasets lack unified access
Validation Maintainer recruitment — Without this, DD021 doesn't get revived
Phase A1 completion — Without the config system, data layer, and automated validation, the contributor workflow doesn't work

Timeline Summary¶

Phase	Duration	Calendar (if start March 2026)	Cumulative Cells	Cumulative DD Implementation
Phase 0	Functional	Architecture defined, simulation runs	397 (302 neurons + 95 muscles)	4 DDs (DD001-DD003, DD020)
Phase A1	2 weeks	Mar 2026 (Wks 1-2)	(no change)	+5 DDs (DD013, DD008, DD021, DD024, DD028)
Phase A2	2 weeks	Mar 2026 (Wks 3-4, parallel with A1)	(no change)	+4 DDs (DD011, DD012, DD015, DD025)
Phase 1	3 months	Apr-Jun 2026	397 (specialized, not added)	+3 DDs (DD005, DD010 Tier 2, DD014 Phase 1, DD014.1)
Phase 2	3 months	Jul-Sep 2026	403 (add 6 touch neurons explicitly modeled)	+5 DDs (DD006, DD019, DD022, DD023, DD026, DD014 Phase 2)
Phase 3	6 months	Oct 2026-Mar 2027	514 (add 63 pharynx + 20 intestine + 28 egg-laying)	+4 DDs (DD007, DD009, DD018, DD017)
Phase 4	6 months	Apr-Sep 2027	959 (all somatic cells)	+2 DDs (DD004, DD014.2, DD014 Phase 3)
TOTAL	~18 months	Mar 2026 - Sep 2027	959 cells	23 DDs implemented
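
The cumulative cell counts in the table can be reproduced from the per-phase additions. In this short sketch the Phase 4 increment is not stated in the table; it is derived here as the difference between the 959-cell total and the Phase 3 count. The variable names are hypothetical.

```python
from itertools import accumulate

# (phase, cells newly added), using the figures from the Timeline Summary table.
PHASE_CELL_ADDITIONS = [
    ("Phase 0", 302 + 95),        # neurons + muscles
    ("Phase 1", 0),               # existing cells specialized, none added
    ("Phase 2", 6),               # touch neurons
    ("Phase 3", 63 + 20 + 28),    # pharynx + intestine + egg-laying
    ("Phase 4", 959 - 514),       # derived: remaining somatic cells
]

labels = [phase for phase, _ in PHASE_CELL_ADDITIONS]
cumulative = list(accumulate(added for _, added in PHASE_CELL_ADDITIONS))

if __name__ == "__main__":
    for phase, total in zip(labels, cumulative):
        print(f"{phase}: {total}")
    # prints 397, 397, 403, 514, 959 for Phases 0 through 4
```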

Phases 5-7 follow Month 18: Phase 5 (intracellular) in Months 19-24+, Phase 6 (developmental) in Year 2+, Phase 7 (male-specific) in Year 3+

Frequently Asked Questions¶

Q: Why is Phase A1 first if it's infrastructure, not science? A: Without the config system (DD013) and automated validation (DD021), contributors can't test their work efficiently. Better to invest 2 weeks in infrastructure that enables the next 18 months of science, than to implement science DDs without the tools to validate them. Phase A2 (governance, derisking) runs in parallel and doesn't block modeling.

Q: Can Phase 3 organ DDs (DD007, DD009, DD018) be implemented in parallel? A: Yes — they're semi-independent subsystems. Different contributors can work on pharynx, intestine, and egg-laying simultaneously. DD017 (hybrid ML) can also proceed in parallel.

Q: Why is DD004 (Cell Identity) in Phase 4, not earlier? A: DD004 requires per-cell mechanical properties (elasticity, adhesion) that are informed by organ system behavior. Better to implement organs first (Phase 3), observe their mechanics, then add cell-specific properties in Phase 4. DD004 is also needed for DD014.2 mesh deformation.

Q: What if Phase 1 DD005 fails validation (Tier 2 doesn't improve)? A: The calibration approach (expression→conductance scaling) is uncertain. If it fails, fall back to DD025 (foundation model→params) or manual curation. DD005's scientific risk is why it's Phase 1 — validate the approach early before building more on top of it.

Q: When do we write papers? A: After each major milestone:

Phase 1: "CeNGEN-Parameterized Neural Circuit" (target: eNeuro or Frontiers in Neuroinformatics)
Phase 2: "Closed-Loop Sensorimotor Behavior in Whole-Organism Simulation" (target: PLoS Computational Biology)
Phase 3: "Multi-Organ, Multi-Timescale C. elegans Simulation" (target: Nature Communications or Cell Systems)
Phase 4: "Complete 959-Cell Digital Organism" (target: Nature or Science)

Q: Why is DD025 (foundation model kinetics) in Phase A2, not Phase 3 with the rest of DD017? A: DD025 (Component 3 of DD017) derisks DD005's uncertain transcript→conductance mapping. BioEmu-1 (100,000x MD speed) invalidated the original "computationally expensive" rejection. The inputs (WormBase sequences, literature kinetics) are available now with no infrastructure dependencies. Cross-validation in Phase A2 provides a safety net: if DD005's naive mapping fails in Phase 1, structure-based predictions are ready immediately.

Q: What's the difference between Phase A1 and Phase A2? A: Phase A1 (Core Infrastructure) contains the 5 DDs that block all subsequent modeling work — containerization, data access, validation toolbox, baseline datasets, and project dashboard. Without them, no contributor can build, test, or validate. Phase A2 (Governance & Derisking) contains 4 DDs that enable scaling and derisk Phase 1 — contributor progression, RFC process, AI agent workflow, and foundation model kinetics. A2 can run entirely in parallel with A1 and doesn't block Phase 1.

Approved by: Pending (awaiting founder review)
Maintained by: Integration L4 Maintainer (when appointed)
Next Review: After Phase A1 completion (reassess timeline based on actual progress)