DD020: Connectome Data Access and Dataset Policy
- Status: Accepted
- Author: OpenWorm Core Team
- Date: 2026-02-16
- Supersedes: None
- Related: DD001 (Neural Circuit), DD002 (Muscle Model), DD005 (Cell-Type Specialization), DD006 (Neuropeptidergic Connectome), DD007 (Pharyngeal System), DD008 (Data Integration Pipeline), DD013 (Simulation Stack), DD017 (Hybrid Mechanistic-ML), DD019 (Closed-Loop Touch Response)
Phase: Phase 0: Existing Foundation | Layer: Data Infrastructure
TL;DR
The ConnectomeToolbox (cect, PyPI v0.2.7) is OpenWorm's canonical package for accessing C. elegans connectome data. It provides 30+ dataset readers spanning 1986-2024 (White, Varshney, Cook, the Witvliet developmental series, Randi functional, Ripoll-Sanchez neuropeptidergic, Wang 2024 neurotransmitter atlas), plus cell classification, neurotransmitter identity, and bilateral symmetry analysis. This DD specifies: (1) the default dataset for each modeling use case, (2) the version pinning policy, (3) the canonical API patterns all consuming DDs must follow, and (4) the multi-dataset validation strategy. Never parse raw CSV/Excel connectome files directly; always use cect.
Quick Action Reference
| Question | Answer |
|---|---|
| Phase | Phase 0 |
| Layer | Core Architecture — see Phase Roadmap |
| What does this produce? | Standardized connectome data access via cect Python API: adjacency matrices, ConnectionInfo objects, cell classification, neurotransmitter identities, bilateral symmetry metrics |
| Success metric | All consuming DDs (DD001, DD002, DD005, DD006, DD007, DD017, DD019) obtain connectome data exclusively through cect; dataset selection is explicit and reproducible |
| Repository | openworm/ConnectomeToolbox — issues labeled dd020 |
| Config toggle | data.connectome.dataset: "Cook2019Herm" / data.connectome.cect_version: "0.2.7" in openworm.yml |
| Build & test | pip install cect==0.2.7 then python -c "from cect.Cook2019HermReader import get_instance; cds = get_instance(); cds.summary()" |
| Visualize | cect built-in: cds.to_plotly_matrix_fig(synclass, view), cds.to_plotly_graph_fig(synclass, view), cds.to_plotly_hive_plot_fig(synclass, view) |
| CI gate | cect import + default dataset query returns expected neuron count; version matches versions.lock |
Goal & Success Criteria
| Criterion | Target | DD010 Tier |
|---|---|---|
| Primary: Unified data access | All 9 consuming DDs obtain connectome data via cect API, not raw file parsing | Tier 1 (blocking) |
| Secondary: Reproducibility | Dataset selection and cect version pinned in openworm.yml + versions.lock; any two runs with same config produce identical adjacency matrices | Tier 1 (blocking) |
| Tertiary: Multi-dataset validation | Simulation results compared against ≥2 independent connectome datasets (e.g., Cook2019Herm primary, Witvliet8 cross-validation) | Tier 2 (non-blocking initially, blocking Phase 3+) |
Before: Each consuming DD independently decides which connectome dataset to use, how to parse it, and how to handle cell name variants. DD001 uses UpdatedSpreadsheetDataReader2, DD006 uses Ripoll-Sanchez data, DD007 may use Cook2019 pharyngeal subset — no coordination, no version pinning, no comparison.
After: A single DD (this one) specifies dataset selection, version pinning, API contract, and validation strategy. Consuming DDs reference DD020 for connectome access. Changes to default dataset or cect version are reviewed centrally.
Deliverables
| Artifact | Path / Location | Format | Example |
|---|---|---|---|
| cect package (external) | PyPI: pip install cect / GitHub: openworm/ConnectomeToolbox | Python package | from cect.Cook2019HermReader import get_instance |
| Dataset selection policy | This DD (Section: Technical Approach) | Markdown specification | "Default adult hermaphrodite: Cook2019Herm" |
| openworm.yml connectome config | data.connectome.* keys | YAML | dataset: "Cook2019Herm", cect_version: "0.2.7" |
| versions.lock entry | cect key in versions.lock (DD013) | Lock file | cect: "0.2.7" |
| Connectome adjacency matrices | In-memory via cect API | numpy.ndarray per synclass | cds.connections["Generic_CS"] shape (N, N) |
| Cell classification metadata | cect.Cells module | Python API | get_SIM_class("AVAL") → "Interneuron" |
| Bilateral symmetry metrics | cect.Analysis.convert_to_symmetry_array() | (ndarray, float, str) | (array, 56.25, "Of 441 possible edges...") |
| Neurotransmitter identity | cect.Neurotransmitters module | Python constants | ACETYLCHOLINE, GABA, GLUTAMATE, SEROTONIN |
Repository & Issues
| Item | Value |
|---|---|
| Repository | openworm/ConnectomeToolbox |
| Issue label | dd020 |
| Milestone | Connectome Data Access Policy |
| Branch convention | dd020/description (e.g., dd020/pin-cook2019-default) |
| Example PR title | DD020: Pin cect 0.2.7 in versions.lock, update openworm.yml |
| De facto maintainer | Neural Circuit L4 Maintainer |
| PyPI | https://pypi.org/project/cect/ |
How to Build & Test
Prerequisites
- Python 3.8+ (tested on 3.8-3.12)
- pip
Getting Started (Environment Setup)
To use cect (most contributors): Install from PyPI — no repository clone needed.
pip install cect==0.2.7
Then follow the verification steps below.
To develop cect itself (adding dataset readers, fixing bugs):
git clone https://github.com/openworm/ConnectomeToolbox.git
cd ConnectomeToolbox
pip install -e . # editable install for development
pip install pytest # for running tests
pytest # verify all tests pass
Step-by-step
# Step 1: Install cect at pinned version
pip install cect==0.2.7
# Step 2: Verify default dataset loads
python -c "
from cect.Cook2019HermReader import get_instance
cds = get_instance()
cds.summary()
# Expected: ~302 neurons, chemical + electrical connections
"
# Step 3: Verify ConnectionInfo API
python -c "
from cect.Cook2019HermReader import get_instance
cds = get_instance()
for ci in cds.original_connection_infos[:5]:
    print(f'{ci.pre_cell} -> {ci.post_cell}: {ci.number} ({ci.syntype}, {ci.synclass})')
"
# Step 4: Verify cell classification
python -c "
from cect.Cells import get_SIM_class, SENSORY_NEURONS_COOK, INTERNEURONS_COOK, MOTORNEURONS_COOK
print(f'Sensory: {len(SENSORY_NEURONS_COOK)}')
print(f'Interneurons: {len(INTERNEURONS_COOK)}')
print(f'Motor: {len(MOTORNEURONS_COOK)}')
print(f'AVAL class: {get_SIM_class(\"AVAL\")}')
print(f'DA01 class: {get_SIM_class(\"DA01\")}')
"
# Step 5: Verify bilateral symmetry analysis
python -c "
from cect.Cook2019HermReader import get_instance
from cect.Analysis import convert_to_symmetry_array
cds = get_instance()
arr, pct, info = convert_to_symmetry_array(cds, ['Generic_CS'])
print(f'Symmetry: {pct:.1f}%')
print(info)
"
# Step 6: Docker-based verification (DD013 stack)
docker compose run shell python -c "import cect; print(cect.__version__)"
How to Visualize
cect built-in visualization: The package generates publication-quality Plotly figures directly.
| Viewer Feature | Specification |
|---|---|
| Matrix view | cds.to_plotly_matrix_fig(synclass="Generic_CS", view="Neurons") — heatmap of connectivity matrix |
| Graph view | cds.to_plotly_graph_fig(synclass="Generic_CS", view="Neurons") — network graph |
| Hive plot | cds.to_plotly_hive_plot_fig(synclass="Generic_CS", view="Neurons") — hive plot visualization |
| Symmetry view | cds.to_plotly_matrix_fig(synclass="Generic_CS", view="Neurons", symmetry=True) — bilateral symmetry overlay (red=asymmetric, blue=symmetric) |
| Filtering | Views: "Neurons", "Full", "Pharynx", "MotorMuscle", custom via get_connectome_view() |
DD014 viewer connection: Connectome topology exported as part of neural/ OME-Zarr group. The DD014 viewer reads connectivity from the simulation output, not directly from cect — but cect is the upstream source of truth.
Technical Approach
Default Datasets by Use Case
| Use Case | Default Dataset | cect Reader | Rationale |
|---|---|---|---|
| Adult hermaphrodite somatic connectome | Cook et al. 2019 (hermaphrodite) | Cook2019HermReader | Gold standard whole-animal EM reconstruction; corrects White 1986 errors; includes both chemical and electrical synapses |
| Adult male connectome | Cook et al. 2019 (male) | Cook2019MaleReader | Only complete male connectome; 385 neurons including male-specific |
| Pharyngeal nervous system | Cook et al. 2019 (pharyngeal subset) | Cook2019HermReader with pharynx view filter | Same dataset, filtered via cds.get_connectome_view("Pharynx") |
| Developmental series | Witvliet et al. 2021 (stages 1-8) | WitvlietDataReader1 through WitvlietDataReader8 | Only developmental connectome series; L1 through adult |
| Functional connectivity validation | Randi et al. 2023 | WormNeuroAtlasFuncReader | Whole-brain calcium imaging functional connectivity; primary validation target for DD005 |
| Neuropeptidergic network | Ripoll-Sanchez et al. 2023 | RipollSanchezShortRangeReader, RipollSanchezMidRangeReader, RipollSanchezLongRangeReader | Extrasynaptic neuropeptide signaling; DD006 primary data source |
| Neurotransmitter identity | Wang et al. 2024 | Wang2024HermReader, Wang2024MaleReader | CRISPR/Cas9 fluorescent reporter atlas; 16 neurotransmitter systems; reveals co-transmission |
| OpenWorm unified (experimental) | Wang 2024 hermaphrodite base | OpenWormUnifiedReader | WIP (subject to change); currently wraps Wang2024Reader with electrical connections. Use for forward-looking development, not production simulations |
| Legacy / backward compatibility | Varshney et al. 2011 | VarshneyDataReader | Historical dataset; use only for comparing to pre-2019 publications |
| Cross-validation | Witvliet stage 8 (adult) | WitvlietDataReader8 | Independent adult reconstruction for comparing against Cook2019Herm |
Dataset Selection Rules
- Default to Cook2019Herm unless your DD explicitly requires a different dataset. Cook2019 corrects the ~150 errors found in White 1986 and includes both sexes.
- Never parse raw files. All connectome data access MUST go through cect readers. The readers handle:
    - Cell name normalization (e.g., DB1/3 → DB1, DB3)
    - Synapse type classification
    - Consistent ConnectionInfo format
    - Caching for fast repeated access
- Specify the dataset explicitly. Do not rely on cect defaults. Every consuming DD must name the dataset in its configuration or code:

    # GOOD: Explicit dataset
    from cect.Cook2019HermReader import get_instance
    cds = get_instance()

    # BAD: Implicit/unnamed dataset
    from cect import load_some_connectome  # What dataset is this?

- Use from_cache=True in CI and Docker. Reader instances can load from pre-cached JSON files for reproducibility and speed:

    cds = get_instance(from_cache=True)  # Fast, deterministic

- Record the cect version in versions.lock (DD013). Dataset reader behavior may change across versions.
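The explicit-dataset and cache rules above could be enforced by a small loader that only accepts names from an allow-list. This is a minimal sketch, not part of cect: `READER_MODULES`, `reader_module_for`, and `load_dataset` are hypothetical names, and the module paths for the Witvliet and functional readers are assumed to follow the same `cect.<ReaderName>` pattern as the imports shown elsewhere in this DD.

```python
import importlib

# Config dataset name -> cect reader module (hypothetical allow-list).
# Paths for Witvliet8 and Randi2023 are assumed by analogy with the other readers.
READER_MODULES = {
    "Cook2019Herm": "cect.Cook2019HermReader",
    "Wang2024Herm": "cect.Wang2024HermReader",
    "RipollSanchezShortRange": "cect.RipollSanchezShortRangeReader",
    "Witvliet8": "cect.WitvlietDataReader8",
    "Randi2023": "cect.WormNeuroAtlasFuncReader",
}


def reader_module_for(dataset: str) -> str:
    """Return the reader module for an explicitly named dataset, or fail loudly."""
    try:
        return READER_MODULES[dataset]
    except KeyError:
        known = ", ".join(sorted(READER_MODULES))
        raise ValueError(f"Unknown dataset {dataset!r}; expected one of: {known}") from None


def load_dataset(dataset: str, use_cache: bool = True):
    """Import the reader lazily and load the dataset (requires cect to be installed)."""
    module = importlib.import_module(reader_module_for(dataset))
    return module.get_instance(from_cache=use_cache)
```

Because the import happens inside `load_dataset`, the name check can run (and be unit-tested) even in environments where cect is not installed.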
Policy for Adopting New Datasets
When a new connectome dataset is published (e.g., a future revision or new species):
- The ConnectomeToolbox maintainer adds a reader to ConnectomeToolbox (typically within days of publication)
- Bump cect version in versions.lock after reviewing the changelog
- Run regression tests: ensure existing simulations produce equivalent results with the new version
- Do NOT change the default dataset without an RFC (DD012 process). Changing from Cook2019Herm to a new default affects all consuming DDs
- New datasets can be used for cross-validation without changing defaults; add them as comparison targets in DD010
API Contract for Consumers
Core Data Types
# ConnectionInfo — single synaptic connection
from cect.ConnectomeReader import ConnectionInfo
ci = ConnectionInfo(
    pre_cell="AVAL",           # Presynaptic cell name (str)
    post_cell="DA01",          # Postsynaptic cell name (str)
    number=3.0,                # Connection weight/count (float)
    syntype="Chemical",        # "Chemical" or "Electrical" (str)
    synclass="Acetylcholine",  # Neurotransmitter class (str)
)
# ConnectomeDataset — full connectome
from cect.ConnectomeDataset import ConnectomeDataset
cds: ConnectomeDataset
cds.nodes # list[str] — all cell names
cds.connections # dict[str, np.ndarray] — synclass → adjacency matrix
cds.original_connection_infos # list[ConnectionInfo] — raw connection list
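To make the relationship between the raw connection list and the per-synclass matrices concrete, here is a dependency-free sketch. It uses a stand-in `Conn` tuple and plain nested lists instead of cect's `ConnectionInfo` and NumPy arrays. The helper name `build_adjacency`, the toy connection counts, and the row = presynaptic / column = postsynaptic orientation are assumptions for illustration, not a description of cect internals.

```python
from collections import namedtuple

# Stand-in for ConnectionInfo with the same field names (hypothetical, illustration only).
Conn = namedtuple("Conn", "pre_cell post_cell number syntype synclass")


def build_adjacency(nodes, conns):
    """Group connections by synclass into N x N matrices (rows: pre, columns: post)."""
    index = {name: i for i, name in enumerate(nodes)}
    matrices = {}
    for c in conns:
        # Create an all-zero matrix the first time a synclass is seen.
        m = matrices.setdefault(c.synclass, [[0.0] * len(nodes) for _ in nodes])
        m[index[c.pre_cell]][index[c.post_cell]] += c.number
    return matrices


# Toy data, not taken from any published dataset.
nodes = ["AVAL", "AVAR", "DA01"]
conns = [
    Conn("AVAL", "DA01", 3.0, "Chemical", "Acetylcholine"),
    Conn("AVAR", "DA01", 2.0, "Chemical", "Acetylcholine"),
    Conn("AVAL", "AVAR", 1.0, "Electrical", "Generic_GJ"),
]
adjacency = build_adjacency(nodes, conns)
print(adjacency["Acetylcholine"][0][2])  # weight of AVAL -> DA01
```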
Canonical Query Patterns
Pattern 1: Load the default adult hermaphrodite connectome
from cect.Cook2019HermReader import get_instance
cds = get_instance(from_cache=True)
print(f"Nodes: {len(cds.nodes)}")
print(f"Connections: {len(cds.original_connection_infos)}")
Pattern 2: Get neuron-to-neuron connections only
nn_conns = cds.get_neuron_to_neuron_conns()
for ci in nn_conns:
    print(f"{ci.pre_cell} -> {ci.post_cell}: {ci.number} ({ci.synclass})")
Pattern 3: Get neuron-to-muscle connections (for DD002)
nm_conns = cds.get_neuron_to_muscle_conns()
Pattern 4: Filter to pharyngeal view (for DD007)
pharynx_cds = cds.get_connectome_view("Pharynx")
pharynx_cds.summary()
Pattern 5: Convert to NetworkX graph (for graph analysis)
G = cds.to_networkx_graph(synclass="Generic_CS", view="Neurons")
print(f"Graph: {G.number_of_nodes()} nodes, {G.number_of_edges()} edges")
Pattern 6: Cell classification lookup
from cect.Cells import (
    get_SIM_class,         # "Sensory" / "Interneuron" / "Motorneuron" / "Other"
    is_any_neuron,         # bool
    is_known_muscle,       # bool
    is_pharyngeal_cell,    # bool
    get_cell_notes,        # descriptive string
    get_standard_color,    # hex color for visualization
    SENSORY_NEURONS_COOK,  # list of sensory neuron names
    INTERNEURONS_COOK,     # list of interneuron names
    MOTORNEURONS_COOK,     # list of motor neuron names
    BODY_WALL_MUSCLE_NAMES,
    PHARYNGEAL_MUSCLE_NAMES,
)
Pattern 7: Bilateral symmetry analysis
from cect.Cells import (
    is_bilateral_left,
    is_bilateral_right,
    are_bilateral_pair,
    get_contralateral_cell,
)
from cect.Analysis import convert_to_symmetry_array
# Check if cells are bilateral pair
assert are_bilateral_pair("AVAL", "AVAR")
assert get_contralateral_cell("AVAL") == "AVAR"
# Compute symmetry metric for a connectome
arr, symmetry_pct, info = convert_to_symmetry_array(cds, ["Generic_CS"])
print(f"Bilateral symmetry: {symmetry_pct:.1f}%")
Pattern 8: Neurotransmitter-specific queries (for DD006)
from cect.Neurotransmitters import (
    ACETYLCHOLINE, GABA, GLUTAMATE, SEROTONIN, DOPAMINE,
    GENERIC_CHEM_SYN, GENERIC_ELEC_SYN,
    PEPTIDERGIC_SYN_CLASS,
)
# Get acetylcholine-specific connectivity matrix
ach_matrix = cds.connections.get(ACETYLCHOLINE)
# Get neuropeptidergic connections (Ripoll-Sanchez data)
from cect.RipollSanchezShortRangeReader import get_instance as get_rs_short
rs_cds = get_rs_short(from_cache=True)
Pattern 9: Wang 2024 neurotransmitter atlas
from cect.Wang2024HermReader import get_instance as get_wang2024
wang_cds = get_wang2024(from_cache=True)
# Access 16 neurotransmitter systems with co-transmission data
wang_cds.summary()
Cell ID Normalization
cect handles cell name variants internally. Consumers should use the names as returned by cect and never apply their own normalization. Key conventions:
| Convention | Example | Notes |
|---|---|---|
| Neuron names | AVAL, AVAR, DA01 | Uppercase, left/right suffix L/R |
| Body wall muscles | MDL01, MVR24 | Dorsal/ventral, left/right, numbered |
| Pharyngeal muscles | pm1DL, pm3VR | Lowercase prefix |
| Pharyngeal neurons | I1L, M3L, NSML | Mixed conventions |
Rule: Do not build cell name parsers. Use cect.Cells functions:
from cect.Cells import is_known_cell, is_any_neuron, is_known_muscle
assert is_known_cell("AVAL")
assert is_any_neuron("AVAL")
assert is_known_muscle("MDL01")
Version Pinning & Update Procedure
Pinning
# versions.lock (DD013)
cect: "0.2.7" # ConnectomeToolbox, pinned per DD020
# openworm.yml
data:
  connectome:
    dataset: "Cook2019Herm" # Default dataset for this simulation run
    cect_version: "0.2.7" # Must match versions.lock
    pharyngeal_dataset: "Cook2019Herm" # Pharynx subset (view filter)
    developmental_dataset: null # Set to "Witvliet1" through "Witvliet8" for developmental
    functional_dataset: "Randi2023" # Functional connectivity validation target
    neuropeptide_dataset: "RipollSanchezShortRange" # DD006 neuropeptide data
    use_cache: true # Load from cached JSON (recommended for CI)
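A CI step could verify that the two pins agree before anything else runs. The following is a minimal sketch that assumes both files have already been parsed into dictionaries (for example with a YAML loader); `check_cect_pin` is a hypothetical helper name, not an existing OpenWorm or cect function.

```python
def check_cect_pin(openworm_cfg: dict, versions_lock: dict) -> list:
    """Return a list of human-readable problems; an empty list means the pins agree."""
    problems = []
    cfg_version = openworm_cfg.get("data", {}).get("connectome", {}).get("cect_version")
    lock_version = versions_lock.get("cect")
    if cfg_version is None:
        problems.append("openworm.yml: data.connectome.cect_version is missing")
    if lock_version is None:
        problems.append("versions.lock: cect entry is missing")
    if cfg_version and lock_version and cfg_version != lock_version:
        problems.append(
            f"cect_version {cfg_version!r} in openworm.yml != {lock_version!r} in versions.lock"
        )
    return problems


cfg = {"data": {"connectome": {"dataset": "Cook2019Herm", "cect_version": "0.2.7"}}}
print(check_cect_pin(cfg, {"cect": "0.2.7"}))  # matching pins: no problems
print(check_cect_pin(cfg, {"cect": "0.2.5"}))  # version skew: one problem reported
```

Returning a list rather than raising on the first issue lets a CI job report every pinning problem in a single run.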
Update Procedure
When a new cect version is released:
- Review the changelog on GitHub/PyPI for breaking changes
- Install new version locally: pip install cect==X.Y.Z
- Run dataset load test: Verify default dataset returns same neuron count and connection count as previous version
- Run full simulation regression: docker compose run validate with new cect version
- If regression passes: Update versions.lock and openworm.yml, open PR referencing DD020
- If regression fails: Investigate which dataset reader changed, report issue on openworm/ConnectomeToolbox
Breaking change policy: If a cect update changes the default dataset's adjacency matrix (e.g., corrected connection counts), this is a simulation-affecting change requiring:
- Re-running all DD010 validation tiers
- Documenting the delta (which connections changed, why)
- Approval from the Integration Maintainer (DD013)
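Documenting the delta can be partly automated. The sketch below compares two snapshots of a connection table keyed by (pre, post, synclass); `connection_delta` is a hypothetical helper and the counts are toy values. In practice the mappings would presumably be built from `original_connection_infos` under the old and new cect versions.

```python
def connection_delta(old: dict, new: dict):
    """Compare two {(pre, post, synclass): count} mappings.

    Returns (added, removed, changed), where changed maps key -> (old_count, new_count).
    """
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {k: (old[k], new[k]) for k in old.keys() & new.keys() if old[k] != new[k]}
    return added, removed, changed


# Toy snapshots, not real dataset contents.
old = {("AVAL", "DA01", "Acetylcholine"): 3, ("AVAR", "DA01", "Acetylcholine"): 2}
new = {("AVAL", "DA01", "Acetylcholine"): 4, ("AVAL", "AVAR", "Generic_GJ"): 1}

added, removed, changed = connection_delta(old, new)
for key, (before, after) in sorted(changed.items()):
    print(f"{key[0]} -> {key[1]} ({key[2]}): {before} -> {after}")
```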
Multi-Dataset Validation
Strategy
Biological connectome data are noisy — different labs, different animals, different EM reconstruction methods yield different connection counts. A robust simulation should produce qualitatively similar behavior across multiple connectome datasets.
Validation Protocol
| Validation Level | Datasets | What to Compare | Acceptance Criterion |
|---|---|---|---|
| Primary | Cook2019Herm (default) | Full simulation kinematic metrics | DD010 Tier 3: ±15% of Schafer lab data |
| Cross-validation | Witvliet8 (independent adult) | Same simulation, different connectome | Locomotion pattern preserved (forward crawling); speed within ±30% of Cook2019 result |
| Sensitivity analysis | Cook2019 + Varshney2011 | Compare which connections are critical | Identify connections present in both datasets that most affect behavior |
| Bilateral symmetry | All datasets | convert_to_symmetry_array() metric | Datasets should show >40% bilateral symmetry for chemical synapses (biological expectation) |
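The cross-validation row can be expressed as a simple predicate. This is a sketch under stated assumptions: the function names are hypothetical, the speeds are made-up numbers in arbitrary units, and how "forward crawling preserved" is determined is left to DD010.

```python
def within_tolerance(value: float, reference: float, rel_tol: float) -> bool:
    """True if value lies within +/- rel_tol (as a fraction) of reference."""
    return abs(value - reference) <= rel_tol * abs(reference)


def cross_validation_passes(primary_speed: float, cross_speed: float,
                            forward_crawling_preserved: bool) -> bool:
    """Cross-validation criterion: pattern preserved and speed within 30% of the primary run."""
    return forward_crawling_preserved and within_tolerance(cross_speed, primary_speed, 0.30)


# Hypothetical speeds: primary (Cook2019Herm) run vs. cross-validation (Witvliet8) run.
print(cross_validation_passes(0.20, 0.25, True))   # 25% faster: inside the band
print(cross_validation_passes(0.20, 0.28, True))   # 40% faster: outside the band
```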
Bilateral Symmetry as a Validation Metric
The worm's nervous system is largely bilaterally symmetric — left and right neuron pairs (e.g., AVAL/AVAR) receive similar inputs and produce similar outputs. cect v0.2.7 includes bilateral symmetry analysis (added Feb 2026):
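from importlib import import_module
from cect.Analysis import convert_to_symmetry_array
# (dataset name, cect reader module) pairs; append further datasets to compare
datasets = [("Cook2019Herm", "Cook2019HermReader")]
# Compute symmetry for each dataset
for dataset_name, reader_module in datasets:
    # Each reader module exposes its own get_instance()
    cds = import_module(f"cect.{reader_module}").get_instance(from_cache=True)
    arr, pct, info = convert_to_symmetry_array(cds, ["Generic_CS"])
    print(f"{dataset_name}: {pct:.1f}% bilateral symmetry")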
Use case: If the simulation produces asymmetric behavior (e.g., the worm always turns left), compare against the bilateral symmetry metric. A highly symmetric connectome should produce roughly symmetric behavior.
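For intuition about what a bilateral symmetry percentage measures, here is a toy metric: the share of directed edges whose left/right mirror image is also present. It is an illustrative simplification and is not claimed to be the algorithm behind `convert_to_symmetry_array`. The contralateral mapping is passed in explicitly (in practice it would come from cect, e.g. `get_contralateral_cell`) so that no cell-name parsing is needed.

```python
def mirrored_edge_percentage(edges, contralateral):
    """Percentage of directed edges whose left/right mirror image is also an edge."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0

    def mirror(cell):
        # Cells without a listed partner map to themselves.
        return contralateral.get(cell, cell)

    mirrored = sum((mirror(pre), mirror(post)) in edge_set for pre, post in edge_set)
    return 100.0 * mirrored / len(edge_set)


# Toy example: the third edge has no mirrored counterpart (AVAR -> PVCR is absent).
pairs = {"AVAL": "AVAR", "AVAR": "AVAL", "AVBL": "AVBR", "AVBR": "AVBL",
         "PVCL": "PVCR", "PVCR": "PVCL"}
edges = [("AVAL", "AVBL"), ("AVAR", "AVBR"), ("AVAL", "PVCL")]
pct = mirrored_edge_percentage(edges, pairs)
print(f"{pct:.1f}% of edges are mirrored")
```

Note that under this toy definition an edge between two unpaired cells counts as its own mirror image; a real metric may treat such edges differently.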
Developmental Connectome Support
Witvliet Developmental Series
Witvliet et al. (2021) published 8 connectome reconstructions spanning L1 larval stage through adult:
| Stage | Reader | Neurons | Notes |
|---|---|---|---|
| Witvliet 1 | WitvlietDataReader1 | ~180 | L1 early |
| Witvliet 2 | WitvlietDataReader2 | ~180 | L1 |
| Witvliet 3 | WitvlietDataReader3 | ~180 | L1 late |
| Witvliet 4 | WitvlietDataReader4 | ~240 | L2 |
| Witvliet 5 | WitvlietDataReader5 | ~270 | L3 |
| Witvliet 6 | WitvlietDataReader6 | ~270 | L3/L4 |
| Witvliet 7 | WitvlietDataReader7 | ~280 | L4 |
| Witvliet 8 | WitvlietDataReader8 | ~300 | Adult |
Interaction with DD001
DD001 assumes a single adult hermaphrodite connectome (302 neurons). Developmental connectome support requires:
- Neuron birth/death: Not all 302 neurons exist at all stages. cect handles this: each stage reader returns only the neurons present at that stage.
- Connection strength changes: Synapse counts change during development. The number field in ConnectionInfo reflects the stage-specific count.
- Simulation implication: To simulate a developmental stage, set data.connectome.dataset: "WitvlietN" in openworm.yml. DD001's c302 framework handles variable neuron counts.
- Validation caveat: DD010 kinematic benchmarks are from adult worms. Developmental stage simulations require stage-specific behavioral data for validation (limited availability).
Current recommendation: Use Witvliet stages for cross-validation and sensitivity analysis, not as primary simulation input. DD001's default remains Cook2019Herm (adult).
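When a stage dataset is used alongside the adult default, it helps to list which cells are missing from it before building the network. A minimal sketch follows; `stage_differences` is a hypothetical helper, and the node lists are toy values rather than real dataset contents (in practice they would come from each dataset's `nodes` attribute).

```python
def stage_differences(adult_nodes, stage_nodes):
    """Return (cells only in the adult dataset, cells only in the stage dataset)."""
    adult, stage = set(adult_nodes), set(stage_nodes)
    return sorted(adult - stage), sorted(stage - adult)


# Toy node lists for illustration only.
adult_nodes = ["AVAL", "AVAR", "DA01", "VD03"]
stage_nodes = ["AVAL", "AVAR", "DA01"]

missing_at_stage, extra_at_stage = stage_differences(adult_nodes, stage_nodes)
print("Absent at this stage:", missing_at_stage)
print("Not in adult dataset:", extra_at_stage)
```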
Per-Synapse Spatial Position Data
The Witvliet et al. 2021 EM reconstructions include synapse centroid coordinates along neurites, enabling spatially resolved synapse placement for multicompartmental neuron models (DD001 Level D). Zhao et al. (2024) demonstrated that the distribution of synapse-to-soma distances along neurites is well-described by an inverse Gaussian distribution, fitted from these EM measurements. The ConnectomeToolbox WitvlietDataReader series may need extension to expose per-synapse position data, as current readers provide only neuron-level adjacency matrices.
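For reference, the inverse Gaussian density mentioned above has the standard form f(x; mu, lam) = sqrt(lam / (2 pi x^3)) * exp(-lam (x - mu)^2 / (2 mu^2 x)) for x > 0, with mean mu and shape lam. The sketch below evaluates it and checks numerically that it integrates to about 1. The parameter values are arbitrary placeholders, not the values fitted by Zhao et al.

```python
import math


def inverse_gaussian_pdf(x: float, mu: float, lam: float) -> float:
    """Inverse Gaussian density with mean mu and shape lam; zero for x <= 0."""
    if x <= 0:
        return 0.0
    return math.sqrt(lam / (2 * math.pi * x**3)) * math.exp(
        -lam * (x - mu) ** 2 / (2 * mu**2 * x)
    )


# Placeholder parameters (arbitrary units), chosen only to exercise the formula.
mu, lam, dx = 1.0, 3.0, 0.001

# Midpoint-rule integration over (0, 50]; the tail beyond 50 is negligible here.
total = sum(inverse_gaussian_pdf((i + 0.5) * dx, mu, lam) * dx for i in range(50_000))
mean = sum((i + 0.5) * dx * inverse_gaussian_pdf((i + 0.5) * dx, mu, lam) * dx
           for i in range(50_000))
print(f"integral ~ {total:.3f}, mean ~ {mean:.3f}")
```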
Alternatives Considered
1. Use OWMeta (DD008) as the Sole Connectome Data Access Layer
Description: Route all connectome queries through OWMeta's RDF knowledge graph.
Deferred (not rejected) because:
- OWMeta wraps connectome data in a semantic layer (RDF triples, SPARQL-like queries) that adds complexity without clear benefit for adjacency matrix access
- OWMeta's owmeta package is dormant (last real commit Jul 2024); cect is actively maintained (commits within days)
- cect provides direct Python API access to 30+ datasets; OWMeta has ingestion scripts for fewer datasets
- cect already has caching, cell classification, neurotransmitter identity, and visualization built in
When to reconsider: In Phase 3+, when OWMeta integrates CeNGEN, WormAtlas, and other non-connectome data sources, cect data should flow through OWMeta to provide a unified semantic layer. At that point, DD008's OWMeta would call cect internally, and consuming DDs could use either API. See "Relationship to DD008" below.
2. Parse Raw Published Data Files Directly
Description: Each consuming DD downloads CSV/Excel files from journal supplementary materials and parses them with custom scripts.
Rejected because:
- Cell name inconsistencies across publications (Cook uses "AVAL", some papers use "AVA(L)", White uses different conventions)
- No shared cell classification or neurotransmitter identity
- Each DD reinvents parsing logic; bugs are not shared/fixed centrally
- No version pinning; source URLs can change
- cect already solves all these problems
3. Maintain a Static Adjacency Matrix in the OpenWorm Meta-Repo
Description: Export Cook2019Herm as a static CSV/NumPy file, commit to openworm/OpenWorm, and have all DDs read from that file.
Rejected because:
- Loses dataset metadata (neurotransmitter identity, synapse types, cell classification)
- Cannot easily switch datasets for cross-validation
- Must manually update when corrections are published
- Cannot leverage cect's growing dataset collection
4. Build a Custom Connectome Access Library
Description: Write an OpenWorm-specific connectome library separate from cect.
Rejected because:
- cect already exists, is actively maintained by the ConnectomeToolbox maintainer, and is approaching preprint publication
- Duplicating effort; cect has ~4 years of development and 30+ dataset readers
- OpenWorm should contribute to cect, not compete with it
5. Default to OpenWormUnifiedReader Instead of Cook2019Herm
Description: Use the "unified" reader as the default for all simulations.
Deferred because:
- OpenWormUnifiedReader is explicitly marked "WIP — subject to change without notice" (as of Feb 2026)
- Currently wraps Wang2024Reader, which is based on neurotransmitter reporter data, not EM reconstruction — different methodology than Cook2019
- The unified reader's connection counts and topology may change as it evolves
- Use for forward-looking development, not production simulations
When to reconsider: When OpenWormUnifiedReader stabilizes and the ConnectomeToolbox maintainer removes the WIP designation.
Quality Criteria
What Defines Valid Connectome Data Access?
- API-only access: All connectome data loaded via a cect reader's get_instance() function. No raw file parsing.
- Explicit dataset naming: Every consuming DD's code or config must name the dataset reader (e.g., Cook2019HermReader). No unnamed/default datasets.
- Version pinning: cect version pinned in versions.lock. openworm.yml records dataset name.
- Cell name consistency: Use cell names as returned by cect. Do not apply custom normalization or renaming.
- Reproducible queries: Running the same query with the same cect version and from_cache=True must return identical results.
- Cache usage in CI: CI builds use from_cache=True to ensure deterministic results independent of external data availability.
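One way to test the reproducibility criterion is to fingerprint the loaded matrices and compare digests across runs. This is a sketch under assumptions: `adjacency_fingerprint` is a hypothetical helper, and the matrices are passed as nested lists (a NumPy array would first be converted with `.tolist()`).

```python
import hashlib
import json


def adjacency_fingerprint(nodes, matrices) -> str:
    """SHA-256 digest of the node order plus all synclass matrices, independent of dict order."""
    payload = {
        "nodes": list(nodes),
        "matrices": {name: matrices[name] for name in sorted(matrices)},
    }
    blob = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


# Toy matrices for two cells; values are illustrative only.
nodes = ["AVAL", "AVAR"]
run_a = {"Generic_CS": [[0, 1], [0, 0]], "Generic_GJ": [[0, 0], [0, 0]]}
run_b = {"Generic_GJ": [[0, 0], [0, 0]], "Generic_CS": [[0, 1], [0, 0]]}  # same data, other order
run_c = {"Generic_CS": [[0, 2], [0, 0]], "Generic_GJ": [[0, 0], [0, 0]]}  # one count differs

print(adjacency_fingerprint(nodes, run_a) == adjacency_fingerprint(nodes, run_b))
print(adjacency_fingerprint(nodes, run_a) == adjacency_fingerprint(nodes, run_c))
```

Including the node order in the digest matters because the same matrix values with a different row/column ordering describe a different network.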
Boundaries (Explicitly Out of Scope)
What This Design Document Does NOT Cover:
- OWMeta/RDF knowledge graph: DD008 owns the semantic data layer. This DD owns the direct Python API layer (cect). See "Relationship to DD008" for the boundary.
- Neural circuit modeling: DD001 owns how connectome topology is translated into NeuroML network files. This DD provides the topology; DD001 consumes it.
- Neuropeptidergic modeling: DD006 owns the biological interpretation of Ripoll-Sanchez neuropeptide data. This DD provides access to the data.
- Cell-type specialization: DD005 owns CeNGEN expression-to-conductance mapping. This DD does not cover transcriptomic data access.
- Pharyngeal system modeling: DD007 owns the pharyngeal oscillator model. This DD provides the pharyngeal subset of the connectome.
- Visualization rendering: DD014/DD014.1 own the viewer. This DD's visualization is for data exploration via cect's built-in Plotly figures, not for the simulation viewer.
- Individual connectome variation: Natural genetic variation affecting synaptic connectivity is out of scope. This DD uses population-representative published reconstructions.
Context & Background
ConnectomeToolbox History
The C. elegans connectome — the complete wiring diagram of the nervous system — has been the foundation of the OpenWorm project since its inception. The original dataset (White et al. 1986) was manually compiled from electron micrographs. Over four decades, multiple groups have re-analyzed, corrected, and extended this data:
| Year | Dataset | Key Contribution | cect Reader |
|---|---|---|---|
| 1986 | White et al. | Original 302-neuron hermaphrodite connectome | WhiteDataReader |
| 2011 | Varshney et al. | Digital re-analysis of White's EM data | VarshneyDataReader |
| 2016 | Bentley et al. | Monoamine network mapping | Via WormNeuroAtlas |
| 2019 | Cook et al. | Whole-animal EM reconstruction, both sexes, corrects ~150 errors | Cook2019HermReader, Cook2019MaleReader |
| 2020 | Cook et al. | Updated analysis | Cook2020DataReader |
| 2021 | Witvliet et al. | 8-stage developmental series (L1 through adult) | WitvlietDataReader1-8 |
| 2021 | Brittin et al. | Contact area-based adjacency | BrittinDataReader |
| 2023 | Randi et al. | Whole-brain calcium imaging functional connectivity | WormNeuroAtlasFuncReader |
| 2023 | Ripoll-Sanchez et al. | Neuropeptide-receptor network (31,479 interactions) | RipollSanchezShortRangeReader etc. |
| 2024 | Wang et al. | Neurotransmitter atlas (16 systems, CRISPR/Cas9 reporters) | Wang2024HermReader, Wang2024MaleReader |
| 2024 | Yim et al. | Updated connectivity analysis | Yim2024DataReader |
The ConnectomeToolbox (cect) was created to provide unified access to all these datasets through a consistent Python API. As of v0.2.7 (Feb 2026), it includes 30+ dataset readers, cell classification, neurotransmitter identity, bilateral symmetry analysis, and multiple visualization modes.
Recent Activity (Feb 2026)
- Feb 10: Bilateral symmetry analysis notebook added
- Feb 10: OpenWormUnifiedReader switched to Wang2024Reader base
- Feb 11: Electrical connections added to OpenWormUnifiedReader
- Feb 12 meeting: The ConnectomeToolbox maintainer reported the preprint is "nearly finished"
- Feb 13: Merge of latest updates to main branch
Why This DD Is Needed Now
ConnectomeToolbox is already referenced as a dependency in 9 existing DDs (DD001, DD002, DD005, DD006, DD007, DD008, DD013, DD017, DD019), yet no DD specifies how it should be used, which dataset to default to, or how to pin versions. Key risks without DD020:
- Dataset drift: DD001 uses Cook2019Herm while DD006 uses RipollSanchezShortRange; both are valid, but no policy coordinates them
- Version skew: One consumer pins cect==0.2.5, another installs latest; connection counts differ silently
- API inconsistency: Some consumers parse original_connection_infos, others use connections matrices; no canonical pattern
- Update risk: cect updates break simulations because no regression testing policy exists
Relationship to DD008 (OWMeta)
Clear boundary: cect and OWMeta serve different purposes and should coexist:
| Aspect | cect (DD020) | OWMeta (DD008) |
|---|---|---|
| Architecture | Direct Python API | RDF semantic graph |
| Query style | get_instance() → ConnectomeDataset | connect("openworm_data") → SPARQL-like |
| Data scope | Connectome topology only | Connectome + CeNGEN + WormAtlas + lineage + ... |
| Maintainer | ConnectomeToolbox maintainer (active) | OWMeta team (dormant since Jul 2024) |
| Current status | v0.2.7, 30+ datasets, preprint pending | Working but under-maintained |
| Best for | Direct adjacency matrix access, visualization, cross-dataset comparison | Unified multi-modal biological queries, provenance tracking |
Phase 1-2 (now): Use cect directly. It's actively maintained, has the datasets we need, and provides the API patterns consuming DDs require.
Phase 3+ (future): When OWMeta becomes active again, it should call cect internally as its connectome data provider. Consuming DDs could then use either cect (direct) or OWMeta (semantic) depending on their needs. DD008 should add a cect ingestion adapter:
# Future DD008 integration (Phase 3+)
# OWMeta calls cect internally
from owmeta_core import connect
conn = connect("openworm_data")
# Under the hood: OWMeta uses cect.Cook2019HermReader
neurons = list(conn.query(Neuron)())
Configuration
openworm.yml Section
data:
  connectome:
    # Primary structural connectome
    dataset: "Cook2019Herm" # Required. Reader name from cect.
    cect_version: "0.2.7" # Required. Must match versions.lock.
    use_cache: true # Recommended. Use cached JSON for speed/reproducibility.
    # Pharyngeal subset (DD007)
    pharyngeal_dataset: "Cook2019Herm" # Same dataset, different view filter
    pharyngeal_view: "Pharynx" # View filter name
    # Developmental (optional, null = disabled)
    developmental_dataset: null # "Witvliet1" through "Witvliet8"
    # Functional connectivity validation target (DD005, DD010)
    functional_dataset: "Randi2023" # WormNeuroAtlasFuncReader
    # Neuropeptidergic (DD006)
    neuropeptide_dataset: "RipollSanchezShortRange"
    # Neurotransmitter identity (DD006, experimental)
    neurotransmitter_dataset: "Wang2024Herm"
    # Cross-validation dataset (DD010)
    cross_validation_dataset: "Witvliet8"
| Key | Default | Valid Values | Description |
|---|---|---|---|
| data.connectome.dataset | "Cook2019Herm" | Any cect reader name | Primary structural connectome dataset |
| data.connectome.cect_version | "0.2.7" | Semver string | cect package version pin |
| data.connectome.use_cache | true | true/false | Load from cached JSON (recommended) |
| data.connectome.pharyngeal_dataset | "Cook2019Herm" | Reader name | Dataset for pharyngeal subset |
| data.connectome.pharyngeal_view | "Pharynx" | View filter name | cect view filter for pharynx |
| data.connectome.developmental_dataset | null | "Witvliet1" - "Witvliet8" / null | Developmental series dataset |
| data.connectome.functional_dataset | "Randi2023" | Reader name | Functional connectivity target |
| data.connectome.neuropeptide_dataset | "RipollSanchezShortRange" | Ripoll-Sanchez reader name | Neuropeptide network dataset |
| data.connectome.neurotransmitter_dataset | "Wang2024Herm" | Wang2024 reader name | Neurotransmitter identity dataset |
| data.connectome.cross_validation_dataset | "Witvliet8" | Reader name | Independent dataset for cross-validation |
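A consuming DD might validate the data.connectome section before loading any data. The sketch below is a minimal illustration under stated assumptions: it works on an already-parsed dictionary (how openworm.yml is parsed is left to DD013), the helper names `validate_connectome_config` and `reader_module_name` are hypothetical, and the `cect.<dataset>Reader` module pattern is generalized from `cect.Cook2019HermReader`. That pattern does not hold for every dataset (Randi2023, for example, is served by WormNeuroAtlasFuncReader).

```python
import re

# Defaults copied from the configuration table in this section.
DEFAULTS = {
    "dataset": "Cook2019Herm",
    "cect_version": "0.2.7",
    "use_cache": True,
    "pharyngeal_dataset": "Cook2019Herm",
    "pharyngeal_view": "Pharynx",
    "developmental_dataset": None,
    "functional_dataset": "Randi2023",
    "neuropeptide_dataset": "RipollSanchezShortRange",
    "neurotransmitter_dataset": "Wang2024Herm",
    "cross_validation_dataset": "Witvliet8",
}

WITVLIET_STAGES = {f"Witvliet{i}" for i in range(1, 9)}
SEMVER = re.compile(r"^\d+\.\d+\.\d+$")


def validate_connectome_config(section):
    """Merge a parsed data.connectome mapping with defaults and check it.

    Hypothetical helper: returns the merged dict or raises ValueError.
    """
    unknown = set(section) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown data.connectome keys: {sorted(unknown)}")

    cfg = {**DEFAULTS, **section}

    version = cfg["cect_version"]
    if not isinstance(version, str) or not SEMVER.match(version):
        raise ValueError(f"cect_version must be a semver string, got {version!r}")

    if not isinstance(cfg["use_cache"], bool):
        raise ValueError("use_cache must be true or false")

    dev = cfg["developmental_dataset"]
    if dev is not None and dev not in WITVLIET_STAGES:
        raise ValueError(f"developmental_dataset must be null or Witvliet1-8, got {dev!r}")

    return cfg


def reader_module_name(dataset):
    """Assumed naming pattern, e.g. 'Cook2019Herm' -> 'cect.Cook2019HermReader'."""
    return f"cect.{dataset}Reader"
```

Rejecting unknown keys catches typos such as `developemental_dataset` early, which matters because a silently ignored key would fall back to a default dataset without any visible error.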
Integration Contract¶
Inputs / Outputs¶
Inputs (What This Subsystem Consumes)
| Input | Source | Variable | Format | Units |
|---|---|---|---|---|
| Published connectome datasets (external) | Journal supplementary materials | EM reconstructions, neurotransmitter atlases | Excel, CSV, TSV | Connection counts |
| openworm.yml connectome config | DD013 config system | Dataset selection, version pin | YAML | config keys |
| versions.lock cect version | DD013 build system | cect package version | Lock file | semver |
Outputs (What This Subsystem Produces)
| Output | Consumer DD | Variable | Format | Units |
|---|---|---|---|---|
| Structural adjacency matrices (chemical + electrical) | DD001 (neural circuit topology) | ConnectomeDataset.connections dict | dict[str, np.ndarray] | Connection counts |
| ConnectionInfo list | DD001, DD002 (neuron→muscle mapping) | ConnectomeDataset.original_connection_infos | list[ConnectionInfo] | pre_cell, post_cell, number, syntype, synclass |
| Neuron-to-neuron connections | DD001 (synapse generation) | get_neuron_to_neuron_conns() | list[ConnectionInfo] | Connection counts |
| Neuron-to-muscle connections | DD002 (NMJ coupling) | get_neuron_to_muscle_conns() | list[ConnectionInfo] | Connection counts |
| Pharyngeal connectome view | DD007 (pharynx circuit) | get_connectome_view("Pharynx") | ConnectomeDataset (filtered) | Connection counts |
| Cell classification | DD001, DD005, DD014 (neuron type labeling) | get_SIM_class(), cell lists | Python API | Category strings |
| Neuropeptide network | DD006 (peptidergic modulation) | Ripoll-Sanchez readers | ConnectomeDataset | Interaction scores |
| Neurotransmitter identity | DD006 (synapse type assignment) | Wang2024Reader | ConnectionInfo.synclass | NT names |
| Bilateral symmetry metrics | DD010 (validation), DD017 (ML targets) | convert_to_symmetry_array() | (ndarray, float, str) | Percentage |
| Developmental connectome series | DD019 (developmental validation) | Witvliet 1-8 readers | ConnectomeDataset per stage | Connection counts |
| Functional connectivity matrix | DD005, DD010 (validation target) | WormNeuroAtlasFuncReader | ConnectomeDataset | Correlation values |
| NetworkX graph | DD017 (graph neural networks) | to_networkx_graph() | networkx.DiGraph | Weighted edges |
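Consumers that receive a list[ConnectionInfo] often need to regroup it, for instance by synapse type. The sketch below shows one way to do that using only the five fields listed in the table (pre_cell, post_cell, number, syntype, synclass). `ConnStub` is a hypothetical stand-in for cect's ConnectionInfo so the example runs without cect installed. The syntype and synclass strings are assumptions, and the counts are illustrative rather than real data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ConnStub:
    """Hypothetical stand-in exposing the ConnectionInfo fields named above."""
    pre_cell: str
    post_cell: str
    number: float
    syntype: str
    synclass: str


def counts_by_syntype(conns):
    """Sum `number` per (pre_cell, post_cell) pair, grouped by syntype."""
    grouped = defaultdict(lambda: defaultdict(float))
    for c in conns:
        grouped[c.syntype][(c.pre_cell, c.post_cell)] += c.number
    # Return plain dicts so a missing key raises KeyError instead of
    # silently creating an empty entry.
    return {syntype: dict(pairs) for syntype, pairs in grouped.items()}


# Illustrative values only, not taken from any dataset.
example = [
    ConnStub("AVAL", "DA01", 3, "Send", "Generic_CS"),
    ConnStub("AVAL", "DA01", 2, "Send", "Generic_CS"),
    ConnStub("AVAL", "AVAR", 1, "GapJunction", "Generic_GJ"),
]
summary = counts_by_syntype(example)
print(summary)
```

In a real consumer the list would come from cect (for example `ConnectomeDataset.original_connection_infos`) rather than from hand-built stubs.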
Repository & Packaging¶
| Item | Value |
|---|---|
| Repository | openworm/ConnectomeToolbox |
| Docker stage | data (installed via pip in neural/body/pharynx stages) |
| versions.lock key | cect |
| Build dependencies | numpy <2.4, xlrd, openpyxl, wormneuroatlas, networkx, hiveplotlib <=0.25.1, webcolors, pyneuroml |
| Data in image | Cached JSON files from cect/cache/ (~5MB total) |
How to Test (Contributor Workflow)¶
# Per-PR quick test (must pass before submission)
python -c "
from cect.Cook2019HermReader import get_instance
cds = get_instance(from_cache=True)
assert len(cds.nodes) >= 300, f'Expected 300+ nodes, got {len(cds.nodes)}'
assert len(cds.original_connection_infos) > 0, 'No connections loaded'
print('DD020 quick test: PASS')
"
# Full validation (must pass before merge)
python -c "
import cect
assert cect.__version__ == '0.2.7', f'Version mismatch: {cect.__version__}'
from cect.Cook2019HermReader import get_instance
cds = get_instance(from_cache=True)
# Verify neuron count
from cect.Cells import SENSORY_NEURONS_COOK, INTERNEURONS_COOK, MOTORNEURONS_COOK
total_neurons = len(SENSORY_NEURONS_COOK) + len(INTERNEURONS_COOK) + len(MOTORNEURONS_COOK)
assert total_neurons > 280, f'Expected 280+ neurons, got {total_neurons}'
# Verify cell classification
from cect.Cells import get_SIM_class
assert get_SIM_class('AVAL') == 'Interneuron'
assert get_SIM_class('DA01') == 'Motorneuron'
# Verify bilateral symmetry works
from cect.Analysis import convert_to_symmetry_array
arr, pct, info = convert_to_symmetry_array(cds, ['Generic_CS'])
assert 0 < pct < 100, f'Symmetry percentage out of range: {pct}'
print('DD020 full validation: PASS')
"
How to Visualize (DD014 Connection)¶
| Data Flow | Description |
|---|---|
| cect → DD001 → OME-Zarr → DD014 | Connectome topology flows through c302 into simulation output; DD014 viewer displays neuron connectivity as part of the neural layer |
| cect → Plotly (direct) | cect's built-in to_plotly_* methods for data exploration and publication figures |
Coupling Dependencies¶
| I Depend On | DD | What Breaks If They Change |
|---|---|---|
| Published connectome datasets (external) | None | If Cook et al. publish corrections, cect readers must be updated by the ConnectomeToolbox maintainer |
| cect Python package (external) | None | If cect API changes, all consuming DDs must update imports |
| DD013 config system | DD013 | If openworm.yml schema changes, data.connectome keys must be updated |
| Depends On Me | DD | What Breaks If I Change |
|---|---|---|
| Neural circuit topology | DD001 | Changing default dataset changes every synapse in the simulation |
| Muscle innervation | DD002 | Neuron-to-muscle connection list drives NMJ coupling |
| Cell-type specialization | DD005 | Cook2019 neuron list defines which cells to specialize |
| Neuropeptidergic network | DD006 | Ripoll-Sanchez data defines peptide-receptor interactions |
| Pharyngeal circuit | DD007 | Pharyngeal view filter defines pharynx neuron connectivity |
| Data integration | DD008 | OWMeta ingests connectome data from cect (Phase 3+) |
| Simulation stack | DD013 | cect version pinned in versions.lock |
| Hybrid ML framework | DD017 | Graph structure (via NetworkX) used for graph neural networks |
| Closed-loop touch response | DD019 | Touch neuron connectivity (MEC-4 neurons) from cect |
Known Issues and Future Work¶
Issue 1: OpenWormUnifiedReader Is WIP¶
The "unified" reader is intended to become the single best-estimate connectome, but as of Feb 2026 it wraps Wang2024Reader and is marked "subject to change without notice." Using it as a default would introduce instability.
Mitigation: Default to Cook2019Herm. Monitor OpenWormUnifiedReader stability. Adopt as default only after WIP designation is removed and regression tests pass.
Issue 2: Cell Name Variants Across Datasets¶
Different datasets use slightly different cell naming conventions. cect handles this internally via map_cell_name() methods in each reader, but edge cases may remain.
Mitigation: Always use cell names as returned by cect. Report naming inconsistencies as issues on openworm/ConnectomeToolbox.
Issue 3: Connection Count Discrepancies Between Datasets¶
Cook2019 and Witvliet8 are both adult hermaphrodite connectomes but report different connection counts for some neuron pairs. This is expected (different animals, different EM volumes) but can confuse validation.
Mitigation: Use Cook2019Herm as primary, Witvliet8 as cross-validation. Document expected discrepancy ranges.
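Documenting expected discrepancy ranges requires a per-pair comparison of the two datasets. The sketch below assumes each dataset has already been reduced to a dictionary mapping (pre_cell, post_cell) to a connection count. `compare_counts` is a hypothetical helper, and the numbers are made up for illustration.

```python
def compare_counts(primary, secondary):
    """Compare two {(pre, post): count} dicts.

    Returns (shared, only_primary, only_secondary), where `shared` maps each
    pair present in both datasets to secondary_count - primary_count.
    """
    shared = {
        pair: secondary[pair] - primary[pair]
        for pair in primary.keys() & secondary.keys()
    }
    only_primary = sorted(primary.keys() - secondary.keys())
    only_secondary = sorted(secondary.keys() - primary.keys())
    return shared, only_primary, only_secondary


# Made-up counts, not real Cook2019 or Witvliet8 values.
cook = {("AVAL", "DA01"): 5, ("AVAL", "AVAR"): 1, ("PVCL", "AVBL"): 4}
witvliet = {("AVAL", "DA01"): 7, ("AVAL", "AVAR"): 1, ("AVBL", "VB02"): 2}

shared, only_cook, only_witvliet = compare_counts(cook, witvliet)
max_abs_diff = max(abs(d) for d in shared.values())
```

Keeping pairs that appear in only one dataset separate from count differences avoids conflating "connection absent" with "connection present but weaker", which are different kinds of disagreement for validation purposes.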
Issue 4: Preprint Not Yet Published¶
The ConnectomeToolbox preprint (Gleeson et al., in preparation) is not yet available. Once published, this DD should be updated with the citation and DOI.
Future action: Add citation when preprint appears on bioRxiv.
Existing Code Resources¶
wormneuroatlas (openworm/wormneuroatlas, PyPI: pip install wormneuroatlas, maintained 2025):
Complements cect with additional data not in the connectome: CeNGEN gene expression (NeuroAtlas.get_gene_expression()), Randi 2023 functional connectivity (NeuroAtlas.get_signal_propagation_atlas()), and neuropeptide/GPCR deorphanization (PeptideGPCR class). Consider adding to DD013 Docker and versions.lock.
NemaNode (openworm/NemaNode, 2024, nemanode.org):
Interactive connectome map. Evaluate as cross-validation data source and potential source for per-synapse spatial position data not yet exposed in cect's WitvlietDataReader (needed for DD001 Level D multicompartmental models).
References¶
- White JG, Southgate E, Thomson JN, Brenner S (1986). "The structure of the nervous system of the nematode Caenorhabditis elegans." Phil Trans R Soc B 314:1-340. Original connectome.
- Varshney LR, Chen BL, Paniagua E, Hall DH, Chklovskii DB (2011). "Structural properties of the Caenorhabditis elegans neuronal network." PLoS Comput Biol 7:e1001066. Digital re-analysis.
- Cook SJ, Jarrell TA, Brittin CA, Wang Y, Bloniarz AE, Yakovlev MA, et al. (2019). "Whole-animal connectomes of both Caenorhabditis elegans sexes." Nature 571:63-71. Gold standard whole-animal EM reconstruction.
- Witvliet D, Mulcahy B, Mitchell JK, Meirovitch Y, Berger DR, Wu Y, et al. (2021). "Connectomes across development reveal principles of brain maturation." Nature 596:257-261. Developmental connectome series (8 stages).
- Randi F, Sharma AK, Dvali S, Leifer AM (2023). "Neural signal propagation atlas of Caenorhabditis elegans." Nature 623:406-414. Functional connectivity.
- Ripoll-Sanchez L, Watteyne J, Sun H, Fernandez R, Taylor SR, Weinreb A, et al. (2023). "The neuropeptidergic connectome of C. elegans." Neuron 111:3570-3589. Neuropeptide-receptor interaction network.
- Wang C, et al. (2024). "A neurotransmitter atlas of C. elegans males and hermaphrodites." eLife 13:RP95402. CRISPR/Cas9 neurotransmitter reporter atlas.
- Gleeson P et al. (in preparation). "ConnectomeToolbox: a unified Python toolkit for C. elegans connectome data." cect package preprint.
- Zhao M et al. (2024). Nat Comput Sci 4:978-990. Per-synapse spatial data from Witvliet EM reconstructions modeled as inverse Gaussian distributions.
- Approved by: OpenWorm Steering
- Implementation Status: Complete (cect v0.2.7 on PyPI; dataset policy ratified)
- Next Actions:
  - Pin cect==0.2.7 in versions.lock
  - Add data.connectome section to openworm.yml schema
  - Update DD001 coupling table: ConnectomeToolbox (external) → DD020
  - Update DD008 coupling table: add DD020 as connectome data source
  - Create CI test for cect version and default dataset load
  - Monitor OpenWormUnifiedReader stability for future default adoption
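
For the CI test listed under Next Actions, the version comparison can be kept separate from the dataset load so that a pin mismatch fails fast with a clear message. This is a minimal sketch, assuming the pinned version has already been read from versions.lock (that file's format is defined by DD013 and is not assumed here). `installed_version` and `check_pin` are hypothetical helper names; `importlib.metadata` is part of the Python standard library.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pin(package, pinned, installed=None):
    """Raise RuntimeError unless the installed version equals the pin exactly.

    `installed` may be passed explicitly (useful in tests); otherwise it is
    looked up in the current environment.
    """
    found = installed if installed is not None else installed_version(package)
    if found is None:
        raise RuntimeError(f"{package} is not installed; expected {pinned}")
    if found != pinned:
        raise RuntimeError(f"{package} version mismatch: {found} != {pinned}")
    return found
```

A CI job could call `check_pin("cect", "0.2.7")` first and only then run the default dataset load, so that a failure points at the environment rather than at the data.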