.. _ophys_metadata_structure:

Ophys Metadata Structure
========================

This document describes the architecture of the optical physiology (ophys) metadata system in NeuroConv.
It is intended for developers who are contributing new interfaces or modifying existing ones.

For user-facing documentation on how to annotate ophys data, see :ref:`annotate_ophys_metadata`.

Design Principles
-----------------

The ophys metadata system is built on five core principles:

1. **Dictionary-Based Organization**

   Metadata is organized using dictionaries with meaningful keys. This structure makes metadata easier to
   reference, organize, and extend. Dictionaries allow direct access to specific components by name, which is
   clearer and less error-prone than positional access.

   .. code-block:: python

      metadata["Ophys"]["ImagingPlanes"]["visual_cortex"]["indicator"] = "GCaMP6s"

2. **Consistent metadata_key Across Interfaces**

   Every ophys interface takes a single ``metadata_key`` parameter that, by default, is used as the key for all
   of its components (Device, ImagingPlane, MicroscopySeries). This provides a consistent pattern across all
   interfaces, making the API predictable and easier to learn.

3. **Explicit References**

   Components reference each other using explicit ``_metadata_key`` fields. This makes relationships between
   components clear and enables validation.

4. **Top-Level Devices**

   Devices are stored at the top level (``metadata["Devices"]``), enabling device sharing across ophys, ecephys,
   and other modalities.

5. **Provenance-First get_metadata()**

   The ``get_metadata()`` method returns only values extracted from the source data, not defaults. Defaults are
   applied at NWB object creation time.

Metadata Structure Overview
---------------------------

The complete ophys metadata structure:

.. code-block:: python

    metadata = {
        "NWBFile": {...},  # Session-level metadata
        "Subject": {...},  # Subject information
        "Devices": {
            "visual_cortex": {
                "name": "Microscope",
                "description": "Two-photon microscope for visual cortex imaging",
                "manufacturer": "Bruker"
            },
            "hippocampus": {
                "name": "Miniscope",
                "description": "UCLA Miniscope v4 for hippocampal imaging"
            }
        },
        "Ophys": {
            "ImagingPlanes": {
                "visual_cortex": {
                    "name": "ImagingPlaneVisualCortex",
                    "description": "Imaging plane in V1 layer 2/3",
                    "device_metadata_key": "visual_cortex",  # Reference to device
                    "excitation_lambda": 920.0,
                    "indicator": "GCaMP6s",
                    "location": "V1 binocular zone",
                    "optical_channel": [
                        {
                            "name": "GreenChannel",
                            "description": "GCaMP emission channel",
                            "emission_lambda": 510.0
                        }
                    ]
                },
                "hippocampus": {
                    "name": "ImagingPlaneHippocampus",
                    "device_metadata_key": "hippocampus",
                    "excitation_lambda": 470.0,
                    "indicator": "GCaMP6f",
                    "location": "CA1 pyramidal layer",
                    "optical_channel": [...]
                }
            },
            "MicroscopySeries": {
                "visual_cortex": {
                    "name": "TwoPhotonSeriesVisualCortex",
                    "description": "Two-photon calcium imaging",
                    "imaging_plane_metadata_key": "visual_cortex",  # Reference to imaging plane
                    "unit": "n.a.",
                    "dimension": [512, 512]
                },
                "hippocampus": {
                    "name": "OnePhotonSeriesHippocampus",
                    "description": "Miniscope calcium imaging",
                    "imaging_plane_metadata_key": "hippocampus",
                    "unit": "n.a.",
                    "dimension": [480, 752]
                }
            },
            "PlaneSegmentations": {
                "suite2p_analysis": {
                    "name": "PlaneSegmentation",
                    "description": "ROIs detected by Suite2p",
                    "imaging_plane_metadata_key": "visual_cortex"
                }
            },
            "RoiResponses": {
                "suite2p_analysis": {
                    "raw": {
                        "name": "RoiResponseSeries",
                        "description": "Raw fluorescence traces",
                        "unit": "n.a."
                    },
                    "neuropil": {
                        "name": "Neuropil",
                        "description": "Neuropil fluorescence",
                        "unit": "n.a."
                    },
                    "deconvolved": {
                        "name": "Deconvolved",
                        "description": "Deconvolved activity",
                        "unit": "n.a."
                    },
                    "denoised": {
                        "name": "Denoised",
                        "description": "Denoised activity",
                        "unit": "n.a."
                    },
                    "baseline": {
                        "name": "Baseline",
                        "description": "Baseline fluorescence",
                        "unit": "n.a."
                    },
                    "dff": {
                        "name": "DfOverF",
                        "description": "Delta F over F",
                        "unit": "n.a."
                    }
                }
            },
            "SegmentationImages": {
                "name": "SegmentationImages",
                "description": "Summary images from segmentation",
                "suite2p_analysis": {
                    "correlation": {
                        "name": "correlation_image",
                        "description": "Correlation image from Suite2p"
                    },
                    "mean": {
                        "name": "mean_image",
                        "description": "Mean image from Suite2p"
                    }
                }
            }
        }
    }

The metadata_key Parameter
--------------------------

All imaging and segmentation interfaces accept a ``metadata_key`` parameter. This parameter is **keyword-only**
to ensure explicit usage.

.. code-block:: python

    from typing import Optional

    from neuroconv.basedatainterface import BaseDataInterface


    class SomeOphysInterface(BaseDataInterface):
        def __init__(
            self,
            *,  # Force keyword-only
            verbose: bool = False,
            metadata_key: Optional[str] = None,
            **source_data,
        ):
            self.metadata_key = metadata_key
            ...

The argument name ``metadata_key`` is the same across all interfaces, ensuring a common API.
When ``None`` (the default), the interface automatically generates a unique key from the parameters that
make the interface unique (e.g. stream name, channel name). When the user passes an explicit value,
they take responsibility for uniqueness and can use it to intentionally share or customize metadata keys.

The default is ``None`` rather than a hardcoded string (e.g. ``"caiman_segmentation"``) for consistency
across interfaces. Multi-stream and multi-channel interfaces (like ``ScanImageImagingInterface``)
cannot have a fixed default because the key must include runtime parameters such as ``channel_name``
and ``plane_index``. Using ``None`` as the sentinel and resolving the default in ``__init__`` lets every
interface share the same pattern: simple interfaces pick a static string, and parametric interfaces
build the key from their constructor arguments.

Key Propagation
~~~~~~~~~~~~~~~

The ``metadata_key`` parameter determines the entry point for the interface's **primary object(s)**.
For imaging interfaces, this is the MicroscopySeries; for segmentation interfaces, this is the
PlaneSegmentation and RoiResponses. Linked objects (ImagingPlane, Device) are resolved through their own
``_metadata_key`` references, which may point to different entries.

For an imaging interface with ``metadata_key="visual_cortex"``:

- ``metadata["Ophys"]["MicroscopySeries"]["visual_cortex"]`` - The primary object (direct lookup via ``metadata_key``)
- ``metadata["Ophys"]["ImagingPlanes"][imaging_plane_metadata_key]`` - Resolved via ``imaging_plane_metadata_key`` inside the MicroscopySeries entry
- ``metadata["Devices"][device_metadata_key]`` - Resolved via ``device_metadata_key`` inside the ImagingPlane entry

For a segmentation interface with ``metadata_key="suite2p_analysis"``:

- ``metadata["Ophys"]["PlaneSegmentations"]["suite2p_analysis"]`` - Direct lookup via ``metadata_key``
- ``metadata["Ophys"]["RoiResponses"]["suite2p_analysis"]`` - Direct lookup via the same ``metadata_key``
- ``metadata["Ophys"]["ImagingPlanes"][imaging_plane_metadata_key]`` - Resolved via ``imaging_plane_metadata_key`` inside the PlaneSegmentation entry
- ``metadata["Devices"][device_metadata_key]`` - Resolved via ``device_metadata_key`` inside the ImagingPlane entry
- ``metadata["Ophys"]["SegmentationImages"]["suite2p_analysis"]`` - Summary images

In the simplest case, all these keys happen to be the same value (the interface's ``metadata_key``),
which is what ``get_metadata()`` produces by default. But the indirection through ``_metadata_key``
fields allows different components to reference shared resources. For example, two segmentation
pipelines can point their ``imaging_plane_metadata_key`` to the same ImagingPlane entry, and two
imaging planes can point their ``device_metadata_key`` to the same Device entry.
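The lookup order for an imaging interface can be sketched in plain Python. The helper
``resolve_imaging_chain`` below is hypothetical and is not part of NeuroConv's API; it only illustrates,
under the structure described in this document, how ``metadata_key`` selects the MicroscopySeries entry and
how the ``_metadata_key`` fields lead from there to the ImagingPlane and the Device. The keys ``"v1_plane"``
and ``"shared_microscope"`` are made up for the example.

.. code-block:: python

    def resolve_imaging_chain(metadata: dict, metadata_key: str) -> dict:
        """Follow MicroscopySeries -> ImagingPlane -> Device (illustrative helper, not NeuroConv API)."""
        ophys = metadata["Ophys"]

        # The primary object is found by direct lookup with the interface's metadata_key.
        series = ophys["MicroscopySeries"][metadata_key]

        # Linked objects are found through the explicit reference fields.
        plane = ophys["ImagingPlanes"][series["imaging_plane_metadata_key"]]
        device = metadata["Devices"][plane["device_metadata_key"]]

        return {"series": series, "imaging_plane": plane, "device": device}


    # Example metadata in which the three keys intentionally differ.
    example_metadata = {
        "Devices": {"shared_microscope": {"name": "Microscope"}},
        "Ophys": {
            "ImagingPlanes": {
                "v1_plane": {
                    "name": "ImagingPlaneV1",
                    "device_metadata_key": "shared_microscope",
                },
            },
            "MicroscopySeries": {
                "visual_cortex": {
                    "name": "TwoPhotonSeriesVisualCortex",
                    "imaging_plane_metadata_key": "v1_plane",
                },
            },
        },
    }

    chain = resolve_imaging_chain(example_metadata, "visual_cortex")
    print(chain["series"]["name"], chain["imaging_plane"]["name"], chain["device"]["name"])

Only the MicroscopySeries entry is looked up with the interface's own key; the other two entries are reached
through references, which is why they can be shared between interfaces.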
Single ImageSegmentation Container
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

While having multiple PlaneSegmentations makes sense (different segmentation algorithms like Suite2p vs
CaImAn, or multiple runs of the same algorithm), there is no clear use case for multiple ImageSegmentation
containers in an NWB file.

PyNWB and the NWB schema allow multiple ImageSegmentation containers, but NeuroConv does not support this.
Instead, NeuroConv uses a single, non-editable ImageSegmentation container where all PlaneSegmentations are
stored. This is handled internally and users cannot configure the ImageSegmentation container.

Users work directly with ``PlaneSegmentations`` in metadata, and NeuroConv places them in the single
ImageSegmentation container when creating the NWB file. This simplifies both the metadata specification
(no need to manage container names) and the organization of the resulting NWB file.

Unified MicroscopySeries
~~~~~~~~~~~~~~~~~~~~~~~~

Metadata uses a unified ``MicroscopySeries`` key for all imaging data, regardless of whether it will be
written as ``TwoPhotonSeries`` or ``OnePhotonSeries`` in the NWB file.

The choice of NWB neurodata type (``TwoPhotonSeries`` vs ``OnePhotonSeries``) is specified as a
**conversion option**, not in metadata. This follows the provenance principle: metadata describes the
data, while conversion options determine how to write it to NWB.

For format-specific interfaces (e.g., ``ScanImageImagingInterface``), the series type is extracted from the
source data. For generic interfaces (e.g., ``TiffImagingInterface``), users must specify the series type at
conversion time:

.. code-block:: python

    from neuroconv.datainterfaces import ScanImageImagingInterface, TiffImagingInterface

    # `nwbfile` and `metadata` are assumed to have been created earlier in the conversion script.

    # Format-specific interface - series type extracted from source
    interface = ScanImageImagingInterface(file_path="data.tif", metadata_key="visual_cortex")
    interface.add_to_nwbfile(nwbfile, metadata)  # Uses extracted type (TwoPhotonSeries)

    # Generic interface - series type must be specified
    interface = TiffImagingInterface(file_path="data.tif", metadata_key="visual_cortex")
    interface.add_to_nwbfile(nwbfile, metadata, photon_series_type="TwoPhotonSeries")

    # Override is always possible
    interface.add_to_nwbfile(nwbfile, metadata, photon_series_type="OnePhotonSeries")

Unified RoiResponses
~~~~~~~~~~~~~~~~~~~~

All ROI trace types (raw fluorescence, neuropil, deconvolved, denoised, baseline, df/f) are stored under a
single ``RoiResponses`` key in metadata. This consolidates what NWB core splits into separate
``Fluorescence`` and ``DfOverF`` containers.

At write time, all traces are written as ``RoiResponseSeries`` inside a single ``Fluorescence`` container,
without splitting into ``Fluorescence`` and ``DfOverF``. This follows the direction of nwb-schema#616
and matches ndx-microscopy's single-container pattern (``MicroscopyResponseSeriesContainer``).

Alignment with ndx-microscopy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The metadata structure is designed to align with the ndx-microscopy extension, which represents the future
direction of optical physiology in NWB. ndx-microscopy uses:

- ``MicroscopySeries`` for all imaging data (instead of separate ``TwoPhotonSeries``/``OnePhotonSeries``)
- ``MicroscopyResponseSeries`` for all ROI traces (instead of separate ``Fluorescence``/``DfOverF``)

By adopting similar patterns (``MicroscopySeries``, ``RoiResponses``), NeuroConv's metadata structure will
require minimal changes when ndx-microscopy is integrated into NWB core. This makes the eventual
transition smoother for users.

Linking and Object Creation
---------------------------

Each interface's goal is to create its **primary object(s)** in NWB.
For example, an imaging interface creates a MicroscopySeries (e.g. TwoPhotonSeries, OnePhotonSeries).
The metadata specifies attributes of that object (name, description, unit, etc.) but also its linked
objects: an ImagingPlane for the series, and in turn a Device for the ImagingPlane.

Contained vs Linked Components
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In NWB, some components are fully contained within their parent while others exist as separate, linked
objects. This distinction affects how they are represented in metadata:

**Contained components** like ``optical_channel`` are fully specified as nested metadata inside their parent.
An ImagingPlane's optical channels are defined directly within the ImagingPlane metadata dictionary because
they exist only within that ImagingPlane.

**Linked components** like Device and ImagingPlane are separate NWB objects that can be shared or referenced
by multiple other objects. For example, an ImagingPlane must reference the Device (microscope) that was used
to acquire the data, and a TwoPhotonSeries must reference the ImagingPlane where the imaging occurred.

How Linking Works
~~~~~~~~~~~~~~~~~

In the metadata dict, we don't have actual NWB objects yet, only dictionaries describing them. To express
relationships between linked components, we use special ``_metadata_key`` fields that contain the key of
the referenced component.

``device_metadata_key`` is used in ImagingPlane to reference its Device:

.. code-block:: python

    imaging_plane = {
        "name": "ImagingPlane",
        "device_metadata_key": "visual_cortex",  # Points to metadata["Devices"]["visual_cortex"]
        # ... (other fields omitted)
    }

``imaging_plane_metadata_key`` is used in MicroscopySeries and PlaneSegmentation to reference their
ImagingPlane:

.. code-block:: python

    microscopy_series = {
        "name": "TwoPhotonSeries",
        "imaging_plane_metadata_key": "visual_cortex",  # Points to ImagingPlanes["visual_cortex"]
        # ... (other fields omitted)
    }

    plane_segmentation = {
        "name": "PlaneSegmentation",
        "imaging_plane_metadata_key": "visual_cortex",
        # ... (other fields omitted)
    }

This allows multiple components (e.g., multiple segmentation pipelines) to reference the same ImagingPlane,
as shown in the how-to guide for annotating multiple segmentations of the same data.

When Objects Are Created
~~~~~~~~~~~~~~~~~~~~~~~~

Linked objects (Devices, ImagingPlanes, etc.) are not created when the metadata dict is assembled. They are
created when ``add_to_nwbfile`` is called. The metadata dict defines what *could* be created, and the
``_metadata_key`` references determine what actually gets written to the NWB file. At that point, the string
references are resolved to actual NWB objects.

The rules are:

1. Only entries that are actually referenced by other objects (via ``_metadata_key`` fields) are created.
   Entries that exist in the metadata dict but are not referenced by anything will not be written to the
   file. This means you can define all the devices of a conversion in a shared YAML and only the ones that
   are actually linked will end up in the NWB file.

2. If an object requires a linked object but the link is missing (e.g. an ImagingPlane, which requires a
   Device, has no ``device_metadata_key``), a default object is created and linked automatically at write
   time.

3. For shared resources (e.g. two imaging planes using the same microscope), the user or the converter sets
   the ``_metadata_key`` references explicitly. The object is created by whichever interface writes first,
   and subsequent interfaces reuse the existing object.

.. code-block:: python

    # Two imaging planes sharing one device
    metadata["Devices"]["shared_microscope"] = {
        "name": "Microscope",
        "description": "Two-photon microscope used for both planes",
        "manufacturer": "Thorlabs",
    }
    metadata["Ophys"]["ImagingPlanes"]["plane_area1"] = {
        "name": "ImagingPlaneArea1",
        "location": "V1",
        "device_metadata_key": "shared_microscope",
    }
    metadata["Ophys"]["ImagingPlanes"]["plane_area2"] = {
        "name": "ImagingPlaneArea2",
        "location": "V2",
        "device_metadata_key": "shared_microscope",
    }

Note that device keys (and imaging plane keys) are independent of any interface's ``metadata_key``. They can
be any arbitrary string, as shown by ``"shared_microscope"`` above, which does not correspond to any
interface's key. No interface "owns" the device; it is created at write time by whichever interface first
follows the reference chain to it.

Because only referenced entries are written to the NWB file, the metadata dict can hold all possible
components (e.g. in a shared YAML) and the ``_metadata_key`` links control which ones are actually used for
each conversion.

This enables a workflow where a single YAML file contains the full metadata for a project (all devices,
imaging planes, etc.) and is shared across sessions in a multi-session conversion script. For each session,
the conversion code sets the ``_metadata_key`` references programmatically to select which components to
write and how to link them. For example, different sessions might link their imaging planes to different
devices, or different segmentation runs might reference different imaging planes, all from the same shared
metadata file.
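The per-session workflow can be sketched as follows. This is a minimal illustration under the rules stated
in this document, not NeuroConv's implementation: the helper ``referenced_keys`` and the keys ``"rig_a"``,
``"rig_b"``, and ``"v1"`` are hypothetical. The helper assumes that an ImagingPlane counts as referenced when
a MicroscopySeries or PlaneSegmentation points to it, and that a Device counts as referenced when one of
those ImagingPlanes points to it.

.. code-block:: python

    from copy import deepcopy


    def referenced_keys(metadata: dict) -> tuple[set, set]:
        """Return the ImagingPlane and Device keys reachable through _metadata_key links."""
        ophys = metadata.get("Ophys", {})

        # Imaging planes referenced by any series or plane segmentation.
        plane_keys = set()
        for section in ("MicroscopySeries", "PlaneSegmentations"):
            for entry in ophys.get(section, {}).values():
                plane_key = entry.get("imaging_plane_metadata_key")
                if plane_key is not None:
                    plane_keys.add(plane_key)

        # Devices referenced by those imaging planes.
        device_keys = set()
        for plane_key in plane_keys:
            device_key = ophys["ImagingPlanes"][plane_key].get("device_metadata_key")
            if device_key is not None:
                device_keys.add(device_key)

        return plane_keys, device_keys


    # Project-wide metadata, e.g. loaded once from a shared YAML file.
    shared_metadata = {
        "Devices": {
            "rig_a": {"name": "MicroscopeA"},
            "rig_b": {"name": "MicroscopeB"},
        },
        "Ophys": {
            "ImagingPlanes": {"v1": {"name": "ImagingPlaneV1", "location": "V1"}},
            "MicroscopySeries": {
                "v1": {"name": "TwoPhotonSeriesV1", "imaging_plane_metadata_key": "v1"},
            },
        },
    }

    # Per session: copy the shared metadata, then set the links for this session only.
    session_metadata = deepcopy(shared_metadata)
    session_metadata["Ophys"]["ImagingPlanes"]["v1"]["device_metadata_key"] = "rig_b"

    plane_keys, device_keys = referenced_keys(session_metadata)
    print(sorted(plane_keys), sorted(device_keys))

In this sketch ``"rig_a"`` stays in the metadata dict but is never reached through a link, so under rule 1 it
would not be written for this session. Copying the shared dict before editing keeps the links chosen for one
session from leaking into the next.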