Migrate from 0.3.0 to 0.4.0

rly · Nov 12, 2024 · 2b2ae62
1 parent 092a6ad
commit 2b2ae62
Showing 15 changed files with 652 additions and 827 deletions.
diff --git a/.gitignore b/.gitignore
@@ -4,6 +4,9 @@
 # generated docs
 docs/source/_format_auto_docs
 
+# developer-specific files
+.vscode/
+
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[cod]

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,5 +1,3 @@
 # Changelog for ndx-events
 
-## 0.3.0 (Upcoming)
-
-
+## 0.4.0 (Upcoming)
diff --git a/README.md b/README.md
@@ -1,58 +1,63 @@
 # ndx-events Extension for NWB
 
-This is an NWB extension for storing timestamped event data and TTL pulses.
-
-The latest version is 0.3.0. This is a major change from previous versions.
-
-**`EventTypesTable`**: Event types (e.g., lick, reward left, reward right, airpuff, reach) and metadata about them should be stored in an `EventTypesTable` object.
-- `EventTypesTable` inherits from `DynamicTable` and stores metadata related to each event type, one per row. 
-- An "event_name" text column is required.
-- A "event_type_description" text column is required.
-- The table allows for an arbitrary number of custom columns to be added for additional metadata for each event type. 
-- This table is intended to live in a `Task` object at the path "general/task" in the `NWBFile`.
-
-**`EventsTable`**: Event times and metadata about them should be stored in an `EventsTable` object.
-- `EventsTable` inherits from `DynamicTable` and stores metadata related to each event time / instance, one per row.
-- A "timestamp" column of type `TimestampVectorData` is required.
-- A “duration” column of type `DurationVectorData` is optional. 
-- An “event_type” column that is a foreign key reference to a row index of the `EventTypesTable` is required.
-- A "value" text column is optional. This enables storage of another layer of events within an event type. This could store different reward sizes or different tone frequencies or other parameterizations of an event. For example, if you have three levels of reward (e.g., 1 drop, 2 drops, 3 drops), instead of encoding each level of reward as its own event type (e.g., "reward_value_1", "reward_value_2", "reward_value_3", you could encode "reward" as the event type, and the value for each event time could be "1", "2", or "3". 
-- Because this inherits from `DynamicTable`, users can add additional custom columns to store other metadata.
-- This table is intended to live either under the "acquisition" group or in a "behavior" `ProcessingModule`, i.e., under the "processing/behavior" group.
-
-**`TtlTypesTable`**: TTL pulse types and metadata about them should be stored in a `TtlTypesTable` object. 
-- `TtlTypesTable` inherits from `EventTypesTable` and stores metadata related to each TTL pulse type, one per row. 
-- A "pulse_value" unsigned integer column is required. 
-- This table is intended to live in a `Task` object at the path "general/task" in the `NWBFile`.
-
-**`TtlsTable`**: TTL pulses and metadata about them should be stored in a `TtlsTable` object.
-- `TtlsTable` inherits from `EventsTable`.
-- The "event_type" column inherited from `EventsTable` should refer to the `TtlTypesTable`.
-- This table is intended to live either under the "acquisition" group or in a "behavior" `ProcessingModule`, i.e., under the "processing/behavior" group.
-
-This extension defines a few additional neurodata types related to storing events:
-
-**`Task`**: `Task` type is a subtype of the `LabMetaData` type and holds the `EventTypesTable` and `TtlTypesTable`. This allows the `Task` type to be added as a group in the root "general" group. 
+This is an NWB extension for storing timestamped event data.
+
+The latest version is 0.4.0. This is a major change from previous versions.
+
+1. A `TimestampVectorData` type that extends `VectorData` and stores a 1D array of timestamps (float32) in seconds
+   - Values are in seconds from session start time (like all other timestamps in NWB)
+   - It has a scalar string attribute named "unit". The value of the attribute is fixed to "seconds".
+   - It has an optional scalar float attribute named "resolution" that represents the smallest possible difference between two timestamps. This is usually 1 divided by the sampling rate for timestamps of the data acquisition system. (Alternatively, the event sampling rate could be stored.)
+   - This type can be used to represent a column of timestamps in any `DynamicTable`, such as the NWB `Units` table and the new `EventsTable` described below.
+2. A `DurationVectorData` type that extends `VectorData` and stores a 1D array of durations (float32) in seconds. It is otherwise identical to the `TimestampVectorData` type.
+   - If this is used in a table where some events have a duration and some do not (or the duration is not yet known), then a value of NaN can be used for events without a duration or with a duration that is not yet specified. If NaN stands for a duration that is not yet specified, this should be documented in the description of the `DurationVectorData`.
+3. A `CategoricalVectorData` type that extends `VectorData` and stores the mappings of data values (of any type) to meanings. This is an experimental type to evaluate one possible way of storing the meanings (longer descriptions) associated with different categorical values stored in a table column. This can be used to add categorical metadata values to an `EventsTable`.  This type will be marked as experimental while the NWB team evaluates possible alternate solutions to annotating the values of a dataset, such as LinkML-based term sets, non-table based approaches, and external mapping files.
+   - The type contains an object reference to a `MeaningsTable` named "meanings". See below. Unfortunately, because `CategoricalVectorData` is a dataset, it cannot contain a `MeaningsTable` within it, so the `MeaningsTable` is placed in the parent `EventsTable` and referenced by the `CategoricalVectorData`.
+   - It may also contain an optional 1D attribute named "filter_values" to define missing and invalid values within a data field to be filtered out during analysis, e.g., the dataset may contain one or more of: "undefined" or "None" to signal that those values in the `CategoricalVectorData` dataset are missing or invalid. Due to constraints of NWB/HDMF attributes, attributes must have a dtype, so currently, only string values (not -1 or NaN) are allowed.
+   - This type is similar to an `EnumData`, which is a `VectorData` of an enumerated type, except that the values stored in the column are strings that are short-hand representations of the concept, as opposed to integers. Storing strings is slightly less efficient than storing integers, but for these use cases, these tables will rarely be large and storing strings directly is more intuitive and accessible to users.
+4. A `MeaningsTable` type that extends `DynamicTable` with two required columns:
+   - A "value" column that contains all the possible values that could be stored in the parent `CategoricalVectorData` object. For example, if the `CategoricalVectorData` stores the port in which the subject performed a nose poke, the possible values might be "left", "center", and "right". All possible values must be listed, even if not all values are observed, e.g., if the subject does not poke in the "center" port, "center" should still be listed to signal that it was a possible option.
+   - A "meaning" column with string dtype that contains a longer description of the concept. For example, for the value "left", the meaning might be "The subject performed a nosepoke in the left-most port, from the viewpoint of the subject. This is signaled by detection of the port’s infrared beam being broken."
+   - Users can add custom, user-defined columns to provide additional information about the possible values, such as [HED (Hierarchical Event Descriptor)](https://www.hed-resources.org/en/latest/) tags. For HED tags, users may consider using the `HedTags` type, a subtype of `VectorData`, in the [ndx-hed extension](https://github.com/hed-standard/ndx-hed).
+   - As described in `CategoricalVectorData`, this arrangement will be marked as experimental.
+5. An `EventsTable` type for storing a collection of event times that have the same parameterizations/properties/metadata (i.e., they are the same type of event, such as licks, image presentations, or reward deliveries)
+   - It inherits from `DynamicTable` and stores metadata related to each event time / instance, one per row.
+   - It has a required "timestamp" column of type `TimestampVectorData`.
+   - It has an optional "duration" column of type `DurationVectorData`.
+   - Because this inherits from `DynamicTable`, users can add additional custom columns to store other metadata, such as parameterizations of an event, e.g., reward value in uL, image category, or tone frequency.
+   - The "description" of this table should include information about how the event times were computed, especially if the times are the result of processing or filtering raw data. For example, if the experimenter is encoding different types of events using a "strobed" or "N-bit" encoding, then the "description" value should describe which channels were used and how the event time is computed, e.g., as the rise time of the first bit.
+   - It contains a collection of `MeaningsTable` objects referenced by any `CategoricalVectorData` columns. These tables are placed in a subgroup of the `EventsTable` named "meanings". Alternatively, these `MeaningsTable` objects could be placed under the root `NWBFile`, but it is probably more useful to keep them close to the objects that they describe. As described in `CategoricalVectorData`, this arrangement will be marked as experimental.
+
+The PyNWB and MatNWB APIs would provide functions to create these tables. For example, in PyNWB:
+
+```python
+stimulus_presentation_events = EventsTable(name="stimulus_presentation_events")
+stimulus_presentation_events.add_column("stimulus_type", col_cls=CategoricalVectorData)
+stimulus_presentation_events.add_row(timestamp=1.0, stimulus_type="circle")
+stimulus_presentation_events.add_row(timestamp=4.5, stimulus_type="square")
+nwbfile.add_events_table(stimulus_presentation_events)
+```
 
-**`TimestampVectorData`**: The `TimestampVectorData` type stores a 1D array of timestamps in seconds.
-- Values are in seconds from session start time.
-- It has a "unit" attribute. The value of the attribute is fixed to "seconds".
-- It has a "resolution" attribute that represents the smallest possible difference between two timestamps. Usually 1 divided by the sampling rate for timestamps of the data acquisition system.
+The APIs would also provide the following interfaces:
+- `nwbfile.events_tables` returns a dictionary of `EventsTable` objects, similar to `nwbfile.acquisition`
+- Use `nwbfile.events_tables["stimulus_presentation_events"]` to access an `EventsTable` by name
+- `nwbfile.merge_events_tables(tables: list[EventsTable])`, which merges a selection of `EventsTable` objects into a read-only table, sorted by timestamp
+- `nwbfile.get_all_events()`, which merges all the `EventsTable` objects into one read-only table, sorted by timestamp
 
-**`DurationVectorData`**: The `DurationVectorData` type that stores a 1D array of durations in seconds.
-- It is otherwise identical to the `TimestampVectorData` type.
+This extension was developed by Ryan Ly, Oliver Rübel, the NWB Technical Advisory Board, and the NWBEP001 Review Working Group.
 
-This extension was developed by Ryan Ly, Oliver Rübel, and the NWB Technical Advisory Board.
 Information about the rationale, background, and alternative approaches to this extension can be found here:
 https://docs.google.com/document/d/1qcsjyFVX9oI_746RdMoDdmQPu940s0YtDjb1en1Xtdw
 
 ## Installation
 
-The latest **ndx-events 0.3.0** has not yet been released on PyPI. To install it on Python, use:
+The latest **ndx-events 0.4.0** has not yet been released on PyPI. To install it on Python, use:
 ```bash
 pip install git+https://github.com/rly/ndx-events.git
 ```
 
+ndx-events 0.3.0 was not released on PyPI.
+
 To install the 0.2.0 version, use:
 Python:
 ```bash
@@ -64,7 +69,13 @@ Matlab:
 generateExtension('<directory path>/ndx-events/spec/ndx-events.namespace.yaml');
 ```
 
+## Usage examples
+
+1. [Example writing TTL pulses and stimulus presentations to an NWB file](examples/write_ttls_events.py).
+
+
 ## Developer installation
+
 In a Python 3.8-3.12 environment:
 ```bash
 pip install -r requirements-dev.txt

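Note on the README changes above: the README describes `merge_events_tables` and `get_all_events` as producing a single table sorted by timestamp. The merge-by-timestamp idea can be sketched in plain Python without the extension. This is only an illustration under assumptions: the `merge_event_rows` helper, the `source_table` key, and the dict-per-row layout are hypothetical and are not part of ndx-events or PyNWB.

```python
import heapq
from operator import itemgetter


def merge_event_rows(tables):
    """Merge per-table event rows into one list sorted by timestamp.

    `tables` maps a table name to a list of row dicts, each with a
    "timestamp" key (seconds from session start). This is a hypothetical
    stand-in for EventsTable objects, not the ndx-events API.
    """
    tagged = []
    for name, rows in tables.items():
        # Sort each table on its own so heapq.merge can do a k-way merge.
        ordered = sorted(rows, key=itemgetter("timestamp"))
        # Record which table each row came from (hypothetical extra key).
        tagged.append([{**row, "source_table": name} for row in ordered])
    return list(heapq.merge(*tagged, key=itemgetter("timestamp")))


tables = {
    "ttl_events": [
        {"timestamp": 6820.092244, "pulse_value": 55},
        {"timestamp": 6821.208244, "pulse_value": 1},
    ],
    "stimulus_presentations": [
        {"timestamp": 6821.208244, "stimulus_category": "smallAnimal"},
    ],
}
merged = merge_event_rows(tables)
for row in merged:
    print(row["timestamp"], row["source_table"])
```

Keeping the source table name on each merged row is one possible design choice; it lets a reader of the merged view tell which columns to expect for a given row.
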
diff --git a/docs/source/conf.py b/docs/source/conf.py
@@ -10,7 +10,7 @@
 copyright = "2024, Ryan Ly"
 author = "Ryan Ly"
 
-version = "0.3.0"
+version = "0.4.0"
 release = "alpha"
 
 # -- General configuration ---------------------------------------------------

diff --git a/examples/write_ttls_events.py b/examples/write_ttls_events.py
@@ -0,0 +1,178 @@
+"""
+Example script that demonstrates how to write an EventsTable with a CategoricalVectorData and associated MeaningsTable
+to store raw TTL pulses received by the acquisition system and processed stimulus presentation events.
+"""
+
+from datetime import datetime
+from pynwb import NWBHDF5IO
+
+from ndx_events import (
+    EventsTable,
+    CategoricalVectorData,
+    MeaningsTable,
+    NdxEventsNWBFile,
+)
+
+nwbfile = NdxEventsNWBFile(
+    session_description="session description",
+    identifier="cool_experiment_001",
+    session_start_time=datetime.now().astimezone(),
+)
+
+# In this experiment, TTL pulses were sent by the stimulus computer
+# to signal important time markers during the experiment/trial,
+# when the stimulus was placed on the screen and removed from the screen,
+# when the question appeared, and the responses of the subject.
+
+# ref: https://www.nature.com/articles/s41597-020-0415-9, DANDI:000004
+
+# We will first create an EventsTable to store the raw TTL pulses received by the acquisition system.
+# Storing the raw TTL pulses is not necessary, but it can be useful for debugging and
+# understanding the experiment.
+# Before doing so, we will create a CategoricalVectorData column for the possible integer values for the TTL pulse
+# and associate it with a MeaningsTable that describes the meaning of each value.
+
+pulse_value_meanings_table = MeaningsTable(
+    name="pulse_value_meanings", description="The meanings of each integer value for a TTL pulse."
+)
+pulse_value_meanings_table.add_row(value=55, meaning="Start of experiment")
+pulse_value_meanings_table.add_row(value=1, meaning="Stimulus onset")
+pulse_value_meanings_table.add_row(value=2, meaning="Stimulus offset")
+pulse_value_meanings_table.add_row(value=3, meaning="Question screen onset")
+
+yes_animal_response_description = (
+    "During the learning phase, subjects are instructed to respond to the following "
+    "question: 'Is this an animal?' in each trial. The response is 'Yes, this is an animal'."
+)
+no_animal_response_description = (
+    "During the learning phase, subjects are instructed to respond to the following "
+    "question: 'Is this an animal?' in each trial. The response is 'No, this is not an animal'."
+)
+pulse_value_meanings_table.add_row(value=20, meaning=yes_animal_response_description)
+pulse_value_meanings_table.add_row(value=21, meaning=no_animal_response_description)
+
+new_confident_response_description = (
+    "During the recognition phase, subjects are instructed to respond to the following "
+    "question: 'Have you seen this image before?' in each trial. The response is 'New, confident'."
+)
+new_probably_response_description = (
+    "During the recognition phase, subjects are instructed to respond to the following "
+    "question: 'Have you seen this image before?' in each trial. The response is 'New, probably'."
+)
+new_guess_response_description = (
+    "During the recognition phase, subjects are instructed to respond to the following "
+    "question: 'Have you seen this image before?' in each trial. The response is 'New, guess'."
+)
+old_guess_response_description = (
+    "During the recognition phase, subjects are instructed to respond to the following "
+    "question: 'Have you seen this image before?' in each trial. The response is 'Old, guess'."
+)
+old_probably_response_description = (
+    "During the recognition phase, subjects are instructed to respond to the following "
+    "question: 'Have you seen this image before?' in each trial. The response is 'Old, probably'."
+)
+old_confident_response_description = (
+    "During the recognition phase, subjects are instructed to respond to the following "
+    "question: 'Have you seen this image before?' in each trial. The response is 'Old, confident'."
+)
+
+pulse_value_meanings_table.add_row(value=31, meaning=new_confident_response_description)
+pulse_value_meanings_table.add_row(value=32, meaning=new_probably_response_description)
+pulse_value_meanings_table.add_row(value=33, meaning=new_guess_response_description)
+pulse_value_meanings_table.add_row(value=34, meaning=old_guess_response_description)
+pulse_value_meanings_table.add_row(value=35, meaning=old_probably_response_description)
+pulse_value_meanings_table.add_row(value=36, meaning=old_confident_response_description)
+
+pulse_value_meanings_table.add_row(value=6, meaning="End of trial")
+pulse_value_meanings_table.add_row(value=66, meaning="End of experiment")
+
+pulse_value_column = CategoricalVectorData(
+    name="pulse_value", description="Integer value of the TTL pulse", meanings=pulse_value_meanings_table
+)
+
+ttl_events_table = EventsTable(
+    name="ttl_events",
+    description="TTL events",
+    columns=[pulse_value_column],
+    meanings_tables=[pulse_value_meanings_table],
+)
+ttl_events_table.add_row(
+    timestamp=6820.092244,
+    pulse_value=55,
+)
+ttl_events_table.add_row(
+    timestamp=6821.208244,
+    pulse_value=1,
+)
+ttl_events_table.add_row(
+    timestamp=6822.210644,
+    pulse_value=2,
+)
+ttl_events_table.add_row(
+    timestamp=6822.711364,
+    pulse_value=3,
+)
+ttl_events_table.add_row(
+    timestamp=6825.934244,
+    pulse_value=31,
+)
+ttl_events_table.timestamp.resolution = 1 / 50000.0  # specify the resolution of the timestamps (optional)
+
+# The data curator may want to create an EventsTable to store more processed information than the TTLs table
+# e.g., converting stimulus onset and offset into a single stimulus event with metadata.
+# This may be redundant with information in the trials table if the task is
+# structured into trials.
+
+stimulus_category_meanings_table = MeaningsTable(
+    name="stimulus_category_meanings", description="The meanings of each stimulus category"
+)
+stimulus_category_meanings_table.add_row(value="smallAnimal", meaning="An image of a small animal was presented.")
+stimulus_category_meanings_table.add_row(value="largeAnimal", meaning="An image of a large animal was presented.")
+
+stimulus_category_column = CategoricalVectorData(
+    name="stimulus_category", description="The category of the stimulus", meanings=stimulus_category_meanings_table
+)
+
+stimulus_presentation_table = EventsTable(
+    name="stimulus_presentations",
+    description="Metadata about stimulus presentations",
+    columns=[stimulus_category_column],
+    meanings_tables=[stimulus_category_meanings_table],
+)
+stimulus_presentation_table.add_column(
+    name="stimulus_image_index", description="Frame index of the stimulus image in the StimulusPresentation object"
+)  # this is an integer.
+# One could make this a CategoricalVectorData column if there are a limited number of stimulus images and one
+# wants to describe each one
+
+stimulus_presentation_table.add_row(
+    timestamp=6821.208244,
+    duration=1.0024,  # this comes from the stimulus onset and offset TTLs
+    stimulus_category="smallAnimal",
+    stimulus_image_index=0,
+)
+stimulus_presentation_table.add_row(
+    timestamp=6825.208244,
+    duration=0.99484,
+    stimulus_category="largeAnimal",
+    stimulus_image_index=1,
+)
+stimulus_presentation_table.timestamp.resolution = 1 / 50000.0  # specify the resolution of the timestamps (optional)
+stimulus_presentation_table.duration.resolution = 1 / 50000.0  # specify the resolution of the durations (optional)
+
+nwbfile.add_events_table(ttl_events_table)
+nwbfile.add_events_table(stimulus_presentation_table)
+
+print(nwbfile.get_all_events())
+
+# Write NWB file.
+filename = "test_events.nwb"
+with NWBHDF5IO(filename, "w") as io:
+    io.write(nwbfile)
+
+# Read NWB file and check its contents.
+with NWBHDF5IO(filename, "r", load_namespaces=True) as io:
+    read_nwbfile = io.read()
+    print(read_nwbfile)
+    print(read_nwbfile.events["ttl_events"].to_dataframe())
+    print(read_nwbfile.events["stimulus_presentations"].to_dataframe())
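
Note on the example script above: it says the stimulus duration comes from the stimulus onset and offset TTLs, and the README requires every possible categorical value to be listed in the `MeaningsTable`. Both steps can be sketched in plain Python. This is a minimal sketch under assumptions: the helpers `pair_onsets_and_offsets` and `unknown_values` are hypothetical and not part of ndx-events, and the pairing assumes each onset is followed by its offset before the next onset.

```python
import math

STIMULUS_ONSET = 1   # pulse value for "Stimulus onset" in the example script
STIMULUS_OFFSET = 2  # pulse value for "Stimulus offset" in the example script


def pair_onsets_and_offsets(ttl_rows):
    """Turn (timestamp, pulse_value) pairs into (onset, duration) pairs.

    Hypothetical helper; assumes rows are sorted by timestamp and that
    each onset is followed by its offset before the next onset.
    """
    presentations = []
    onset = None
    for timestamp, pulse_value in ttl_rows:
        if pulse_value == STIMULUS_ONSET:
            onset = timestamp
        elif pulse_value == STIMULUS_OFFSET and onset is not None:
            presentations.append((onset, timestamp - onset))
            onset = None
    return presentations


def unknown_values(values, meanings):
    """Return column values that have no entry in the meanings mapping."""
    return sorted(set(values) - set(meanings))


# Timestamps and pulse values taken from the example script.
ttl_rows = [
    (6820.092244, 55),
    (6821.208244, 1),
    (6822.210644, 2),
    (6822.711364, 3),
    (6825.934244, 31),
]
presentations = pair_onsets_and_offsets(ttl_rows)
onset, duration = presentations[0]
print(onset, round(duration, 4))

# A plain dict standing in for a MeaningsTable (value -> meaning).
meanings = {
    "smallAnimal": "An image of a small animal was presented.",
    "largeAnimal": "An image of a large animal was presented.",
}
# "phones" is deliberately absent from the meanings to show the check.
missing = unknown_values(["smallAnimal", "phones"], meanings)
print(missing)
```

Running a check like `unknown_values` before writing the file is one way to catch a category that was added to a row but never described in the meanings.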