Commit ac59dda

Merge branch 'dev' into faster-imports

rly authored Sep 17, 2024
2 parents 070cb66 + b9f9e5a commit ac59dda
Showing 19 changed files with 326 additions and 90 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/run_tests.yml
Original file line number Diff line number Diff line change
@@ -64,7 +64,7 @@ jobs:
- name: Upload distribution as a workspace artifact
if: ${{ matrix.upload-wheels }}
- uses: actions/upload-artifact@v3
+ uses: actions/upload-artifact@v4
with:
name: distributions
path: dist
@@ -283,7 +283,7 @@ jobs:
python-version: '3.12'

- name: Download wheel and source distributions from artifact
- uses: actions/download-artifact@v3
+ uses: actions/download-artifact@v4
with:
name: distributions
path: dist
12 changes: 11 additions & 1 deletion CHANGELOG.md
@@ -1,9 +1,19 @@
# PyNWB Changelog

- ## PyNWB 2.8.2 (Upcoming)
+ ## PyNWB 2.8.2 (September 9, 2024)

### Enhancements and minor changes
- Added support for numpy 2.0. @mavaylon1 [#1956](https://github.com/NeurodataWithoutBorders/pynwb/pull/1956)
- Make `get_cached_namespaces_to_validate` a public function @stephprince [#1961](https://github.com/NeurodataWithoutBorders/pynwb/pull/1961)

### Documentation and tutorial enhancements
- Added pre-release pull request instructions to release process documentation @stephprince [#1928](https://github.com/NeurodataWithoutBorders/pynwb/pull/1928)
- Added section on how to use the `family` driver in `h5py` for splitting data across multiple files @oruebel [#1949](https://github.com/NeurodataWithoutBorders/pynwb/pull/1949)

### Bug fixes
- Fixed `can_read` method to return False if no nwbfile version can be found @stephprince [#1934](https://github.com/NeurodataWithoutBorders/pynwb/pull/1934)
- Changed `epoch_tags` to be a NWBFile property instead of constructor argument. @stephprince [#1935](https://github.com/NeurodataWithoutBorders/pynwb/pull/1935)
- Exposed option to not cache the spec in `NWBHDF5IO.export`. @rly [#1959](https://github.com/NeurodataWithoutBorders/pynwb/pull/1959)

### Performance
- Cache global type map to speed import 3x. [#1931](https://github.com/NeurodataWithoutBorders/pynwb/pull/1931)
2 changes: 2 additions & 0 deletions docs/gallery/advanced_io/plot_iterative_write.py
@@ -1,4 +1,6 @@
"""
.. _iterative_write:
Iterative Data Write
====================
@@ -13,7 +13,7 @@
HDF5 files with NWB data files via external links. To make things more concrete, let's look at the following use
case. We want to simultaneously record multiple data streams during data acquisition. Using the concept of external
links allows us to save each data stream to a separate external HDF5 file during data acquisition and to
- afterwards link the data into a single NWB file. In this case, each recording becomes represented by a
+ afterward link the data into a single NWB file. In this case, each recording becomes represented by a
separate file-system object that can be set as read-only once the experiment is done. In the following
we are using :py:class:`~pynwb.base.TimeSeries` as an example, but the same approach works for other
NWBContainers as well.
@@ -42,7 +42,7 @@
Creating test data
- ---------------------------
+ ^^^^^^^^^^^^^^^^^^
In the following we are creating two :py:class:`~pynwb.base.TimeSeries`, each written to a separate file.
We then show how we can integrate these files into a single NWBFile.
@@ -61,7 +61,7 @@
# Create the base data
start_time = datetime(2017, 4, 3, 11, tzinfo=tzlocal())
data = np.arange(1000).reshape((100, 10))
- timestamps = np.arange(100)
+ timestamps = np.arange(100, dtype=float)
filename1 = "external1_example.nwb"
filename2 = "external2_example.nwb"
filename3 = "external_linkcontainer_example.nwb"
@@ -105,12 +105,12 @@

#####################
# Linking to select datasets
- # --------------------------
+ # ^^^^^^^^^^^^^^^^^^^^^^^^^^
#

####################
# Step 1: Create the new NWBFile
- # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# Create the first file
nwbfile4 = NWBFile(
@@ -122,7 +122,7 @@

####################
# Step 2: Get the dataset you want to link to
- # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Now let's open our test files and retrieve our timeseries.
#

@@ -134,7 +134,7 @@

####################
# Step 3: Create the object you want to link to the data
- # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# To link to the dataset we can simply assign the data object (here ``timeseries_1.data``) to a new ``TimeSeries``.

@@ -167,7 +167,7 @@

####################
# Step 4: Write the data
- # ^^^^^^^^^^^^^^^^^^^^^^^
+ # ~~~~~~~~~~~~~~~~~~~~~~~~
#
with NWBHDF5IO(filename4, "w") as io4:
# Use link_data=True to specify default behavior to link rather than copy data
@@ -185,7 +185,7 @@

####################
# Linking to whole Containers
- # ---------------------------
+ # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#
# Appending to files and linking is made possible by passing around the same
# :py:class:`~hdmf.build.manager.BuildManager`. You can get a manager to pass around
@@ -203,7 +203,7 @@

####################
# Step 1: Get the container object you want to link to
- # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Now let's open our test files and retrieve our timeseries.
#

@@ -219,7 +219,7 @@

####################
# Step 2: Add the container to another NWBFile
- # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# To integrate both :py:class:`~pynwb.base.TimeSeries` into a single file we simply create a new
# :py:class:`~pynwb.file.NWBFile` and add our existing :py:class:`~pynwb.base.TimeSeries` to it. PyNWB's
# :py:class:`~pynwb.NWBHDF5IO` backend then automatically detects that the TimeSeries have already
@@ -247,7 +247,7 @@
# ------------------------------
#
# Using the :py:meth:`~pynwb.file.NWBFile.copy` method allows us to easily create a shallow copy
- # of a whole NWB:N file with links to all data in the original file. For example, we may want to
+ # of a whole NWB file with links to all data in the original file. For example, we may want to
# store processed data in a new file separate from the raw data, while still being able to access
# the raw data. See the :ref:`scratch` tutorial for a detailed example.
#
@@ -259,5 +259,128 @@
# External links are convenient but to share data we may want to hand a single file with all the
# data to our collaborator rather than having to collect all relevant files. To do this,
# :py:class:`~hdmf.backends.hdf5.h5tools.HDF5IO` (and in turn :py:class:`~pynwb.NWBHDF5IO`)
- # provide the convenience function :py:meth:`~hdmf.backends.hdf5.h5tools.HDF5IO.copy_file`,
- # which copies an HDF5 file and resolves all external links.
+ # provide the convenience method :py:meth:`~hdmf.backends.hdf5.h5tools.HDF5IO.export`,
+ # which can copy the file and resolve all external links.


####################
# Automatically splitting large data across multiple HDF5 files
# -------------------------------------------------------------------
#
# For extremely large datasets it can be useful to split data across multiple files, e.g., in cases where
# the file system does not allow for large files. While we can achieve this by writing different
# components (e.g., :py:class:`~pynwb.base.TimeSeries`) to different files as described above,
# this option does not allow splitting data from single datasets. An alternative option is to use the
# ``family`` driver in ``h5py`` to automatically split the NWB file into a collection of many HDF5 files.
# The ``family`` driver stores the file on disk as a series of fixed-length chunks (each in its own file).
# In practice, to write very large arrays, we can combine this approach with :ref:`iterative_write` to
# avoid having to load all data into memory. In the example shown here we use a manual approach to
# iterative write by using :py:class:`~hdmf.backends.hdf5.h5_utils.H5DataIO` to create an empty dataset and
# then filling in the data afterward.

####################
# Step 1: Create the NWBFile as usual
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

from datetime import datetime
from uuid import uuid4

import numpy as np
from hdmf.backends.hdf5 import H5DataIO

from pynwb import NWBFile
from pynwb.base import TimeSeries

# Create an NWBFile object
nwbfile = NWBFile(session_description='example file family',
                  identifier=str(uuid4()),
                  session_start_time=datetime.now().astimezone())

# Create the data as an empty dataset so that we can write to it later
data = H5DataIO(maxshape=(None, 10),  # make the first dimension expandable
                dtype=np.float32,     # create the data as float32
                shape=(0, 10),        # initial shape; start as an empty dataset
                chunks=(1000, 10))

# Create a TimeSeries object
time_series = TimeSeries(name='example_timeseries',
                         data=data,
                         starting_time=0.0,
                         rate=1.0,
                         unit='mV')

# Add the TimeSeries to the NWBFile
nwbfile.add_acquisition(time_series)

####################
# Step 2: Open the new file with the ``family`` driver and write
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# Here we need to open the file with ``h5py`` first to set up the driver, and then we can use
# that file with :py:class:`~pynwb.NWBHDF5IO`. This is required because :py:class:`~pynwb.NWBHDF5IO`
# currently does not support passing the ``memb_size`` option required by the ``family`` driver.

import h5py
from pynwb import NWBHDF5IO

# Define the size of the individual files, determining the number of files to create
# chunk_size = 1 * 1024**3 # 1GB per file
chunk_size = 1024**2 # 1MB just for testing

# filename pattern
filename_pattern = 'family_nwb_file_%d.nwb'

# Create the HDF5 file using the family driver
with h5py.File(name=filename_pattern, mode='w', driver='family', memb_size=chunk_size) as f:

    # Use NWBHDF5IO to write the NWBFile to the HDF5 file
    with NWBHDF5IO(file=f, mode='w') as io:
        io.write(nwbfile)

        # Write new data iteratively to the file
        for i in range(10):
            start_index = i * 1000
            stop_index = start_index + 1000
            data.dataset.resize((stop_index, 10))  # resize the dataset
            data.dataset[start_index:stop_index, :] = i  # set the additional values

####################
# .. note::
#
#    Alternatively, we could also have used the :ref:`iterative_write` features to write the data
#    iteratively as part of the ``io.write`` call instead of manually afterward.

####################
# Step 3: Read a file written with the family driver
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#


# Open the HDF5 file using the family driver
with h5py.File(name=filename_pattern, mode='r', driver='family', memb_size=chunk_size) as f:
    # Use NWBHDF5IO to read the NWBFile from the HDF5 file
    with NWBHDF5IO(file=f, manager=None, mode='r') as io:
        nwbfile = io.read()
        print(nwbfile)


####################
# .. note::
#
#    The filename you provide when using the ``family`` driver must contain a printf-style integer format code
#    (e.g., ``%d``), which will be replaced by the file sequence number.
#
# .. note::
#
#    The ``memb_size`` parameter must be set on both write and read. As such, reading the file requires
#    the user to know the ``memb_size`` that was used for writing.
#
# .. warning::
#
#    The DANDI archive may not support NWB files that are split in this fashion.
#
# .. note::
#
#    Other file drivers, e.g., ``split`` or ``multi``, could be used in a similar fashion.
#    However, not all HDF5 drivers are supported by the high-level API of
#    ``h5py``, and they may therefore require a more complex setup via the
#    low-level HDF5 API in ``h5py``.
#

48 changes: 38 additions & 10 deletions docs/gallery/domain/images.py
@@ -11,31 +11,31 @@
about the subject, the environment, the presented stimuli, or other parts
related to the experiment. This tutorial focuses in particular on the usage of:
- * :py:class:`~pynwb.image.OpticalSeries` for series of images that were presented as stimulus
+ * :py:class:`~pynwb.image.OpticalSeries` and :py:class:`~pynwb.misc.AbstractFeatureSeries` for series of images that
+   were presented as stimulus
* :py:class:`~pynwb.image.ImageSeries`, for series of images (movie segments);
* :py:class:`~pynwb.image.GrayscaleImage`, :py:class:`~pynwb.image.RGBImage`,
:py:class:`~pynwb.image.RGBAImage`, for static images;
The following examples will reference variables that may not be defined within the block they are used in. For
clarity, we define them here:
"""
# Define file paths used in the tutorial

import os

# sphinx_gallery_thumbnail_path = 'figures/gallery_thumbnails_image_data.png'
from datetime import datetime
import os
from uuid import uuid4

import numpy as np
- from dateutil import tz
+ from dateutil.tz import tzlocal
from PIL import Image

from pynwb import NWBHDF5IO, NWBFile
from pynwb.base import Images
from pynwb.image import GrayscaleImage, ImageSeries, OpticalSeries, RGBAImage, RGBImage
from pynwb.misc import AbstractFeatureSeries

# Define file paths used in the tutorial
nwbfile_path = os.path.abspath("images_tutorial.nwb")
moviefiles_path = [
os.path.abspath("image/file_1.tiff"),
@@ -55,7 +55,7 @@
nwbfile = NWBFile(
    session_description="my first synthetic recording",
    identifier=str(uuid4()),
-    session_start_time=datetime.now(tzlocal()),
+    session_start_time=session_start_time,
    experimenter=[
        "Baggins, Bilbo",
    ],
@@ -109,6 +109,35 @@
nwbfile.add_stimulus(stimulus=optical_series)

####################
# AbstractFeatureSeries: Storing features of visual stimuli
# ---------------------------------------------------------
#
# While it is usually recommended to store the entire image data as an :py:class:`~pynwb.image.OpticalSeries`, sometimes
# it is useful to store features of the visual stimuli instead of or in addition to the raw image data. For example,
# you may want to store the mean luminance of the image, the contrast, or the spatial frequency. This can be done using
# an instance of :py:class:`~pynwb.misc.AbstractFeatureSeries`. This class is a general container for storing time
# series of features that are derived from the raw image data.

# Create some fake feature data
feature_data = np.random.rand(200, 3) # 200 time points, 3 features

# Create an AbstractFeatureSeries object
abstract_feature_series = AbstractFeatureSeries(
    name="StimulusFeatures",
    data=feature_data,
    timestamps=np.linspace(0, 1, 200),
    description="Features of the visual stimuli",
    features=["luminance", "contrast", "spatial frequency"],
    feature_units=["n.a.", "n.a.", "cycles/degree"],
)

# Add the AbstractFeatureSeries to the NWBFile
nwbfile.add_stimulus(abstract_feature_series)

####################
# Like all :py:class:`~pynwb.base.TimeSeries`, :py:class:`~pynwb.misc.AbstractFeatureSeries` specify timing using
# either the ``rate`` and ``starting_time`` attributes or the ``timestamps`` attribute.
#
# ImageSeries: Storing series of images as acquisition
# ----------------------------------------------------
#
@@ -118,7 +118,6 @@
#
# We can add raw data to the :py:class:`~pynwb.file.NWBFile` object as *acquisition* using
# the :py:meth:`~pynwb.file.NWBFile.add_acquisition` method.
#

image_data = np.random.randint(low=0, high=255, size=(200, 50, 50, 3), dtype=np.uint8)
behavior_images = ImageSeries(
@@ -138,21 +166,21 @@
# ^^^^^^^^^^^^^^
#
# External files (e.g. video files of the behaving animal) can be added to the :py:class:`~pynwb.file.NWBFile`
- # by creating an :py:class:`~pynwb.image.ImageSeries` object using the
+ # by creating an :py:class:`~pynwb.image.ImageSeries` object using the
# :py:attr:`~pynwb.image.ImageSeries.external_file` attribute that specifies
# the path to the external file(s) on disk.
# The file path(s) must be relative to the path of the NWB file.
# Either ``external_file`` or ``data`` must be specified, but not both.
#
- # If the sampling rate is constant, use :py:attr:`~pynwb.base.TimeSeries.rate` and
+ # If the sampling rate is constant, use :py:attr:`~pynwb.base.TimeSeries.rate` and
# :py:attr:`~pynwb.base.TimeSeries.starting_time` to specify time.
# For irregularly sampled recordings, use :py:attr:`~pynwb.base.TimeSeries.timestamps` to specify time for each sample
# image.
#
# Each external image may contain one or more consecutive frames of the full :py:class:`~pynwb.image.ImageSeries`.
# The :py:attr:`~pynwb.image.ImageSeries.starting_frame` attribute serves as an index to indicate which frame
# each file contains.
- # For example, if the ``external_file`` dataset has three paths to files and the first and the second file have 2
+ # For example, if the ``external_file`` dataset has three paths to files and the first and the second file have 2
# frames, and the third file has 3 frames, then this attribute will have values `[0, 2, 4]`.

external_file = [
