
Support remfile and file-like in nwb rec extractor #2169

Merged: 16 commits, Nov 27, 2023

Conversation

@magland (Member) commented Nov 4, 2023

Adds "remfile" stream mode option for NwbRecordingExtractor.

Also allows passing in a file-like object. This is important for dendro for embargoed datasets, because we need to pass in a file-like object that is capable of periodically renewing its remote URL as the presigned download URL expires.
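For illustration, such a renewing file-like object could look roughly like this (a minimal sketch, not the actual dendro implementation; `RenewingFile`, `open_source`, and `ttl_seconds` are hypothetical names, and `open_source` stands in for whatever re-requests a presigned URL):

```python
import time


class RenewingFile:
    """Sketch of a file-like object that reopens its source once a
    presigned URL would have expired. h5py only needs read/seek/tell."""

    def __init__(self, open_source, ttl_seconds=3300):
        self._open_source = open_source  # callable returning a fresh binary file-like
        self._ttl = ttl_seconds
        self._file = open_source()
        self._opened_at = time.time()
        self._pos = 0

    def _renew_if_expired(self):
        if time.time() - self._opened_at > self._ttl:
            # e.g. re-request the presigned download URL here
            self._file = self._open_source()
            self._file.seek(self._pos)
            self._opened_at = time.time()

    def read(self, size=-1):
        self._renew_if_expired()
        data = self._file.read(size)
        self._pos += len(data)
        return data

    def seek(self, offset, whence=0):
        self._renew_if_expired()
        self._pos = self._file.seek(offset, whence)
        return self._pos

    def tell(self):
        return self._pos
```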

In terms of parameter order, I put the new parameter "file" after electrical_series_name because somebody may be calling this without naming the arguments (since * is not used in the constructor definition).

This is untested (are there unit tests for the NWB extractor?)

@alejoe91 @samuelgarcia @luiztauffer

@magland magland marked this pull request as draft November 4, 2023 18:47
@alejoe91 (Member) commented Nov 4, 2023

@magland (Member, Author) commented Nov 4, 2023

Thanks @alejoe91 I added some tests. Let's see if they pass!

@h-mayorquin (Collaborator) left a comment

You need to add file to the kwargs for pickling to work.
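For context, SpikeInterface extractors record the arguments needed to re-create the object in `self._kwargs`, which is what pickling/serialization replays; a toy sketch of the pattern (a hypothetical class, not the real NwbRecordingExtractor):

```python
class ExtractorSketch:
    """Toy model of the SpikeInterface pattern: everything required to
    rebuild the extractor must end up in self._kwargs."""

    def __init__(self, file_path=None, file=None):
        self.file_path = file_path
        self.file = file
        # `file` must be included here, or a rebuilt copy silently loses it
        self._kwargs = dict(file_path=file_path, file=file)

    def rebuild(self):
        # stands in for what unpickling/serialization does internally
        return type(self)(**self._kwargs)
```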

Concerning this:

Also allows passing in a file-like object. This is important for dendro for embargoed datasets because we need to pass in a file-like object that is capable of renewing its remote url periodically as the presigned download url expires.

I am curious: why can't you get away with just re-initializing the object when the path changes?

# condition / loop here to detect that the presigned path changed
file_path = (
    "https://dandi-api-staging-dandisets.s3.amazonaws.com/blobs/5f4/b7a/5f4b7a1f-7b95-4ad8-9579-4df6025371cc"
)
file = remfile.File(file_path)
# the file path should be correct at this point
rec = NwbRecordingExtractor(file_path=file_path)

In other words, why isn't just modifying the URL (the file_path) enough?

@alejoe91 alejoe91 added the extractors Related to extractors module label Nov 5, 2023
@alejoe91 alejoe91 changed the title support remfile and file-like in nwb rec extractor Support remfile and file-like in nwb rec extractor Nov 6, 2023
@magland (Member, Author) commented Nov 6, 2023

Thanks @h-mayorquin

I am curious: why can't you get away with just re-initializing the object when the path changes?
In other words, why isn't just modifying the URL (the file_path) enough?

It's because the recording object needs to be passed into other functions which may take a long time to execute. For example:

f = some_file_object_that_renews_the_url()
rec = NwbRecordingExtractor(file=f, ...)
some_package.some_long_analysis(recording=rec, ...)

where we don't have control over some_long_analysis.

@samuelgarcia (Member) commented:

Thank you very much!

if stream_mode == "fsspec":
    # only add stream_cache_path to kwargs if it was passed as an argument
    if stream_cache_path is not None:
        stream_cache_path = str(Path(self.stream_cache_path).absolute())

self.extra_requirements.extend(["pandas", "pynwb", "hdmf"])
self._electrical_series = electrical_series

# set serializability bools
# TODO: correct spelling of self._serializablility throughout SI
ooops

solved here #2238

@zm711 (Collaborator) left a comment

I just added a couple docstring changes/clarifications. Feel free to use if you want.

Three review suggestions on src/spikeinterface/extractors/nwbextractors.py (outdated, resolved)
@magland (Member, Author) commented Nov 8, 2023

I just added a couple docstring changes/clarifications. Feel free to use if you want.

Thanks, I accepted those commits. I wonder if we want to squash before merging. Maybe you don't care about a large number of commits, IDK.

@zm711 (Collaborator) commented Nov 8, 2023

Doesn't matter to me, whatever @alejoe91 and @samuelgarcia want when they merge.

@h-mayorquin (Collaborator) commented:

@magland
Thanks for explaining and adding the tests. This looks good to me.

I would also like to highlight your comment about our lack of separation between keyword and keyword-only arguments. I think it would be more flexible in the future and avoid sub-optimal ordering, as you have pointed out.

@magland (Member, Author) commented Nov 9, 2023

I would also like to highlight your comment about our lack of separation between keyword and keyword-only arguments. I think it would be more flexible in the future and avoid sub-optimal ordering, as you have pointed out.

Yes, it's a potentially big problem: if users do not use named keyword arguments in their scripts and you later want to add parameters to a function, you may break their code unless you add them at the end, which is not always desirable. I recommend using * throughout, except for possibly the first arg (and in rare instances the second as well).
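A quick illustration of the suggestion (a hypothetical signature for demonstration, not the actual extractor's):

```python
def read_recording(file_path=None, *, electrical_series_name=None, file=None):
    # Everything after the `*` must be passed by name, so new keyword-only
    # parameters can later be added in any position without breaking callers.
    return {
        "file_path": file_path,
        "electrical_series_name": electrical_series_name,
        "file": file,
    }


read_recording("rec.nwb", electrical_series_name="ElectricalSeriesAp")  # OK

# read_recording("rec.nwb", "ElectricalSeriesAp")  # raises TypeError
```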

@zm711 (Collaborator) commented Nov 10, 2023

Yes, it's a potentially big problem: if users do not use named keyword arguments in their scripts and you later want to add parameters to a function, you may break their code unless you add them at the end

This came up in #1800 when the parameter order was swapped; linking it as an example of when this caused a problem, and in support of using keyword arguments.

@alejoe91 alejoe91 marked this pull request as ready for review November 15, 2023 16:11
@alejoe91 (Member) commented:

@magland is this ready to merge?

@magland (Member, Author) commented Nov 22, 2023

@magland is this ready to merge?

@alejoe91 I believe so. But I am not using it yet, since I am working off of the released version of SI.

@h-mayorquin (Collaborator) commented:

I can actually give it a try this afternoon as I need to do some reading from dandi.

@h-mayorquin (Collaborator) commented:

Hi, I tried to test this yesterday but hit a couple of bugs. I will fix the other ones, as they are related to spikeinterface, but there are some files I can't load with remfile:

For asset path 'sub-PL015/sub-PL015_ses-1d4a7bd6-296a-48b9-b20e-bd0ac80750a5_behavior+ecephys+image.nwb' of dandiset 000409, the URL of the blob is:
https://dandiarchive.s3.amazonaws.com/blobs/413/cf0/413cf0f3-3498-485a-b099-84bc36d43ca6

This works:

import fsspec
import h5py
from pynwb import NWBHDF5IO

# URL of the file
file_url = 'https://dandiarchive.s3.amazonaws.com/blobs/413/cf0/413cf0f3-3498-485a-b099-84bc36d43ca6'

fs = fsspec.filesystem("http")
fsspec_file = fs.open(file_url, "rb")

# Use h5py to open the remote file
file = h5py.File(fsspec_file, 'r')

nwbfile = NWBHDF5IO(file=file, mode='r', load_namespaces=True).read()
nwbfile

But this does not:

import remfile
import h5py
from pynwb import NWBHDF5IO

# URL of the file
file_url = 'https://dandiarchive.s3.amazonaws.com/blobs/413/cf0/413cf0f3-3498-485a-b099-84bc36d43ca6'

# Open the file with remfile
rfile = remfile.File(file_url)

file = h5py.File(rfile, 'r')

nwbfile = NWBHDF5IO(file, 'r', load_namespaces=True).read()

The output is:

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
/home/heberto/development/spikeinterface/bin/dev_notebook.ipynb Cell 35 line 17
     14 cached_file = caching_file_system.open(path=file_url, mode="rb")
     15 # Use h5py to open the cached file
---> 17 file = h5py.File(cached_file, 'r')
     19 nwbfile = NWBHDF5IO(file=file, mode='r', load_namespaces=True).read()
     20 nwbfile

File ~/miniconda3/envs/neuroconv_env/lib/python3.10/site-packages/h5py/_hl/files.py:562, in File.__init__(self, name, mode, driver, libver, userblock_size, swmr, rdcc_nslots, rdcc_nbytes, rdcc_w0, track_order, fs_strategy, fs_persist, fs_threshold, fs_page_size, page_buf_size, min_meta_keep, min_raw_keep, locking, alignment_threshold, alignment_interval, meta_block_size, **kwds)
    553     fapl = make_fapl(driver, libver, rdcc_nslots, rdcc_nbytes, rdcc_w0,
    554                      locking, page_buf_size, min_meta_keep, min_raw_keep,
    555                      alignment_threshold=alignment_threshold,
    556                      alignment_interval=alignment_interval,
    557                      meta_block_size=meta_block_size,
    558                      **kwds)
    559     fcpl = make_fcpl(track_order=track_order, fs_strategy=fs_strategy,
    560                      fs_persist=fs_persist, fs_threshold=fs_threshold,
    561                      fs_page_size=fs_page_size)
--> 562     fid = make_fid(name, mode, userblock_size, fapl, fcpl, swmr=swmr)
    564 if isinstance(libver, tuple):
    565     self._libver = libver

File ~/miniconda3/envs/neuroconv_env/lib/python3.10/site-packages/h5py/_hl/files.py:235, in make_fid(name, mode, userblock_size, fapl, fcpl, swmr)
    233     if swmr and swmr_support:
    234         flags |= h5f.ACC_SWMR_READ
--> 235     fid = h5f.open(name, flags, fapl=fapl)
    236 elif mode == 'r+':
    237     fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)

File h5py/_objects.pyx:54, in h5py._objects.with_phil.wrapper()
File h5py/_objects.pyx:55, in h5py._objects.with_phil.wrapper()
File h5py/h5f.pyx:106, in h5py.h5f.open()

OSError: Unable to open file (file signature not found)

Note that fsspec shares this problem with remfile when its caching filesystem is used, so that might offer a clue. That is, the following produces a very similar error:

import fsspec
import h5py
from fsspec.implementations.cached import CachingFileSystem
from pathlib import Path
from pynwb import NWBHDF5IO

file_url = 'https://dandiarchive.s3.amazonaws.com/blobs/413/cf0/413cf0f3-3498-485a-b099-84bc36d43ca6'


caching_file_system = CachingFileSystem(
    fs=fsspec.filesystem("http", ),
    cache_storage=str(Path.cwd()),
)

cached_file = caching_file_system.open(path=file_url, mode="rb")
# Use h5py to open the cached file

file = h5py.File(cached_file, 'r') 

nwbfile = NWBHDF5IO(file=file, mode='r', load_namespaces=True).read()
nwbfile

I am using h5py 3.10.0 and the latest version of requests.
I tried a couple of different versions of h5py to no avail.

@magland (Member, Author) commented Nov 23, 2023

@h-mayorquin

I tried replacing

nwbfile = NWBHDF5IO(file, 'r', load_namespaces=True).read()

in your script with

nwbfile = NWBHDF5IO(file=file, mode='r', load_namespaces=True).read()

and it seemed to work.

Then I did the following timing test:

import remfile
import time

import fsspec
import h5py
from pynwb import NWBHDF5IO

# URL of the file
file_url = 'https://dandiarchive.s3.amazonaws.com/blobs/413/cf0/413cf0f3-3498-485a-b099-84bc36d43ca6'

for mode in ['fsspec', 'remfile']:
    if mode == 'fsspec':
        print('fspec mode.......')
        timer = time.time()
        fs = fsspec.filesystem("http", )
        fsspec_file = fs.open(file_url, "rb")

        # Use h5py to open the cached file
        file = h5py.File(fsspec_file, 'r')

        print(file.keys())

        nwbfile = NWBHDF5IO(file=file, mode='r', load_namespaces=True).read()
        print(nwbfile)

        elapsed_sec = time.time() - timer
        print(f'Elapsed time for fsspec mode: {elapsed_sec:.2f} sec')
    elif mode == 'remfile':
        print('remfile mode.......')
        timer = time.time()
        rfile = remfile.File(file_url, verbose=True)

        file = h5py.File(rfile, 'r')

        print(file.keys())

        nwbfile = NWBHDF5IO(file=file, mode='r', load_namespaces=True).read()
        print(nwbfile)

        elapsed_sec = time.time() - timer
        print(f'Elapsed time for remfile mode: {elapsed_sec:.2f} sec')

And I got:

fspec mode.......
<KeysViewHDF5 ['acquisition', 'analysis', 'file_create_date', 'general', 'identifier', 'intervals', 'processing', 'session_description', 'session_start_time', 'specifications', 'stimulus', 'timestamps_reference_time', 'units']>
/home/magland/miniconda3/envs/dev/lib/python3.8/site-packages/hdmf/spec/namespace.py:531: UserWarning: Ignoring cached namespace 'hdmf-common' version 1.5.1 because version 1.8.0 is already loaded.
  warn("Ignoring cached namespace '%s' version %s because version %s is already loaded."
/home/magland/miniconda3/envs/dev/lib/python3.8/site-packages/hdmf/spec/namespace.py:531: UserWarning: Ignoring cached namespace 'core' version 2.5.0 because version 2.6.0-alpha is already loaded.
  warn("Ignoring cached namespace '%s' version %s because version %s is already loaded."
/home/magland/miniconda3/envs/dev/lib/python3.8/site-packages/hdmf/spec/namespace.py:531: UserWarning: Ignoring cached namespace 'hdmf-experimental' version 0.2.0 because version 0.5.0 is already loaded.
  warn("Ignoring cached namespace '%s' version %s because version %s is already loaded."
root pynwb.file.NWBFile at 0x140179901711552
Fields:
  acquisition: {
    ElectricalSeriesAp <class 'pynwb.ecephys.ElectricalSeries'>,
    ElectricalSeriesLf <class 'pynwb.ecephys.ElectricalSeries'>,
    OriginalVideoBodyCamera <class 'pynwb.image.ImageSeries'>,
    OriginalVideoLeftCamera <class 'pynwb.image.ImageSeries'>,
    OriginalVideoRightCamera <class 'pynwb.image.ImageSeries'>
  }
  devices: {
    NeuropixelsProbe <class 'pynwb.device.Device'>
  }
  electrode_groups: {
    NeuropixelsShank <class 'pynwb.ecephys.ElectrodeGroup'>
  }
  electrodes: electrodes <class 'hdmf.common.table.DynamicTable'>
  experiment_description: IBL aims to understand the neural basis of decision-making in the mouse by gathering a whole-brain activity map composed of electrophysiological recordings pooled from multiple laboratories. We have systematically recorded from nearly all major brain areas with Neuropixels probes, using a grid system for unbiased sampling and replicating each recording site in at least two laboratories. These data have been used to construct a brain-wide map of activity at single-spike cellular resolution during a decision-making task. In addition to the map, this data set contains other information gathered during the task: sensory stimuli presented to the mouse; mouse decisions and response times; and mouse pose information from video recordings and DeepLabCut analysis.

  file_create_date: [datetime.datetime(2023, 2, 17, 3, 15, 23, 896756, tzinfo=tzutc())]
  identifier: c33e2740-5475-463e-bd16-d1c38da37463
  institution: University College London
  intervals: {
    trials <class 'pynwb.epoch.TimeIntervals'>
  }
  lab: Hausser
  processing: {
    behavior <class 'pynwb.base.ProcessingModule'>
  }
  protocol: _iblrig_tasks_ephysChoiceWorld6.6.1
  related_publications: ['https://doi.org/10.6084/m9.figshare.21400815.v6, https://doi.org/10.1101/2020.01.17.909838']
  session_description: The full description of the session/task protocol can be found in Appendix 2 of International Brain Laboratory, et al. "Standardized and reproducible measurement of decision-making in mice." Elife 10 (2021); e63711.

  session_id: 1d4a7bd6-296a-48b9-b20e-bd0ac80750a5
  session_start_time: 2022-07-21 16:08:53.428769+01:00
  subject: subject abc.IblSubject at 0x140179901711936
Fields:
  date_of_birth: 2021-05-15 00:00:00+01:00
  description: Mice were housed under a 12/12 h light/dark cycle (normal or inverted depending on the laboratory) with food and water 112 available ad libitum, except during behavioural training days. Electrophysiological recordings and behavioural training were performed during either the dark or light phase of the subject cycle depending on the laboratory. Subjects were obtained from either the Jackson Laboratory or Charles River.

  expected_water_ml: 1.1400000000000001
  last_water_restriction: 2021-06-01T17:42:13
  remaining_water_ml: 1.1400000000000001
  sex: M
  species: Mus musculus
  strain: C57BL/6
  subject_id: PL015
  url: https://openalyx.internationalbrainlab.org/subjects/PL015
  weight: 0.030100000000000002 kg

  timestamps_reference_time: 2022-07-21 16:08:53.428769+01:00
  trials: trials <class 'pynwb.epoch.TimeIntervals'>
  units: units <class 'pynwb.misc.Units'>

Elapsed time for fsspec mode: 393.38 sec
remfile mode.......
Loading 1 chunks starting at 0 (0.1024 million bytes)
Loading 1 chunks starting at 455542 (0.1024 million bytes)
Loading 1 chunks starting at 455838 (0.1024 million bytes)
<KeysViewHDF5 ['acquisition', 'analysis', 'file_create_date', 'general', 'identifier', 'intervals', 'processing', 'session_description', 'session_start_time', 'specifications', 'stimulus', 'timestamps_reference_time', 'units']>
Loading 2 chunks starting at 455839 (0.2048 million bytes)
Loading 2 chunks starting at 455643 (0.2048 million bytes)
...
root pynwb.file.NWBFile at 0x140179901580528
Fields:
  acquisition: {
    ElectricalSeriesAp <class 'pynwb.ecephys.ElectricalSeries'>,
    ElectricalSeriesLf <class 'pynwb.ecephys.ElectricalSeries'>,
    OriginalVideoBodyCamera <class 'pynwb.image.ImageSeries'>,
    OriginalVideoLeftCamera <class 'pynwb.image.ImageSeries'>,
    OriginalVideoRightCamera <class 'pynwb.image.ImageSeries'>
  }
  devices: {
    NeuropixelsProbe <class 'pynwb.device.Device'>
  }
  electrode_groups: {
    NeuropixelsShank <class 'pynwb.ecephys.ElectrodeGroup'>
  }
  electrodes: electrodes <class 'hdmf.common.table.DynamicTable'>
  experiment_description: IBL aims to understand the neural basis of decision-making in the mouse by gathering a whole-brain activity map composed of electrophysiological recordings pooled from multiple laboratories. We have systematically recorded from nearly all major brain areas with Neuropixels probes, using a grid system for unbiased sampling and replicating each recording site in at least two laboratories. These data have been used to construct a brain-wide map of activity at single-spike cellular resolution during a decision-making task. In addition to the map, this data set contains other information gathered during the task: sensory stimuli presented to the mouse; mouse decisions and response times; and mouse pose information from video recordings and DeepLabCut analysis.

  file_create_date: [datetime.datetime(2023, 2, 17, 3, 15, 23, 896756, tzinfo=tzutc())]
  identifier: c33e2740-5475-463e-bd16-d1c38da37463
  institution: University College London
  intervals: {
    trials <class 'pynwb.epoch.TimeIntervals'>
  }
  lab: Hausser
  processing: {
    behavior <class 'pynwb.base.ProcessingModule'>
  }
  protocol: _iblrig_tasks_ephysChoiceWorld6.6.1
  related_publications: ['https://doi.org/10.6084/m9.figshare.21400815.v6, https://doi.org/10.1101/2020.01.17.909838']
  session_description: The full description of the session/task protocol can be found in Appendix 2 of International Brain Laboratory, et al. "Standardized and reproducible measurement of decision-making in mice." Elife 10 (2021); e63711.

  session_id: 1d4a7bd6-296a-48b9-b20e-bd0ac80750a5
  session_start_time: 2022-07-21 16:08:53.428769+01:00
  subject: subject abc.IblSubject at 0x140179904705152
Fields:
  date_of_birth: 2021-05-15 00:00:00+01:00
  description: Mice were housed under a 12/12 h light/dark cycle (normal or inverted depending on the laboratory) with food and water 112 available ad libitum, except during behavioural training days. Electrophysiological recordings and behavioural training were performed during either the dark or light phase of the subject cycle depending on the laboratory. Subjects were obtained from either the Jackson Laboratory or Charles River.

  expected_water_ml: 1.1400000000000001
  last_water_restriction: 2021-06-01T17:42:13
  remaining_water_ml: 1.1400000000000001
  sex: M
  species: Mus musculus
  strain: C57BL/6
  subject_id: PL015
  url: https://openalyx.internationalbrainlab.org/subjects/PL015
  weight: 0.030100000000000002 kg

  timestamps_reference_time: 2022-07-21 16:08:53.428769+01:00
  trials: trials <class 'pynwb.epoch.TimeIntervals'>
  units: units <class 'pynwb.misc.Units'>

Elapsed time for remfile mode: 44.25 sec

So the difference in timing is 393 sec for fsspec vs 44 sec for remfile!

I'll also note that the pynwb overhead is significant. You can begin to read the h5py file after just a second or two. That's why the nwb recording extractors I am using do not use pynwb at all, but go directly for the electrical series objects in the h5py file. I think in a separate PR we should explore using that method for the SI nwb recording extractor.
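For reference, the direct approach amounts to roughly this (a sketch, not the actual extractor implementation; it assumes the common NWB layout where the sampling rate is stored as a `rate` attribute on the `starting_time` dataset, and `h5file` can be an open `h5py.File` or any nested mapping with the same layout):

```python
def read_electrical_series(h5file, path="acquisition/ElectricalSeriesAp"):
    """Grab the pieces spike sorting needs straight from the HDF5 tree,
    skipping pynwb entirely. `path` is an example; list the keys of
    h5file["acquisition"] to see the series present in a given file."""
    group = h5file
    for key in path.split("/"):
        group = group[key]
    data = group["data"]  # with h5py this is a lazy dataset: nothing is read yet
    # in NWB files, the sampling rate lives as an attribute of starting_time
    rate = group["starting_time"].attrs["rate"]
    return data, rate
```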

@alejoe91

@h-mayorquin (Collaborator) commented:

@magland
Thanks, yes, I guess it is only the caching system that is failing (see below). I had encountered an error earlier when passing the file as a positional argument, which was interpreted as a file_path (and generated the obscure error). Then I got the following error with the caching system and wrongly assumed it was the same with remfile. Thanks.

import fsspec
import h5py
from fsspec.implementations.cached import CachingFileSystem
from pathlib import Path
from pynwb import NWBHDF5IO

file_url = 'https://dandiarchive.s3.amazonaws.com/blobs/413/cf0/413cf0f3-3498-485a-b099-84bc36d43ca6'


caching_file_system = CachingFileSystem(
    fs=fsspec.filesystem("http", ),
    cache_storage=str(Path.cwd()),
)

cached_file = caching_file_system.open(path=file_url, mode="rb")
# Use h5py to open the cached file

file = h5py.File(cached_file, 'r') 

Can you point to the extractors that you are using?

@magland (Member, Author) commented Nov 23, 2023

Can you point to the extractors that you are using?

https://github.com/scratchrealm/pc-spike-sorting/blob/main/common/NwbRecording.py

It doesn't have all the features of the SI version, but it has the ones I need for spike sorting.

@h-mayorquin (Collaborator) commented:

Thanks a bunch, I will check them out.
The only problem that I see with that is that at some point the nwbfile might be Zarr as well... But I will probably be using something like this soon, as I need to analyze a bunch of files from dandi, and pynwb does add quite an overhead, as you were saying.

@samuelgarcia (Member) commented:

Thanks a lot Jeremy and Ramon for fighting with this NWB streaming.
I would vote to integrate Jeremy's NWB extractor in spikeinterface under another name and add an option in read_nwb() to use either the legacy or the fast NWB reader.
I think this could be done quickly.

@h-mayorquin (Collaborator) commented:

@samuelgarcia

Yes, I think we should merge. I will also add this to the NWB sorting extractor in another PR.
I measured the same file as Jeremy and I get similar results. I also tested using the s3 protocol, which is better than plain fsspec, but not by much. The network usage profile also massively favours remfile:

-----------------------------------
fsspec mode.......
<KeysViewHDF5 ['acquisition', 'analysis', 'file_create_date', 'general', 'identifier', 'intervals', 'processing', 'session_description', 'session_start_time', 'specifications', 'stimulus', 'timestamps_reference_time', 'units']>
Elapsed time for fsspec mode: 9.12 sec
Bytes Sent: 215,288, Bytes Received: 22,438,997
-----------------------------------
s3 mode.......
<KeysViewHDF5 ['acquisition', 'analysis', 'file_create_date', 'general', 'identifier', 'intervals', 'processing', 'session_description', 'session_start_time', 'specifications', 'stimulus', 'timestamps_reference_time', 'units']>
Elapsed time for s3 mode: 8.73 sec
Bytes Sent: 227,577, Bytes Received: 22,434,127
-----------------------------------
remfile mode.......
Loading 1 chunks starting at 0 (0.1024 million bytes)
Loading 1 chunks starting at 455542 (0.1024 million bytes)
Loading 1 chunks starting at 455838 (0.1024 million bytes)
<KeysViewHDF5 ['acquisition', 'analysis', 'file_create_date', 'general', 'identifier', 'intervals', 'processing', 'session_description', 'session_start_time', 'specifications', 'stimulus', 'timestamps_reference_time', 'units']>
Elapsed time for remfile mode: 3.25 sec
Bytes Sent: 29,851, Bytes Received: 395,810

Reading the NWB file in fact clogs my system. I will dig deeper into this at some point and report it to the core team, but we will probably need to do something along the lines of what Jeremy did for his extractors for the Templates project. It just can't be done otherwise : /

Check out the network profile; the last small chunk is remfile:
[screenshot: network usage over time for the three modes]

I will keep playing with this.

@magland (Member, Author) commented Nov 24, 2023

Wow thanks @h-mayorquin ! The network activity plot is pretty convincing.

I just created an issue on pynwb to update their docs to suggest remfile as the preferred method.

@alejoe91 alejoe91 merged commit ee88ef3 into SpikeInterface:main Nov 27, 2023
11 checks passed