
Basic Backend Configuration #732

Merged
merged 93 commits into main from backend-configuration on Jun 5, 2024
Conversation

garrettmflynn
Member

This PR attempts to show the backend configuration options for a single output NWB file.

Currently running into issues with the SpikeGLX-Phy test pipeline, where the following error is thrown when using an adjusted NeuroConv branch.

[2024-04-05 12:06:50,020] ERROR in app: Exception on /neuroconv/configuration [POST]
Traceback (most recent call last):
  File "/Users/garrettflynn/Documents/GitHub/nwb-guide/pyflask/apis/neuroconv.py", line 135, in post
    return get_backend_configuration(neuroconv_api.payload)
  File "/Users/garrettflynn/Documents/GitHub/nwb-guide/pyflask/manageNeuroconv/manage_neuroconv.py", line 676, in get_backend_configuration
    configuration = get_default_backend_configuration(nwbfile=nwbfile, backend=info["backend"])
  File "/Users/garrettflynn/Documents/GitHub/neuroconv/src/neuroconv/tools/nwb_helpers/_backend_configuration.py", line 21, in get_default_backend_configuration
    return BackendConfigurationClass.from_nwbfile(nwbfile=nwbfile)
  File "/Users/garrettflynn/Documents/GitHub/neuroconv/src/neuroconv/tools/nwb_helpers/_configuration_models/_base_backend.py", line 45, in from_nwbfile
    dataset_configurations = {
  File "/Users/garrettflynn/Documents/GitHub/neuroconv/src/neuroconv/tools/nwb_helpers/_configuration_models/_base_backend.py", line 45, in <dictcomp>
    dataset_configurations = {
  File "/Users/garrettflynn/Documents/GitHub/neuroconv/src/neuroconv/tools/nwb_helpers/_dataset_configuration.py", line 127, in get_default_dataset_io_configurations
    dataset_io_configuration = DatasetIOConfigurationClass.from_neurodata_object(
  File "/Users/garrettflynn/Documents/GitHub/neuroconv/src/neuroconv/tools/nwb_helpers/_configuration_models/_base_dataset_io.py", line 263, in from_neurodata_object
    raise NotImplementedError(
NotImplementedError: Unable to create a `DatasetIOConfiguration` for the dataset at 'units/unit_name/data' for neurodata object '<hdmf.common.table.VectorData object at 0x286ee2f40>' of type '<class 'hdmf.common.table.VectorData'>'!
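For reference, a hypothetical minimal reproduction of the failing call, reconstructed from the traceback above (the mock units table stands in for the real SpikeGLX-Phy output):

```python
from pynwb.testing.mock.file import mock_NWBFile
from neuroconv.tools.nwb_helpers import get_default_backend_configuration

# Hypothetical stand-in for the Phy output: a units table with an extra
# plain VectorData column like 'unit_name'.
nwbfile = mock_NWBFile()
nwbfile.add_unit_column(name="unit_name", description="Phy cluster label")
nwbfile.add_unit(spike_times=[0.1, 0.2], unit_name="0")

# At the time of this PR, this raised NotImplementedError for the
# VectorData column at 'units/unit_name/data'.
configuration = get_default_backend_configuration(nwbfile=nwbfile, backend="hdf5")
```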

@garrettmflynn garrettmflynn self-assigned this Apr 5, 2024
@garrettmflynn
Member Author

Are we expecting that the user's choice of backend (i.e. hdf5 vs zarr) will be consistent across files or should we support selection of different backends for individual files?

@garrettmflynn
Member Author

How should we handle the below inconsistency between stub and non-stub configurations?

[2024-04-09 17:08:56,487] ERROR in app: Exception on /neuroconv/configuration [POST]
Traceback (most recent call last):
  File "/Users/garrettflynn/Documents/GitHub/nwb-guide/pyflask/apis/neuroconv.py", line 137, in post
    return get_backend_configuration(neuroconv_api.payload)
  File "/Users/garrettflynn/Documents/GitHub/nwb-guide/pyflask/manageNeuroconv/manage_neuroconv.py", line 760, in get_backend_configuration
    configuration = create_file(info)
  File "/Users/garrettflynn/Documents/GitHub/nwb-guide/pyflask/manageNeuroconv/manage_neuroconv.py", line 746, in create_file
    configure_dataset_backends(nwbfile, backend_configuration, configuration)
  File "/Users/garrettflynn/Documents/GitHub/nwb-guide/pyflask/manageNeuroconv/manage_neuroconv.py", line 693, in configure_dataset_backends
    setattr(configuration.dataset_configurations[name], key, value)
  File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/pydantic/main.py", line 792, in __setattr__
    self.__pydantic_validator__.validate_assignment(self, name, value)
pydantic_core._pydantic_core.ValidationError: 1 validation error for HDF5DatasetIOConfiguration
  Value error, Some dimensions of the chunk_shape=[788] exceed the buffer_shape=(159,) for dataset at location 'units/spike_times/data'! [type=value_error, input_value={'object_id': '1e57fe46-d...pression_options': None}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/value_error
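For context, the validator rejects any chunk dimension larger than the corresponding buffer dimension; here the chunk shape was computed on the full-size dataset, while the stub truncates the buffer. A sketch of one possible reconciliation (not necessarily the fix adopted in this PR) is to clamp the chunk shape elementwise:

```python
def clamp_chunk_shape(chunk_shape, buffer_shape):
    """Elementwise minimum, so no chunk dimension exceeds the buffer."""
    return tuple(min(c, b) for c, b in zip(chunk_shape, buffer_shape))

clamp_chunk_shape((788,), (159,))  # -> (159,), valid for the stubbed 'units/spike_times/data'
```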

@CodyCBakerPhD
Collaborator

Are we expecting that the user's choice of backend (i.e. hdf5 vs zarr) will be consistent across files or should we support selection of different backends for individual files?

Same as everything else: one file per session. Let the session manager allow individual file specification, but allow a global default. (It would generally be odd to have a mix of backends, but there might be a reason to allow it, e.g., a special backend-specific compression method for a data stream that is only available in some subset of sessions. In general, though, there's no reason to force all sessions to be the same.)
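A sketch of the settings shape this implies (the key names are hypothetical, not the GUIDE's actual schema):

```python
backend_settings = {
    "default_backend": "hdf5",  # global default for all sessions
    "session_overrides": {
        # e.g., one session needing a Zarr-only compression method
        "sub-01/ses-01": "zarr",
    },
}

def backend_for_session(session_key: str) -> str:
    """Per-session backend, falling back to the global default."""
    return backend_settings["session_overrides"].get(
        session_key, backend_settings["default_backend"]
    )
```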

@CodyCBakerPhD
Collaborator

How should we handle the below inconsistency between stub and non-stub configurations?

Thinking more on it, there isn't much use in applying the backend configuration to a stub file. Neurosift ought to be agnostic to such things, aside from slight performance gains.
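In code, the suggested behavior amounts to something like the following (the helper names and flag are hypothetical):

```python
def write_conversion(info: dict) -> None:
    nwbfile = build_nwbfile(info)  # hypothetical helper
    if not info.get("stub_test", False):
        # Only full-size outputs get the user's backend configuration;
        # stub files for preview are written with library defaults.
        apply_backend_configuration(nwbfile, info["configuration"])  # hypothetical helper
    write_to_disk(nwbfile, info["output_path"])  # hypothetical helper
```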

@CodyCBakerPhD
Collaborator

Some stylistic requests from the first pass through (which felt pretty good once I got it working, BTW):

a) Should probably suppress the 'Zarr' backend option for the time being (but perhaps leave it commented out so it's easy to re-enable in the future)

Reasons include:

  • DANDI technically doesn't officially take NWB Zarr files yet (anything on that front is still experimental)
  • NeuroConv doesn't have full tests running with Zarr

I know that leaves a dropdown with only a single element, though, so if you'd rather remove the dropdown altogether that might also be OK (and just hardcode 'hdf5' in the payload for now)
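If the dropdown does go away, the fallback amounts to pinning the backend in the request payload, roughly like this (the payload keys are hypothetical):

```python
payload = {
    **conversion_info,  # hypothetical: the rest of the conversion request
    # "zarr" suppressed until DANDI accepts NWB Zarr files and NeuroConv's
    # Zarr test coverage is complete; re-enable by restoring the dropdown.
    "backend": "hdf5",
}
```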

b) When displaying summary information about the source array, such as

[screenshot: summary of the source array's shape and size]

I think it would look better to separate the axis lengths by a space and an x, e.g., 1155 x 384

This is how HDFView does it as well.

Also, on that same summary, can you use human-readable auto-sizing for the source array size (so that one would be on the MB scale)?
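Both display tweaks come down to small formatting helpers, roughly (a sketch; the function names are hypothetical):

```python
def format_shape(shape) -> str:
    """Join axis lengths with ' x ', as HDFView does."""
    return " x ".join(str(axis_length) for axis_length in shape)

def format_size(num_bytes: float) -> str:
    """Auto-scale a byte count to a human-readable unit."""
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if num_bytes < 1000 or unit == "TB":
            return f"{num_bytes:.2f} {unit}"
        num_bytes /= 1000

format_shape((1155, 384))    # -> "1155 x 384"
format_size(1155 * 384 * 8)  # -> "3.55 MB"
```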

c) Possibly a bug, though I'm not sure at what level: if I do as the description says and leave the compression method blank, it still defaults to GZIP (also if I navigate back to this page after proceeding), so I was unable to disable compression.

@garrettmflynn
Member Author

All of these should be fixed!

Just as context for the gzip option: it looks like I'm getting a schema whose default compression_method is gzip, so I've had to remove that default, since the form would otherwise fill it in from the schema.

@CodyCBakerPhD
Collaborator

Yeah, it's one of those things...

gzip is definitely the initial value that should be filled into all the options 'by default', but a blank value should equate to None on the Python side, meaning no compression.

Though I guess it might be even better to add an explicit 'No compression' option to the dropdown. Would you be able to inject that into the schema and map it to null/None when passing back to NeuroConv?
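That mapping could look roughly like this (a sketch; the schema structure shown is assumed, not the GUIDE's actual one):

```python
NO_COMPRESSION = "No compression"

def inject_no_compression(schema: dict) -> dict:
    """Add an explicit sentinel option to the compression dropdown."""
    schema["properties"]["compression_method"]["enum"].insert(0, NO_COMPRESSION)
    return schema

def resolve_compression(value):
    """Map the sentinel back to None before passing to NeuroConv."""
    return None if value == NO_COMPRESSION else value
```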

@garrettmflynn
Member Author

There's no inconsistency as far as I've seen, so the described behavior is what's happening now.

I can do this if you'd like, but I'm not 100% sure what the time estimate would be, so we might be better off handling it as a follow-up.

@CodyCBakerPhD
Collaborator

Sure, let's make 'no compression' a follow-up if we have time later.

But for now, do you have a way to auto-populate gzip while letting the 'default' (if I remove the text, then click off) be blank?

@garrettmflynn
Member Author

Yeah, that's just what happens, since there isn't a default or global value; gzip just seems to be passed as the compression method value for every configuration level.

@garrettmflynn garrettmflynn marked this pull request as ready for review June 5, 2024 16:22
@CodyCBakerPhD
Collaborator

Looks great!

@CodyCBakerPhD CodyCBakerPhD enabled auto-merge June 5, 2024 16:48
@CodyCBakerPhD CodyCBakerPhD merged commit ac2285b into main Jun 5, 2024
21 of 22 checks passed
@CodyCBakerPhD CodyCBakerPhD deleted the backend-configuration branch June 5, 2024 17:21