Provide default options for chunking of datasets (issue #635) #636

Draft: wants to merge 3 commits into base: master

ehennestad (Collaborator) commented Nov 23, 2024

Fix #635

Motivation

Provide better chunking options for files in cloud storage

How to test the behavior?

To be determined

Checklist

  • Have you ensured the PR description clearly describes the problem and solutions?
  • Have you checked to ensure that there aren't other open or previously closed Pull Requests for the same change?
  • If this PR fixes an issue, is the first line of the PR description fix #XX where XX is the issue number?

ehennestad (Collaborator, Author) commented Nov 23, 2024

Some open questions:

  • Should every possible property / dataset name be listed in the configuration (chunk_params.json)? Or should there be a single default that applies to all datasets, with overrides for specific datasets?
  • Should every dataset be a candidate for chunking, or only datasets such as "data" and "timestamps"? Are there any others?
  • Should chunking be specified for datasets that are smaller than the chunk size, i.e. where chunkSize == maxSize?
  • Should chunk_dimensions be specified for each set of dimension options, similar to how it is done in the NWB schema?

For example: https://github.com/NeurodataWithoutBorders/nwb-schema/blob/473fcc41e871288767cfb37d83315cca7469b9d1/core/nwb.base.yaml#L100-L110

dims:
    - - num_times
    - - num_times
      - num_DIM2
    - - num_times
      - num_DIM2
      - num_DIM3
    - - num_times
      - num_DIM2
      - num_DIM3
      - num_DIM4
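
To make the questions above more concrete, here is a rough sketch of one possible design: a single default with per-dataset overrides, chunking along the first (num_times) axis only, and clamping the chunk to the dataset shape when the dataset is smaller than the target. It is written in Python purely for illustration and is not code from this PR. The config keys (`default`, `overrides`, `target_chunk_bytes`), the function names, and the byte targets are all hypothetical and do not reflect the actual layout of chunk_params.json.

```python
import math

# Hypothetical stand-in for the contents of chunk_params.json.
# Key names and byte targets are assumptions made for this sketch.
CHUNK_CONFIG = {
    "default": {"target_chunk_bytes": 1_000_000},
    "overrides": {
        "timestamps": {"target_chunk_bytes": 100_000},
    },
}


def resolve_chunk_params(dataset_name, config=CHUNK_CONFIG):
    """Start from the default and apply any override for this dataset name."""
    params = dict(config["default"])
    params.update(config.get("overrides", {}).get(dataset_name, {}))
    return params


def compute_chunk_shape(shape, itemsize, target_chunk_bytes):
    """Chunk along the first axis only and keep all other axes whole.

    The result is clamped to the dataset shape, so a dataset smaller than
    the target ends up as a single chunk (chunk shape == dataset shape).
    """
    if not shape:
        return ()
    # Bytes in one "row", i.e. one step along the first axis.
    row_bytes = itemsize * math.prod(shape[1:])
    rows = max(1, target_chunk_bytes // row_bytes)
    return (min(rows, shape[0]),) + tuple(shape[1:])


# A 2-D "data" dataset of 8-byte values, using the default target.
params = resolve_chunk_params("data")
print(compute_chunk_shape((1_000_000, 32), 8, params["target_chunk_bytes"]))

# A small 1-D dataset: the chunk collapses to the full dataset shape.
print(compute_chunk_shape((100,), 8, params["target_chunk_bytes"]))
```

Because the rule only depends on the shape, the same function covers every rank listed in dims without needing a separate chunk_dimensions entry per rank. Whether that is flexible enough, or whether explicit per-rank entries are preferable, is exactly the open question.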

@bendichter

@ehennestad ehennestad changed the title 635 Provide default options for chunking of datasets Provide default options for chunking of datasets (Issue #635) Nov 23, 2024
@ehennestad ehennestad changed the title Provide default options for chunking of datasets (Issue #635) Provide default options for chunking of datasets (issue #635) Nov 23, 2024
Development

Successfully merging this pull request may close these issues.

[Feature]: Add default and customizable configuration for dataset chunking