Compression options for saving output. #43

Open
3 tasks
YooSunYoung opened this issue Feb 21, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@YooSunYoung
Member

NMXReducedData keeps the pixel id coordinate, and there are more than 1e6 pixels,
so the output can end up relatively large (more than a GB).

Therefore we would like to compress large Datasets when saving them to an h5 or NeXus file.
The compression option is currently hard-coded, and it makes saving very slow.
Some advanced users, i.e. IDS, might want to turn it off when they have to debug the workflow.

Therefore we should:

  • Find the optimal compression options by comparing the trade-off between saving speed and output size.
  • Make the compression option configurable so that it can be lowered, with the optimal compression as the default.
  • Document compression in the workflow user guide.
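
A minimal sketch of the second task, assuming the output is written with h5py. The `compression`, `compression_opts` and `shuffle` keywords are existing `create_dataset` parameters in h5py; the `Compression` enum, the `dataset_kwargs` helper and the gzip levels chosen here are hypothetical placeholders, not the current workflow API, and the real default should come from the benchmark in the first task.

```python
from enum import Enum
from typing import Any


class Compression(Enum):
    """Hypothetical user-facing compression setting."""

    NONE = "none"  # fastest to write, largest file; useful while debugging
    FAST = "fast"  # light gzip (placeholder default)
    BEST = "best"  # heaviest gzip, slowest to write


def dataset_kwargs(mode: Compression = Compression.FAST) -> dict[str, Any]:
    """Translate the setting into h5py ``create_dataset`` keyword arguments."""
    if mode is Compression.NONE:
        return {}
    level = 1 if mode is Compression.FAST else 9  # assumed levels
    return {"compression": "gzip", "compression_opts": level, "shuffle": True}


# Intended use (requires h5py, which is not imported in this sketch):
#     group.create_dataset("pixel_id", data=pixel_id, **dataset_kwargs(mode))
```

Keeping the mapping in one helper means the save function only has to accept a single argument, and turning compression off for debugging is a one-word change.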
@YooSunYoung YooSunYoung added the enhancement New feature or request label Feb 21, 2024
@SimonHeybrock
Member

HDF5 compression / decompression is really slow. We should imho never use it.
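
One rough way to put numbers on this cost, assuming the gzip filter is the one in use: the HDF5 gzip filter is based on the same deflate algorithm as Python's `zlib` module, so timing `zlib` at a few levels on a synthetic buffer gives a first indication. This is only a stand-in and not a measurement of HDF5 itself, since chunking and the shuffle filter change the result, and the data below is made up.

```python
import time
import zlib
from array import array


def benchmark(raw: bytes, levels=(0, 1, 4, 9)) -> dict[int, tuple[int, float]]:
    """Return {level: (compressed size in bytes, seconds taken)}."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        size = len(zlib.compress(raw, level))
        results[level] = (size, time.perf_counter() - start)
    return results


# Synthetic stand-in for a pixel id coordinate: consecutive C ints.
raw = array("i", range(200_000)).tobytes()
results = benchmark(raw)
for level, (size, seconds) in results.items():
    print(f"level={level}: {size / len(raw):.2%} of original, {seconds * 1e3:.1f} ms")
```

Running the same comparison on a real reduced dataset, written through the actual save function, would be needed before drawing conclusions.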

@YooSunYoung
Member Author

Hmm, but @Justin-Bergmann requested that compression be included when saving the data.
Is there an alternative way to reduce the size of the files?

@SimonHeybrock
Member

Is the problem that this is stored for every wavelength slice?

How about storing all slices in the same file, so that the pixel ids can be reused?
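
A back-of-the-envelope sketch of what this would save, assuming 4-byte pixel ids and 4-byte counts and using illustrative sizes that are not taken from real NMX data:

```python
def bytes_per_slice_layout(n_pixels: int, n_slices: int, itemsize: int = 4) -> int:
    """Every slice stores its own copy of the pixel ids next to its counts."""
    return n_slices * (n_pixels * itemsize + n_pixels * itemsize)


def bytes_shared_layout(n_pixels: int, n_slices: int, itemsize: int = 4) -> int:
    """Pixel ids are stored once; counts form one (slice, pixel) array."""
    return n_pixels * itemsize + n_slices * n_pixels * itemsize


n_pixels, n_slices = 1_000_000, 50  # illustrative numbers only
separate = bytes_per_slice_layout(n_pixels, n_slices)
shared = bytes_shared_layout(n_pixels, n_slices)
print(f"per-slice ids: {separate / 1e6:.0f} MB, shared ids: {shared / 1e6:.0f} MB")
# prints "per-slice ids: 400 MB, shared ids: 204 MB"
```

Under these assumptions, sharing the ids roughly halves the uncompressed size, independent of any compression filter.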

@YooSunYoung
Member Author

> Is the problem that this is stored for every wavelength slice?
>
> How about storing all slices in the same file, so that the pixel ids can be reused?

I think we are already doing that.
