-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow compression in exported h5 files #198
Comments
While you guys are working on this, could you please tell me if there is an easy way for me to export hdf5 in a 16bit format instead of 32 bit? I tried looking around the SIMA code to see where this is specified but haven't yet found it. Thanks! |
I should be able to get something up in the next couple of days, but until then here's the lines that would need to be changed, both in sequence.py. The type of the exported h5 file is defined here: Changing that to 'float16' should fix that, or alternatively, you can tell h5py to compress the file by adding the keyword argument, compression='gzip', here: In order to edit the code, you'll need to clone the repo if you haven't already (git clone https://github.com/losonczylab/sima.git), make those changes and then install by running 'python setup.py develop'. |
Perfect! Thanks! |
I think I have an old version kicking around on my computer since on mine, the dtype was set to uint16. That's why I didn't find it. I haven't used git much. Is it super stupid to just modify the installed files on the computer temporarily to get this going? Thanks again! |
The correct answer is probably that it's a bad idea, but that should work fine. Just make sure to save a copy of the original file and you might need to delete/move the corresponding .pyc file. |
…osonczylab#198). Rewrote Sequence.export for 'HDF5' format to make it more memory-efficient, should now only load 1 frame at a time. Added test cases for exporting sequences.
Hey @jzaremba , Can I request one last thing? Could you please allow for the bit depth as an optional parameter in the h5py export? I feel that if the original video is in 16-bit tiff, there is no additional benefit in casting it as a float32, while the file size is doubled without compression. So, it would be great if the h5py bit depth is either set to be the same as the original data, or if it can be specified as an optional argument. Thanks a ton again! You guys have been tremendously helpful! |
Yeah, we've had a TODO note there for a while. I don't see a reason not to keep the same dtype as the Sequence that is being exported, I can add that in. I'd hesitate to add an bit depth option to the export method, just in interest of keeping the API clean. Is your data otherwise maintaining the correct dtype throughout the pipeline? Is a single frame uint16 or float16? |
Hey sorry for the delay in response. Didn't see this post. Yeah, additional option is absolutely not required. I just suggested it in case keeping the same dtype is somehow more involved. Thanks a ton! Yeah, the data is otherwise maintaining the same dtype as far as I can tell. Single frame is unint16. Thanks! |
As noted in #195 by @vjlbym, exporting data in the h5 format results in a very large file.
This can be addressed by exposing more of the h5py dataset creation options, in particular the 'compression' argument.
At the very least we should add a 'compression' argument to export_frames, but could also add a h5_kwargs argument that takes a dictionary of argument passed to h5py.create_dataset.
The text was updated successfully, but these errors were encountered: