Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow compression in exported h5 files #198

Open
jzaremba opened this issue Sep 13, 2015 · 8 comments
Open

Allow compression in exported h5 files #198

jzaremba opened this issue Sep 13, 2015 · 8 comments
Assignees
Milestone

Comments

@jzaremba
Copy link
Member

As noted in #195 by @vjlbym, exporting data in the h5 format results in a very large file.

This can be addressed by exposing more of the h5py dataset creation options, in particular the 'compression' argument.

At the very least we should add a 'compression' argument to export_frames, but could also add a h5_kwargs argument that takes a dictionary of argument passed to h5py.create_dataset.

@jzaremba jzaremba self-assigned this Sep 13, 2015
@jzaremba jzaremba added this to the SIMA 1.x milestone Sep 13, 2015
@vjlbym
Copy link
Contributor

vjlbym commented Sep 14, 2015

While you guys are working on this, could you please tell me if there is an easy way for me to export hdf5 in a 16bit format instead of 32 bit? I tried looking around the SIMA code to see where this is specified but haven't yet found it. Thanks!

@jzaremba
Copy link
Member Author

I should be able to get something up in the next couple of days, but until then here's the lines that would need to be changed, both in sequence.py.

The type of the exported h5 file is defined here:
https://github.com/losonczylab/sima/blob/master/sima/sequence.py#L435

Changing that to 'float16' should fix that, or alternatively, you can tell h5py to compress the file by adding the keyword argument, compression='gzip', here:
https://github.com/losonczylab/sima/blob/master/sima/sequence.py#L459

In order to edit the code, you'll need to clone the repo if you haven't already (git clone https://github.com/losonczylab/sima.git), make those changes and then install by running 'python setup.py develop'.

@vjlbym
Copy link
Contributor

vjlbym commented Sep 14, 2015

Perfect! Thanks!

@vjlbym
Copy link
Contributor

vjlbym commented Sep 14, 2015

I think I have an old version kicking around on my computer since on mine, the dtype was set to uint16. That's why I didn't find it. I haven't used git much. Is it super stupid to just modify the installed files on the computer temporarily to get this going? Thanks again!

@jzaremba
Copy link
Member Author

The correct answer is probably that it's a bad idea, but that should work fine. Just make sure to save a copy of the original file and you might need to delete/move the corresponding .pyc file.

jzaremba added a commit to jzaremba/sima that referenced this issue Sep 15, 2015
…osonczylab#198).

Rewrote Sequence.export for 'HDF5' format to make it more memory-efficient, should now only load 1 frame at a time.
Added test cases for exporting sequences.
@vjlbym
Copy link
Contributor

vjlbym commented Sep 15, 2015

Hey @jzaremba , Can I request one last thing? Could you please allow for the bit depth as an optional parameter in the h5py export? I feel that if the original video is in 16-bit tiff, there is no additional benefit in casting it as a float32, while the file size is doubled without compression. So, it would be great if the h5py bit depth is either set to be the same as the original data, or if it can be specified as an optional argument. Thanks a ton again! You guys have been tremendously helpful!

@jzaremba
Copy link
Member Author

Yeah, we've had a TODO note there for a while. I don't see a reason not to keep the same dtype as the Sequence that is being exported, I can add that in. I'd hesitate to add an bit depth option to the export method, just in interest of keeping the API clean.

Is your data otherwise maintaining the correct dtype throughout the pipeline? Is a single frame uint16 or float16?

@jzaremba jzaremba reopened this Sep 16, 2015
@vjlbym
Copy link
Contributor

vjlbym commented Sep 18, 2015

Hey sorry for the delay in response. Didn't see this post. Yeah, additional option is absolutely not required. I just suggested it in case keeping the same dtype is somehow more involved. Thanks a ton! Yeah, the data is otherwise maintaining the same dtype as far as I can tell. Single frame is unint16. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants