[WIP] Fix multi worker and pip installed hdf5plugin #15
base: main
Conversation
I spent some time digging into the issues surrounding two primary problems:
- dataset/sequence.py: fixes the issue of a single file descriptor being inherited by the child processes. We wait to open the HDF5 files until we know we are in the process accessing individual items.
- pip_blosc_fix.py: provides a potential fix for h5py not scraping the correct directory for the plugins. By default it attempts to search the HDF5 default location (but this may not exist).
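For illustration, here is a minimal sketch of the lazy-open pattern the first bullet describes; the class name, file layout, and the 'events' key are illustrative, not the actual dataset/sequence.py code:
```python
import h5py
import hdf5plugin  # noqa: F401  (import registers the compression filters)
from torch.utils.data import Dataset

class SequenceDataset(Dataset):
    def __init__(self, h5_path):
        # Store only the path; opening the file here would create a file
        # descriptor that every DataLoader worker inherits from the parent.
        self.h5_path = h5_path
        self.h5f = None
        # A short-lived handle just for metadata is safe: it is closed
        # before any worker processes are forked.
        with h5py.File(self.h5_path, 'r') as f:
            self.length = len(f['events'])  # 'events' key is illustrative

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        # Lazily open once per process, i.e. inside the worker.
        if self.h5f is None:
            self.h5f = h5py.File(self.h5_path, 'r')
        return self.h5f['events'][index]
```
Each worker then opens its own handle on first access, so no descriptor is shared across process boundaries.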
Hi @k-chaney, thanks for your contribution! Could you tell me more about those issues:
For context, I have been installing packages through pip, as conda is slow for my purposes. For issue 1, I get this error when I install through pip and use your code as-is:
This led me down the rabbit hole of figuring out how HDF5 handles plugins (and the environment variables that control this). However, upon more poking and prodding, my original solution turned out to be overly complex. It looks like the minimal fix is just:
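A sketch of what that minimal fix likely looks like, assuming (as the follow-up comments suggest) that it is simply importing hdf5plugin before any file is opened; the package registers its bundled filters with HDF5 at import time, so no plugin directory needs to exist on disk:
```python
# Importing hdf5plugin registers its bundled HDF5 filters (blosc, lz4,
# zstd, ...) with libhdf5 directly, bypassing the plugin-directory search.
import hdf5plugin  # noqa: F401
import h5py

with h5py.File('sequence.h5', 'r') as f:  # path is illustrative
    data = f['events'][:10]
```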
This could be added to your code directly (it shouldn't have side effects). I did a quick grep of the library code, and it appears you were relying on HDF5 to automatically pick up the plugin. This works inside a conda environment, but not a pip environment. With this fixed, I moved on to the next portion. For issue 2, these are the errors I saw when going through a pip installation with more than 1 worker. Note that this doesn't happen with a conda install.
In my experience with HDF5 (I was in charge of converting MVSEC), these sorts of errors are related to the same file descriptor being shared between processes. The solution is simply to open the HDF5 files from within the child process (i.e. the getitem function). I will do more digging to see what the differences between the installations are. On the surface they seem very similar, but more digging will most likely reveal why conda works out of the box and pip does not.
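For reference, the same effect can be achieved with PyTorch's worker_init_fn instead of a lazy check in getitem; this is an alternative sketch, not the code in this PR, and the dataset/path names are illustrative:
```python
import h5py
from torch.utils.data import DataLoader, get_worker_info

def worker_init_fn(worker_id):
    # Runs inside each freshly started worker, so the handle opened
    # here is private to that worker process.
    dataset = get_worker_info().dataset
    dataset.h5f = h5py.File(dataset.h5_path, 'r')

# Using the SequenceDataset sketched earlier (path is illustrative):
dataset = SequenceDataset('sequence.h5')
loader = DataLoader(dataset, batch_size=4, num_workers=4,
                    worker_init_fn=worker_init_fn)
```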
Very interesting, thanks. I think it would then make sense for me to adapt the documentation for the pip installation. As for the code, I believe it is sufficient to catch the import error of hdf5plugin and inform the user that installing hdf5plugin is required for a pip installation but not otherwise, e.g.:
```python
try:
    import hdf5plugin
except ImportError:
    print("Install the hdf5plugin if you are using pip instead of conda: "
          "https://pypi.org/project/hdf5plugin/")
```
Hi @k-chaney - I just came across this issue. Have you tried using https://github.com/mamba-org/mamba, which is a fast drop-in replacement for conda?
Just a quick check-in to share my experience:
I agree with the second issue; opening the same h5 file between processes is troublesome (like this stackoverflow post). The quick fix is to open the file from within each worker process, e.g. as sketched below.
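One common form of that quick fix (a sketch with illustrative names, not necessarily the snippet from this comment) opens a short-lived handle on every access, which is slower but trivially safe across workers:
```python
import h5py
from torch.utils.data import Dataset

class QuickFixDataset(Dataset):
    def __init__(self, h5_path, length):
        self.h5_path = h5_path
        self.length = length

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        # Open and close per item: no descriptor ever crosses a fork.
        with h5py.File(self.h5_path, 'r') as f:
            return f['events'][index]  # 'events' key is illustrative
```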
However, for the first issue, pip works perfectly with hdf5plugin in my environment. I use:
I'd recommend using venv if your pip has problems and you are using the system Python. Shintaro