Is it possible to support parallel NetCDF I/O? #122
Support for parallel NetCDF I/O would indeed be nice. But this fails with:
It could be that I did something stupid. I don't know much about this subject either, but if somebody could provide a full C example (as opposed to code fragments), that would help a lot.
Thanks for looking into this. I thought I was going to start playing around with parallel I/O sooner, but I'm still working on basic MPI infrastructure... I can try getting
Any luck with this, @ali-ramadhan? NetCDF could also be a sink for parallel DiskArrays/Dagger.jl processing, so this would be widely useful.
I'd like to see this feature too.
I'm just chiming in to let interested people know that I've been working on this during the last week or so. I managed to get a working example running on my laptop over the weekend. My plan is to consolidate the code changes first and then write and run a few meaningful tests on real HPC platforms with GPFS and (hopefully) Lustre parallel file systems. If everything goes well, I will report back here so we can discuss how to proceed (opening a PR or otherwise).

Just a note about parallel netCDF-3 support: as of now, NetCDF_jll only supports parallel netCDF-4. Support for parallel netCDF-3 is provided through the parallel-netcdf library, which is not enabled in NetCDF_jll and is not even available on Yggdrasil. While this is not a problem for this specific development (trying to access a netCDF-3 file using parallel I/O will simply throw a "not supported" error), I think it would be useful for the package to also support parallel netCDF-3. I have no previous experience with JLL packages and Yggdrasil, but if someone manages to add parallel-netcdf to Yggdrasil and enable parallel netCDF-3 support in NetCDF_jll, I will be happy to test that too.
JLL packages are not so hard to add with the wizard (see https://github.com/JuliaPackaging/BinaryBuilder.jl), and you are very likely the best-placed person to do this right now, probably the 100-to-1 favorite. Code that relies on manually installed system binaries will likely not be widely used, since the Julia ecosystem has moved very strongly towards versioned dependencies managed by Pkg.jl. So I encourage you to give it a go :) But feel free to post any JLL problems here for help/feedback.
Just to chime in here (as a potential parallel NetCDF user!): parallel NetCDF (or HDF5) is one case where system binaries are likely to be wanted. On an HPC cluster, we probably have to use the vendor-provided MPI to get the best performance (especially for inter-node communication), so the parallel NetCDF libraries will need to be linked against the system MPI, which the Julia-provided binaries will not be. The same goes for HDF5, which I assume parallel NetCDF depends on for netCDF-4 files. At the very least, HPC users will want the option to do that, and I guess they are the main users of parallel NetCDF. For comparison, see the setup for parallel HDF5.jl, which provides a utility function to link to the system binaries:
Yes, you're probably right in general; I was thinking of non-MPI use cases like Dagger.jl. This is pretty nice syntax in HDF5 if we only have the system binaries:
(But it would still be very nice to have a JLL, and "not even available on Yggdrasil" probably means there is no one else to do it.)
There is some initial work in commit 70ef683 on the MPI branch. Using a custom netCDF library, potentially linked against an optimized MPI (and HDF5) library, is possible as described in https://alexander-barth.github.io/NCDatasets.jl/stable/issues/#Using-a-custom-NetCDF-library
Windows currently fails with the following (full logs); Linux and OS X work OK:
Parallel support seems to be missing from the Windows NetCDF_jll.
It seems that upstream netcdf-c does not test MPI on Windows (MSYS2, MinGW).
It is not clear whether HDF5_jll actually has MPI enabled on Windows:
If somebody with an interest in Windows could have a look at this, that would be awesome :-)
Is it new? I don't remember seeing it when I checked last week. A couple of notes:
Yes, this is new. I started working on this only a couple of days ago. Thank you for taking a close look at these changes. I also considered making the communicator (or parallel) a keyword argument. But as far as I know, this would mean that MPI becomes a (hard) dependency of NCDatasets, as we cannot dispatch on keyword arguments. I think netCDF with MPI is a very good use case for weak dependencies. Currently MPI is the only way to have parallel access to netCDF files. For me, MPI does not work so nicely (or at all :-)) in interactive sessions. But maybe in the future there will be other ways to do parallel access (threads, Julia workers?), which could all be extensions that we dispatch on. In mpi4py, all MPI functions are methods of the communicator. So having the MPI communicator as the first argument of
Julia's package-manager-provided NetCDF_jll.jl now links to the package-manager-provided MPI. That means that if we use NetCDF in parallel, we have to use the package-manager-provided MPI to avoid weird errors caused by multiple MPI libraries being linked. However, the default MPI is MPICH, and we have to use OpenMPI for the CI test to avoid horrible performance when oversubscribing (we do oversubscribe in the parallel test). It seems like too much of a faff to switch the MPI and figure out how to call the package-manager-provided MPI, so instead we just disable NetCDF (which we never use anyway). This issue may be fixed one day when NCDatasets.jl supports MPI, as it will (should?) provide functionality to link a system-provided NetCDF instead of the package-manager-provided one, but that functionality is still under development (JuliaGeo/NCDatasets.jl#122).
I don't know much about the subject, but from looking at the PnetCDF description (https://parallel-netcdf.github.io/) it sounds like there are two backend options for parallel I/O: PnetCDF and parallel HDF5?
It sounds like it might be possible to build NetCDF with parallel I/O support.
Out of curiosity, is parallel I/O something that NCDatasets.jl can feasibly support?
X-Ref: CliMA/Oceananigans.jl#590
X-Ref: CliMA/ClimateMachine.jl#2007