Add dask parallelisation to writing chunks to JASMIN OS. #20

Open
oj-tooth opened this issue Nov 7, 2024 · 0 comments
oj-tooth commented Nov 7, 2024

Problem:
The current send command uses a dask cluster (when specified) to parallelise writing across variables: each variable in a given dataset is written to its own Zarr store on the JASMIN object storage. Within each variable, however, the chunks are written serially by .to_zarr(), so the write time depends on the number of chunks.

Proposal:
Introduce a new send_with_dask function which uses a combination of workers and threads in a dask cluster to write the chunks belonging to each variable in parallel. The variables in the given dataset are then looped over serially, which matches the behaviour of .to_zarr() when called on an xarray Dataset.
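
A minimal sketch of what such a function might look like, assuming the dataset's variables are dask-backed (chunked) and that each variable goes to its own store. The signature, the `open_store` callback, the `store_key` helper and the default worker/thread counts are all hypothetical and not taken from the existing code; `open_store` is assumed to return a store every worker can reach (for example an fsspec mapper onto the object store).

```python
from typing import Callable, MutableMapping


def store_key(prefix: str, var_name: str) -> str:
    """Build a per-variable store path (hypothetical layout: <prefix>/<variable>)."""
    return f"{prefix.rstrip('/')}/{var_name}"


def send_with_dask(
    ds,
    open_store: Callable[[str], MutableMapping],
    n_workers: int = 4,
    threads_per_worker: int = 2,
) -> None:
    """Write each variable of `ds` to its own Zarr store, chunks in parallel.

    `ds` is expected to be an xarray Dataset whose variables are chunked
    with dask; unchunked variables would not benefit from the cluster.
    """
    # Imported lazily so this module can be imported without dask installed.
    from dask.distributed import Client, LocalCluster

    with LocalCluster(
        n_workers=n_workers, threads_per_worker=threads_per_worker
    ) as cluster, Client(cluster):
        # Serial loop over variables; the chunk writes for each variable are
        # scheduled on the active distributed client by .to_zarr().
        for name in ds.data_vars:
            store = open_store(name)
            ds[[name]].to_zarr(store, mode="w")
```

While the `Client` is active it acts as the default dask scheduler, so the chunk writes triggered by `.to_zarr()` are distributed across the workers and threads rather than run one after another.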

Challenge:
Currently, the approach above is easy to implement with a LocalCluster by simply adding two new arguments to the existing send function. However, the cluster will need to be configured differently on JASMIN (which uses dask-gateway) or on Archer2 (where a SLURMCluster is available).
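
One possible way to isolate the platform differences is a small factory that returns a cluster for each environment. This is only a sketch: the `make_cluster` name, the `platform` strings and the memory value are placeholders, and the real dask-gateway and SLURM settings (queues, accounts, worker options) would depend on how each site is configured. Note that `threads_per_worker` is not applied in the dask-gateway branch here, since worker resources are normally set through the gateway's own cluster options.

```python
def make_cluster(platform: str, n_workers: int = 4, threads_per_worker: int = 2):
    """Return a dask cluster for the given platform (hypothetical helper)."""
    # Each backend is imported lazily so only the one in use must be installed.
    if platform == "local":
        from dask.distributed import LocalCluster

        return LocalCluster(
            n_workers=n_workers, threads_per_worker=threads_per_worker
        )

    if platform == "jasmin":
        from dask_gateway import Gateway

        # Gateway() reads the gateway address and auth from dask config.
        gateway = Gateway()
        cluster = gateway.new_cluster()
        cluster.scale(n_workers)
        return cluster

    if platform == "archer2":
        from dask_jobqueue import SLURMCluster

        # One worker process per SLURM job; memory is a placeholder value.
        cluster = SLURMCluster(
            cores=threads_per_worker, processes=1, memory="16GB"
        )
        cluster.scale(jobs=n_workers)
        return cluster

    raise ValueError(f"Unknown platform: {platform!r}")
```

A caller could then write `Client(make_cluster("local"))` and leave the rest of the send logic unchanged across platforms.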

@oj-tooth oj-tooth changed the title Implementing dask parallelism to writing chunks to JASMIN OS. Add dask parallelism to writing chunks to JASMIN OS. Nov 7, 2024
@oj-tooth oj-tooth changed the title Add dask parallelism to writing chunks to JASMIN OS. Add dask parallelisation to writing chunks to JASMIN OS. Nov 7, 2024