Suggestions for performance improvement (Best practices) when using Xarray with Dask #7632
ricardobarroslourenco started this conversation in Office Hours
Replies: 2 comments 6 replies
-
What is the We could have a longer discussion at the office hours on Friday
5 replies
-
@ricardobarroslourenco Regarding the HDF warnings, this might be related to #7549.
1 reply
-
In a remote sensing workflow (please refer to the gist), I am currently able to load and process my data (originally ENVI raster files converted to netCDF with rioxarray), and the results are quite good. Initially I processed the data in R, where I ran into memory limitations and issues with data structures; Dask + Xarray solved that part of the problem.

However, in this notebook I am facing performance issues, especially toward the end of the workflow. Saving the Dataset takes hours, yet CPU utilization is low (30-40% on average over the run) and the write bandwidth is almost unused, around 2 Mbps in the Dask monitoring. I am running in an Apptainer container (derived from this Docker image) on an HPC node with 128 GB of RAM, and saving to a scratch partition on a Lustre filesystem (with Gigabit speeds).

Any suggestions on how I can start improving my workflow?
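One thing that may be worth checking for a slow write like this is whether the on-disk netCDF chunking lines up with the Dask chunks. The following is only a minimal sketch under assumptions: the helper names (`build_chunk_encoding`, `write_netcdf`), the variable name `reflectance`, and the demo shapes are hypothetical, and the `chunksizes` encoding key is assumed to be handled by the netCDF4 or h5netcdf backend.

```python
import numpy as np
import xarray as xr


def build_chunk_encoding(ds):
    """Build a to_netcdf encoding dict whose chunksizes follow each variable's Dask chunks."""
    encoding = {}
    for name, var in ds.data_vars.items():
        if var.chunks is None:
            continue  # not Dask-backed, so leave the backend default
        # var.chunks is a tuple of tuples; use the first block size along each dimension
        encoding[name] = {"chunksizes": tuple(sizes[0] for sizes in var.chunks)}
    return encoding


def write_netcdf(ds, path):
    """Build the write lazily, then trigger it explicitly (hypothetical helper)."""
    # compute=False returns a dask.delayed object instead of writing immediately
    delayed = ds.to_netcdf(path, encoding=build_chunk_encoding(ds), compute=False)
    delayed.compute()


# Hypothetical stand-in for the real Dataset: 4 time steps of a 100 x 100 raster
demo = xr.Dataset(
    {"reflectance": (("time", "y", "x"), np.zeros((4, 100, 100), dtype="float32"))}
).chunk({"time": 1, "y": 50, "x": 50})

demo_encoding = build_chunk_encoding(demo)
```

Calling `write_netcdf(demo, "out.nc")` would perform the actual write. Separating graph construction from `.compute()` makes it easier to see whether the time goes into the upstream computation or into the write itself.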
EDIT 1:
A compute time per task screenshot. How can I find the functions that triggered such tasks? I was wondering if I have margin to improve things here (basically on the four large ones, which do stacking and mapping on store).
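On the question of tracing tasks back to functions: the task names shown in the Dask diagnostics are derived from the key prefixes in the task graph, so counting those prefixes can hint at which operations produced them. This is a rough sketch, not a definitive method; `count_task_prefixes` and the toy array are hypothetical, and the exact prefixes depend on the Dask version and on graph optimization.

```python
from collections import Counter

import dask.array as da
from dask.utils import key_split


def count_task_prefixes(collection):
    """Count tasks per key prefix in the (unoptimized) graph of a Dask collection."""
    graph = collection.__dask_graph__()
    # key_split strips the hash suffix, e.g. ("add-<hash>", 0, 0) -> "add"
    return Counter(key_split(key) for key in graph)


# Hypothetical toy computation standing in for the real workflow
x = da.ones((4, 4), chunks=2)
y = (x + 1).sum()
prefix_counts = count_task_prefixes(y)
```

The same function should also accept a Dask-backed xarray Dataset or DataArray, since those expose `__dask_graph__` as well.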
EDIT 2: The source data can be found here.
EDIT 3: It seems that some issues are happening with the HDF5 library... =/
EDIT 4: Adding a new version of the notebook, with better usage of apply_ufunc, but the process of exporting to netCDF is still really slow...
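For readers unfamiliar with the pattern, a Dask-aware `apply_ufunc` call can look roughly like the sketch below. It is not the notebook's actual code: the function `remove_time_mean`, the wrapper `apply_along_time`, and the demo array are hypothetical, and the `time` dimension is assumed to sit in a single chunk, which `dask="parallelized"` requires for core dimensions.

```python
import numpy as np
import xarray as xr


def remove_time_mean(block):
    """Hypothetical per-pixel function: subtract the mean along the last axis."""
    return block - block.mean(axis=-1, keepdims=True)


def apply_along_time(arr):
    """Apply remove_time_mean lazily over each Dask block of arr."""
    return xr.apply_ufunc(
        remove_time_mean,
        arr,
        input_core_dims=[["time"]],   # core dims are moved to the last axis
        output_core_dims=[["time"]],
        dask="parallelized",          # run the function once per Dask block
        output_dtypes=[arr.dtype],
    )


# Hypothetical stand-in data, chunked along y only so that time stays in one chunk
demo_arr = xr.DataArray(
    np.random.rand(3, 4, 5), dims=("time", "y", "x")
).chunk({"y": 2})

result = apply_along_time(demo_arr).compute()
```

Note that the output dimensions come back as `("y", "x", "time")`, because `apply_ufunc` places core dimensions last.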