Addressing a Problem?

Currently, MBCn is not fast enough to be realistically used on our servers for:

- an ensemble of ~96 simulations
- three variables: pr, tasmin, tasmax
- ~40000 spatial points over 30 years, grouped as dayofyear-31.
These may be performance issues that could be addressed with a better algorithm, or slowdowns because dask is overwhelmed, in which case the problem is more about how we organize computations and tasks.
Potential Solution
This is a list of ideas stemming from the xclim hackathon on MBCn:
Better algorithms
Faster interp method: "linear" interpolation can be 5x faster than "nearest" in some test cases; it would be good to confirm we also get improvements in the case of interest.
Use implementations like xclim.core.utils.nan_calc_percentiles? Add numba to the mix.
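A minimal sketch of the kind of comparison we have in mind for the interpolation point above; the array sizes are placeholders, not the real workload:

```python
# Compare "nearest" vs "linear" interpolation of adjustment factors over
# quantiles, as done inside the QDM step. Sizes are illustrative only.
import time

import numpy as np
from scipy.interpolate import interp1d

rng = np.random.default_rng(0)
q = np.linspace(0, 1, 50)             # quantile nodes
af_q = rng.normal(size=50)            # adjustment factors at those quantiles
r = rng.uniform(0, 1, size=100_000)   # ranks of the simulation, rescaled to [0, 1]

for kind in ("nearest", "linear"):
    f = interp1d(q, af_q, kind=kind)
    t0 = time.perf_counter()
    af_r = f(r)                       # adjustment factors evaluated at the ranks
    print(kind, time.perf_counter() - t0)
```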
Better organization, less redundancy in operations: reduce the need to recompute the same information many times.
Simulation, moving window, horizons: Currently, Npdf_transform only accepts ref, hist, sim, which are 30-year datasets with the same dimensions. If we want to compute another horizon, say sim2, we would need to re-run everything. One workaround would be to let Npdf_transform know that sim has a time dimension spanning, say, 30 years, and another dimension, say horizon, holding different future 30-year periods, e.g. horizons = ["1981-2010", "2011-2040", ...]. This kind of dataset can be obtained with construct_moving_yearly_window.
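A sketch of the stacking step, assuming construct_moving_yearly_window keeps its (da, window, step, dim) signature (to be checked against the actual API); the input data is a dummy placeholder:

```python
import numpy as np
import xarray as xr
from xclim import sdba

# Dummy daily simulation, 1981-2100, one spatial point (placeholder data only).
time = xr.cftime_range("1981-01-01", "2100-12-31", freq="D", calendar="noleap")
sim = xr.DataArray(np.random.rand(time.size), dims="time", coords={"time": time})

# Stack 30-year periods along a new "movingwin" dimension so that a single
# Npdf_transform call covers every horizon ("1981-2010", "2011-2040", ...).
sim_win = sdba.construct_moving_yearly_window(sim, window=30, step=30, dim="movingwin")
```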
Separate training and adjustment in Npdf_transform: Another way to approach this is to make Npdf_transform more modular and separate the training from the adjusting. We would need to keep track of the rotations that are used and the adjustment factors in each rotation.
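A very simplified, single-point numpy sketch (not the xclim implementation) of what the training half would store: one rotation per iteration plus the adjustment factors computed in that rotated space, which the adjusting half could then replay on any simulation.

```python
import numpy as np

def train_rotations(ref, hist, n_iter=5, n_quantiles=50, seed=0):
    """ref, hist: (n_var, n_time) arrays. Return the quantile nodes and a list
    of (rotation, af_q) pairs, one per iteration."""
    rng = np.random.default_rng(seed)
    q = np.linspace(0, 1, n_quantiles)
    n_var = ref.shape[0]
    trained = []
    for _ in range(n_iter):
        # Random orthogonal rotation via QR decomposition.
        rot, _ = np.linalg.qr(rng.normal(size=(n_var, n_var)))
        ref_r, hist_r = rot @ ref, rot @ hist
        # Additive adjustment factors on quantiles, one row per rotated variable.
        af_q = np.quantile(ref_r, q, axis=1) - np.quantile(hist_r, q, axis=1)
        trained.append((rot, af_q.T))
        # A full implementation would also update hist_r before the next iteration.
    return q, trained

rng = np.random.default_rng(1)
q, trained = train_rotations(rng.normal(size=(3, 365 * 30)), rng.normal(size=(3, 365 * 30)))
```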
Pre-compute adjustment factors on a given rank to perform interpolation only once: This could go even farther. Currently, interpolation over quantiles and adjustment factors, {q, af_q}, yields a scipy function f(q) which is applied to the ranks of the simulation to perform the QDM adjustment, f(r). But we re-do the interpolation to find f each time we want to adjust a given simulation. In reality, this interpolation could be computed once, then re-used every time. Since we want to correct simulations with the same number of years (and more exactly, with the same number of time points as the reference dataset), the number of ranks on which f must be applied is fixed, so we could pre-determine the values of f(r). UPDATE: Unfortunately, because of how ties in ranks are treated (if say the 3 smallest values are equal, their ranks are 2, 2, 2, and not 1, 2, 3), this idea does not work.
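For reference, the tie behaviour that breaks the idea, shown here with scipy (xarray's .rank() behaves similarly, assigning the average rank to ties):

```python
# With ties, ranks are no longer a fixed permutation of 1..n, so f(r) cannot
# be tabulated once per series length.
from scipy.stats import rankdata

print(rankdata([5.0, 1.0, 2.0, 3.0]))  # [4. 1. 2. 3.] -> ranks are exactly 1..n
print(rankdata([5.0, 1.0, 1.0, 1.0]))  # [4. 2. 2. 2.] -> tied values share rank 2
```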
Mostly done, but not as clean as it could be; I would like some validation first. I would say this achieves what we wanted. There are a fast_npdf_train and a fast_npdf_adjust in npdf_np_modular_interp. The training gives adjustment factors that need to be reused in the adjusting part. Also, the adjusting part can receive a simulation dataset with a movingwin dimension, to treat multiple periods at the same time. This could get its own clean class if we judge it's worth it. A lot of the heavy lifting is done in pure numpy, so it's messy. Ideally, it would be nice to match this performance with a more xclim-y approach, so maybe we don't want to promote the ugly numpy hacks to real xclim implementations.
Rotations
Continue to explore the choice of optimal rotations and the convergence of the Npdf-transform
Use PCA instead?
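Purely illustrative sketch of the two kinds of rotation discussed above, a random orthogonal matrix (as in MBCn) versus a fixed rotation onto the principal components of the reference; it is not tied to a specific xclim function and uses placeholder data:

```python
import numpy as np
from scipy.stats import ortho_group

n_var = 3
ref = np.random.default_rng(1).normal(size=(n_var, 1000))  # placeholder reference

rand_rot = ortho_group.rvs(dim=n_var, random_state=0)  # random orthogonal rotation
_, eigvecs = np.linalg.eigh(np.cov(ref))                # PCA axes of the reference
ref_rand = rand_rot @ ref                               # data in a random rotated space
ref_pca = eigvecs.T @ ref                               # data projected on the PCA axes
```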
Understanding better what's going on
Study dask graphs with and without Npdf_transform (in the case without, just replace with a dummy_Npdf_transform that does nothing)
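One way this comparison could look, assuming the adjusted result is a dask-backed xarray object (placeholder data and chunking below, and rendering the graph requires graphviz):

```python
import dask
import dask.array as da
import xarray as xr

def dummy_npdf_transform(ref, hist, sim, **kwargs):
    """Stand-in that does nothing, so the surrounding pipeline stays identical."""
    return sim

# Placeholder dask-backed arrays standing in for the real ref/hist/sim.
ref = hist = sim = xr.DataArray(
    da.random.random((3, 10958, 100), chunks=(3, -1, 10)),
    dims=("multivar", "time", "site"),
)

out = dummy_npdf_transform(ref, hist, sim)      # or the real Npdf_transform
print(len(out.__dask_graph__()))                # number of tasks with the no-op in place
dask.visualize(out, filename="npdf_graph.svg")  # render the graph for inspection
```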
This is less on the side of xclim, but:
Using month instead of dayofyear could be a "reduce expectations" solution.
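Sketch of that option, assuming the usual xclim.sdba.Grouper interface: grouping by month gives 12 groups instead of ~365 overlapping dayofyear-31 windows.

```python
from xclim import sdba

group_doy = sdba.Grouper("time.dayofyear", window=31)  # current, expensive grouping
group_month = sdba.Grouper("time.month")               # coarser, much cheaper grouping
```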
Additional context
No response
Contribution
I would be willing/able to open a Pull Request to contribute this feature.
Code of Conduct
I agree to follow this project's Code of Conduct