cumulative flow states #1819
Conversation
Finally all tests are green again after the refactor. It was kind of a pain to make all the book-keeping work out (quite a bit changed in that regard).

Findings

These findings are based on running the HWS model as taken from the regression tests.

Water balance

Let's start with the best result: for the HWS model the largest absolute water balance error means that per basin per day, less than a μL is not accounted for. Note that this balance error is independent of the solver tolerances: the water balance is inherently closed due to the problem formulation. The remaining errors are just (accumulated) floating point errors.

Profile

The refactor didn't introduce any glaring performance problems in the code:

States and Jacobian sparsity

The state vector is still a component vector, with these components:
i.e. sorted by node type. This means that we can find convergence bottlenecks in a more fine-grained way than before. Considerations for the states:
The Jacobian sparsity looks like this:

Which admittedly looks quite cool, but there are now many more states (going from 137 to 522) and more non-zeros in the Jacobian. To give an indication of how the Jacobian sparsity comes about: say we have the following model section. The state for the Manning flow depends on the forcing states of the connected basins, but also on the flow states of all nodes connected to these basins. That is because the Manning flow depends on the basin levels, which depend on the basin storages, which depend on the cumulative flows over all flow edges connected to the basins; a small sketch of this dependency chain is given at the end of this comment.

Performance

As expected, the performance takes a hit. The runtime of the HWS model:
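To make the dependency chain above concrete, here is a minimal, hypothetical sketch (plain Python with toy node IDs, not the actual Ribasim code) of which cumulative flow states can show up in the derivative of a given flow state: every edge that shares a basin with it contributes, because its cumulative flow feeds that basin's storage and hence its level.

```python
# Toy network for illustration only: flow edges between node IDs, basins are 1..3.
flow_edges = [(1, 2), (2, 3), (4, 2), (3, 5)]  # (from_node, to_node)
basins = {1, 2, 3}

def jacobian_row(edge):
    """Flow states that can appear in the derivative of `edge`'s cumulative flow.

    The flow over `edge` depends on the levels (hence storages, hence cumulative
    in- and outflows) of the basins it touches, so every edge sharing a basin
    with `edge` is a structural non-zero in this row of the Jacobian.
    """
    touched_basins = set(edge) & basins
    return [other for other in flow_edges if touched_basins & set(other)]

for edge in flow_edges:
    print(edge, "->", jacobian_row(edge))
```

Each printed entry corresponds to one structural non-zero in that row of the Jacobian.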
I had hoped I could reduce the number of states by clumping where
I made a plot of the "Test Delwaq coupling" artifact "delwaq_map.nc" variable "ribasim_Continuity", which should be as close to 1 as possible, comparing this branch (new) to the current main (old). So that is very encouraging. (Note the +1 on the y axis, so 0 is 1.)

```python
import matplotlib.pyplot as plt
import pandas as pd
import xarray as xr

# Continuity from the current main branch (old) and this branch (new)
ds_old = xr.open_dataset(r"main\delwaq_map.nc")
da_old = ds_old["ribasim_Continuity"]
ds_new = xr.open_dataset(r"integrated-flow-states\delwaq_map.nc")
da_new = ds_new["ribasim_Continuity"]

# Compare the continuity time series for a single node
df_old = da_old.sel(ribasim_nNodes=3).to_dataframe()["ribasim_Continuity"].rename("old")
df_new = da_new.sel(ribasim_nNodes=3).to_dataframe()["ribasim_Continuity"].rename("new")
df = pd.concat([df_old, df_new], axis=1)
df.plot()
plt.show()
```
Looks very promising! The robustness + simplicity with regard to the water balances makes me think this is well worth it.
You can reduce bandwidth with algorithms like the Cuthill–McKee algorithm: https://en.wikipedia.org/wiki/Cuthill%E2%80%93McKee_algorithm

I'm not sure this is relevant: maybe the solver machinery already does things like this. Second, given the model sizes, a lot of data may fit wholly within CPU caches, so better bandwidth may have no meaningful effect on memory access patterns.
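For reference, a minimal sketch (not part of this PR) of what such a reordering does, using SciPy's reverse Cuthill–McKee implementation on a small symmetric sparsity pattern:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import reverse_cuthill_mckee

# Toy symmetric sparsity pattern (1 = structural non-zero)
A = csr_matrix(np.array([
    [1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1],
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 1],
    [0, 1, 0, 1, 1],
]))

def bandwidth(m):
    """Maximum |row - col| over the structural non-zeros."""
    coo = m.tocoo()
    return int(np.max(np.abs(coo.row - coo.col)))

# Permutation that tends to cluster non-zeros around the diagonal
perm = reverse_cuthill_mckee(A, symmetric_mode=True)
A_perm = A[perm][:, perm]

print("bandwidth before:", bandwidth(A))   # 3 for this toy matrix
print("bandwidth after: ", bandwidth(A_perm))
```

Whether this matters in practice depends on what ordering the sparse factorization already applies internally, as discussed in the reply below.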
@Huite I have dabbled in that before. As far as I'm aware, the goal of reducing the bandwidth is reducing fill-in in the Jacobian factorization. I looked a little bit into what

EDIT: This is already part of the matrix factorization: https://dl.acm.org/doi/10.1145/992200.992206
I fixed the flaky tests; it turns out there was a dictionary involved in setting up the edge data in the flow table, which messed up the ordering.
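A minimal sketch of the kind of fix this implies (hypothetical names, plain Python rather than the actual Julia code): build the flow table from an explicit sort of the edge records instead of relying on a container's iteration order.

```python
# Hypothetical edge records keyed by (from_node, to_node); illustration only.
edge_data = {
    (5, 2): {"flow_rate": 0.3},
    (1, 2): {"flow_rate": 1.2},
    (2, 3): {"flow_rate": 0.9},
}

# Sorting explicitly makes the flow table ordering deterministic,
# independent of how the dictionary happens to iterate.
rows = [
    {"from_node": f, "to_node": t, **values}
    for (f, t), values in sorted(edge_data.items())
]
print(rows)
```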
The coupler is switching to an approach where they don't need to reset the cumulatives to 0. Since this complicated the Ribasim code, let's get rid of it. The reason for the coupler to move away from resetting cumulatives is that this possibly broke with #1819.

The dedicated BMI arrays like `vertical_flux_bmi` and `user_demand.realized_bmi` are removed. `BMI.get_value_ptr` now just points to the cumulative state, or, if it is not a state (drainage, precipitation), to `basin.cumulative_drainage` or `basin.cumulative_precipitation`. This is breaking because I also renamed these BMI variables to be more consistent with the current code, where we use the term cumulative internally.

```
user_demand.realized -> user_demand.inflow
basin.drainage_integrated -> basin.cumulative_drainage
basin.infiltration_integrated -> basin.cumulative_infiltration
```
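As an illustration of how a coupler can use these cumulative variables without resetting them, here is a sketch assuming a generic Python BMI wrapper (the `model` object and its methods are the standard BMI ones, not the exact coupler code): the realized flux over a coupling interval is simply the difference of the cumulative value before and after the interval.

```python
def realized_flux(model, name, dt):
    """Average flux in m3/s over a coupling interval of length dt seconds.

    `model` is assumed to expose the standard BMI methods `get_value_ptr`,
    `get_current_time` and `update_until`; `name` is a cumulative volume
    variable such as "basin.cumulative_drainage" or "user_demand.inflow".
    """
    cumulative = model.get_value_ptr(name)  # view into the model's memory
    before = cumulative.copy()              # copy, since the view will change
    model.update_until(model.get_current_time() + dt)
    return (cumulative - before) / dt       # no reset of the cumulative needed
```

Taking differences like this is what makes resetting the cumulatives inside Ribasim unnecessary.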
With #1819 the state goes further away from 0 with time. That means we rely more on the relative tolerance than the absolute tolerance; see the [SciML docs](https://docs.sciml.ai/DiffEqDocs/stable/basics/faq/#What-does-tolerance-mean-and-how-much-error-should-I-expect) on the topic, and the small sketch after this comment. Hence I found that reducing the `abstol` had no effect, but reducing the default `reltol` by two orders of magnitude is enough to get only negligible leaky Terminals, with occasional very brief leaks of 1e-8 m3/s. It is also enough to minimize the infiltration error to acceptable levels, as can be seen in #1863 (comment).

~~For HWS this slows down simulation from 9.1s to 12.3s, which isn't too bad for a two orders of magnitude reduction in relative tolerance.~~ Rechecking HWS performance gives a speedup! 12.4s for 1e-8, 8.1s for 1e-7, 10.2s for 1e-6. So 1e-7 seems optimal. Not sure what went wrong last time, but this seems pretty consistent. This is with both the absolute and relative tolerance set to the same number.

Fixes #1851
Fixes #1863
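To illustrate why states that grow over time shift the burden from `abstol` to `reltol`, here is a rough sketch of the per-component tolerance `abstol + reltol * |u|` used (approximately) by adaptive ODE solvers; the numbers below are examples, not the actual Ribasim defaults.

```python
# Per-component error tolerance used by adaptive ODE solvers (roughly):
#   tol_i = abstol + reltol * |u_i|
abstol, reltol = 1e-6, 1e-8  # example values only

for u in [1e-3, 1.0, 1e3, 1e6]:  # a cumulative flow state growing over time
    tol = abstol + reltol * abs(u)
    dominant = "abstol" if abstol > reltol * abs(u) else "reltol"
    print(f"|u| = {u:9.1e}  tolerance = {tol:.2e}  dominated by {dominant}")
```

Once the cumulative states are large, tightening `abstol` barely changes the tolerance, which matches the observation above that only `reltol` had an effect.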
Fixes #1808