-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Numba for speed up #27
Conversation
On second thought, reproducible runs are important, so I'll look at ways to make sure the runs are deterministic from a seed. |
okay, well turns out that was pretty easy, just have to make sure the rng seed is set in a jitted function, and then only and always call np.random from jitted functions. |
import time | ||
import numba | ||
|
||
# utilities used in various places in the model and docs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems very similar in intent to the shared_tools.py
file we have. Any reason they can't be combined? (I have no attachment to either name)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah this would eliminate the shared_tools file. I have this done in another commit, but haven't pushed it. I'll do some of the jitting again, on top of Eric's changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out that numba doesn't support jitting classes in the way that we would like to use them (for example, they don't import inheritance) and so the way you've done it seems like the simplest way forward for us.
Idea is quite nice, that is a pretty significant speed up indeed. |
Copy of what @amoodie did in DeltaRCM#27
This makes them available outside the object, which is important in some cases, and the class just gets them from here. feat: make shared_tools into a module, not a class. This is in preparation for jitting, since we want them to be standalone functions. feat: break out step that updates water parcel positions These functions now become helper utility functions in shared_tools, and are wrapped by update_Q, which handles them all, and passes them what they need. refactor: break out sed functions into smaller functions This modularizes the code and makes it easier to compile. feat: jit the functions apply jit to functions, and fix some type problems that came along with refactoring. refactor: move next step calculation to jitted functions The next slowest operation was getting the indexes of the next cells, so I moved that to a jitted function too. refactor: jit the calculation of weight array for the water fix: move random number generation into numba Copy of what @amoodie did in DeltaRCM#27 fix: some broken tests to test the shared tools portion. fix: add numba to requirements fix: move all inlet picking into the random_pick function There's essentially no difference between the random_pick function and the random_pick_inlet function other than there is no preference given for the inlet cells, so you can pick whichever one, so I just adjust the mechanism to use the existing random pick machinery. fix: water and sediment routing bugs that caused inconsistent behavior There were two issues with the refactoring. One was that water routing developed a flat flow field because I wasn't adding each passing water parcel to the Q field. The other was related to sediment parcel routing, where I had transcribed the i and j of stepping incorrectly in the jitted functions.
Since we merged #36, I'm going to go ahead and close this. Thanks for the great idea and all the good investigation and development that this PR motivated. |
This makes them available outside the object, which is important in some cases, and the class just gets them from here. feat: make shared_tools into a module, not a class. This is in preparation for jitting, since we want them to be standalone functions. feat: break out step that updates water parcel positions These functions now become helper utility functions in shared_tools, and are wrapped by update_Q, which handles them all, and passes them what they need. refactor: break out sed functions into smaller functions This modularizes the code and makes it easier to compile. feat: jit the functions apply jit to functions, and fix some type problems that came along with refactoring. refactor: move next step calculation to jitted functions The next slowest operation was getting the indexes of the next cells, so I moved that to a jitted function too. refactor: jit the calculation of weight array for the water fix: move random number generation into numba Copy of what @amoodie did in DeltaRCM#27 fix: some broken tests to test the shared tools portion. fix: add numba to requirements fix: move all inlet picking into the random_pick function There's essentially no difference between the random_pick function and the random_pick_inlet function other than there is no preference given for the inlet cells, so you can pick whichever one, so I just adjust the mechanism to use the existing random pick machinery. fix: water and sediment routing bugs that caused inconsistent behavior There were two issues with the refactoring. One was that water routing developed a flat flow field because I wasn't adding each passing water parcel to the Q field. The other was related to sediment parcel routing, where I had transcribed the i and j of stepping incorrectly in the jitted functions.
Highlights
I was messing about and tried to see what kind of speed-up we could get with just using Numba.
I profiled the following loop on
develop
(i.e., thetests/time_water.py
script):for an elapsed time = 1.2779 min.
With some really simple changes to the water routing routines (only changed one of three
list(map(lambda(...
calls) and change of random picking in a few places, I get an elapsed time 0.5052 min.This is really promising for #1, I think.
Question
How much do we care about exact consistency, down to the random number sequences?
These changes use the Numba random number generator (rather than the numpy rng), and so they produce different results. For example, the
tests/test_output_consistency.py
test fails with the Numba implementation. The expected bed elevation at a small slice after oneupdate()
:what is returned with Numba:
what is expected:
This ultimately would be an issue for any C implementations too. My feeling is that it's fine to break exact consistency between versions, as long as we note it in a changelog. I think breaking behavior changes of the model are much more important, and require a higher-level need to allow.
Anyway, I don't want to merge this now, just looking for some disucssion. @elbeejay @ericbarefoot @ammilten
Also, sorry the commit history will shorten when #26 is merged.