Recursive interpolation #102

Open · wants to merge 23 commits into main

Conversation

@BrendanKKrueger (Collaborator) commented Sep 30, 2024

PR Summary

Recursive implementation of interpToReal to avoid the big, nasty formulas.

PR Checklist

  • Code is formatted. (You can use the format_spiner make target.)
  • Adds a test for any bugs fixed. Adds tests for new features.
  • If preparing for a new release, update the version in cmake.

@BrendanKKrueger changed the title from "Bkk interp to real" to "interpToReal" on Sep 30, 2024
@BrendanKKrueger marked this pull request as ready for review on October 1, 2024 17:24
@BrendanKKrueger marked this pull request as draft on October 1, 2024 17:26
@BrendanKKrueger changed the title from "interpToReal" to "Recursive interpolation" on Oct 1, 2024
@BrendanKKrueger marked this pull request as ready for review on October 1, 2024 21:05
@BrendanKKrueger (Collaborator, Author):

@Yurlungur, I give about a 50% chance that this solution is controversial. But it's a way to clean up all the nasty equations that span 30-ish lines. It also extends the ability to use an index in place of a coordinate so that it can be applied to any dimension instead of only the hard-coded ones (assuming one uses the new interface rather than the existing interface, which I left in place so that we can discuss all the details that go into this change).

@BrendanKKrueger (Collaborator, Author):

> It also extends the ability to use an index in place of a coordinate so that it can be applied to any dimension instead of only the hard-coded ones

It may be that this fulfills your desire for DataBox to be used as a data array in addition to being used as an interpolator. I realize I'm proposing more sweeping changes, but this is just thinking about possible future evolution. If we rename interpToReal to operator() then the user can do something like

auto value = db(1,7,3);

and that becomes simply looking up the value with those indices. If the user instead does

auto value = db(1.0, 7.0, 3.0);

then that's an interpolation. And they can mix and match as well, such as

auto value = db(1.0, 7, 3.0);

Of course, one could argue that this makes the difference between interpolating and indexing unclear and hard to follow, but maybe not.
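
For illustration only (this is not code from the PR, and all names are hypothetical), the dispatch could be done with a C++17 if constexpr on the argument types, something like:

    #include <type_traits>

    // Hypothetical sketch of the proposed unified operator(): all-integer
    // arguments mean a plain index lookup, while any floating-point argument
    // triggers (possibly mixed) interpolation. Not Spiner's actual API.
    struct DataBoxSketch {
      template <typename... Args>
      double operator()(Args... args) const {
        if constexpr ((std::is_integral_v<std::decay_t<Args>> && ...)) {
          return lookup(args...);      // e.g. db(1, 7, 3)
        } else {
          return interpolate(args...); // e.g. db(1.0, 7.0, 3.0) or db(1.0, 7, 3.0)
        }
      }
      // Placeholders standing in for the real lookup and interpolation paths.
      template <typename... Args>
      double lookup(Args...) const { return 0.0; }
      template <typename... Args>
      double interpolate(Args...) const { return 0.0; }
    };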

@Yurlungur (Collaborator) left a comment:

Thanks for building this @BrendanKKrueger! I think this is a nice solution to the hand-coded mess. That said, I do have some performance concerns, listed below. I don't know if I have the time to do performance testing, and I'm reluctant to ask you to do it... but I think that's the point we should consider. It's very important that DataBox is as performant as we can make it. Code cleanliness comes second to that goal.

spiner/databox.hpp (review thread resolved; outdated)
Comment on lines +476 to +477
// Note: grids are in reverse order relative to arguments
append_index_and_weights(iwlist + 1, grid - 1, other_args...);
Collaborator:

inverting the order here is a little frightening... we might want some kind of (debug-enabled) sanity check that the grid array is the same size as the arg list when we call this for the first time.

Collaborator Author:

Unfortunately, something has to be inverted:

  • The order of the arguments is chosen to reflect the order of the interpolation recursion
  • The order of grids_ is reversed relative to the order of the arguments
  • The append_index_and_weights method could recurse in the opposite order from the interpolation, which would mean that the loop over grids wouldn't be backwards. But then either we're filling iwlist back to front or we're reading iwlist back to front during the interpolation recursion.

I was assuming that assert(canInterpToReal_(N)); would do the kind of sanity checking that you're looking for. If not, what sort of check(s) would you consider sufficient here?
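
For what it's worth, a debug-only check along these lines (hypothetical names; not code from the PR) could simply compare the argument count against the rank at the outermost call:

    #include <cassert>

    // Hypothetical sketch of a debug-only sanity check: on the outermost call,
    // verify that the number of interpolation arguments matches the number of
    // grids (the DataBox rank) before starting the recursion.
    template <typename... Args>
    void check_arg_count(int rank, Args&&...) {
      assert(static_cast<int>(sizeof...(Args)) == rank &&
             "argument count must match the DataBox rank");
      (void)rank; // avoid an unused-parameter warning when NDEBUG is defined
    }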

Collaborator Author:

Possibly related: when filling iwlist, I skip axes where the argument is already an index, because we don't need to store another copy of the same index and the weights are implicitly 1 and 0, so the iwlist entry is redundant for those axes. However, because I declare iwlist with length N, the memory is already allocated. After thinking about it, skipping those entries is just one extra thing to remember, and it may matter if we change how iwlist is filled (e.g., back to front). So I'm changing that behavior to always fill and use iwlist consistently, regardless of whether the argument is an index or a coordinate.
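
As a concrete illustration of the "weights are implicitly 1 and 0" point (hypothetical names; the PR's actual data structure may differ):

    // Hypothetical sketch: when an axis is given directly as an integer index,
    // its iwlist-style entry can still be filled uniformly, with trivial
    // weights, so the interpolation recursion treats every axis the same way.
    struct IndexAndWeights {
      int index;
      double w0; // weight on index     (1 for a direct index)
      double w1; // weight on index + 1 (0 for a direct index)
    };

    inline IndexAndWeights entry_for_index(int i) { return {i, 1.0, 0.0}; }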

spiner/databox.hpp (review thread resolved; outdated)
spiner/regular_grid_1d.hpp (review thread resolved; outdated)
Comment on lines 529 to 534
const T v0 = interp_core(
    [&c = callable, i = current.index](auto... ii) { return c(i, ii...); },
    iwlist + 1, other_args...);
const T v1 = interp_core(
    [&c = callable, i = current.index + 1](auto... ii) { return c(i, ii...); },
    iwlist + 1, other_args...);
Collaborator:

I am a bit concerned that this recursion, where you hide more and more index data inside closures, might have performance implications relative to the gross hand-implemented stuff. I think it would be good to make sure the recursive treatment doesn't hurt performance.

We might also be able to avoid the nested closures though. Perhaps the actual interpolation rule could be expressed as a fold expression or something similar?

The reason for my concern here is that when I was first writing this code, I found that the order of operations in those complicated interpolation routines mattered a lot for performance. And IMO spiner is a low-level thing, so it's important that it's performant even if it's ugly.
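
As a rough illustration of the fold-expression idea mentioned above (not code from the PR): the weight of any one corner of the multilinear stencil is just a product of per-axis weights, which a C++17 fold expression can write without nested closures.

    // Illustrative only: a C++17 fold expression giving the weight of one
    // corner of the interpolation stencil as the product of per-axis weights.
    template <typename... Ws>
    constexpr double corner_weight(Ws... per_axis_weights) {
      return (per_axis_weights * ...);
    }

    static_assert(corner_weight(0.5, 0.25, 1.0) == 0.125, "exact in binary");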

Collaborator Author:

I replaced the nested closures by rotating the arguments and replacing coordinates with indices, so that after a single full rotation you have all the indices you need as arguments and then essentially just call dataView_(args...). I think it's a lot cleaner without the nested closures (and they were really bothering me, too), but take a look and see what you think.
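
This is not the PR's actual code, but roughly the shape of the rotation trick, heavily simplified (unit-spaced grid, all arguments are coordinates, and the iwlist bookkeeping is omitted):

    #include <type_traits>

    // Terminal case: no coordinates left to consume, so every argument is now
    // an index and the call collapses to a plain data lookup.
    template <typename DataView, typename... Indices>
    double interp_rotate(std::integral_constant<int, 0>, const DataView& data,
                         Indices... idx) {
      return data(idx...);
    }

    // Recursive case: convert the leading coordinate into a bracketing index
    // plus weight, rotate that index to the back of the argument list, and
    // recurse on the remaining coordinates.
    template <int NLeft, typename DataView, typename... Args>
    double interp_rotate(std::integral_constant<int, NLeft>, const DataView& data,
                         double x, Args... rest) {
      const int i = static_cast<int>(x); // stand-in for the real grid lookup
      const double w = x - i;            // linear weight on a unit-spaced grid
      constexpr auto next = std::integral_constant<int, NLeft - 1>{};
      return (1.0 - w) * interp_rotate(next, data, rest..., i) +
             w * interp_rotate(next, data, rest..., i + 1);
    }

    // Usage sketch, for a callable `view(int, int) -> double`:
    //   double v = interp_rotate(std::integral_constant<int, 2>{}, view, 0.5, 1.25);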

Comment on lines 549 to 552
return interp_core(
    [&c = callable, i = index](auto... ii) { return c(i, ii...); },
    iwlist, // we didn't use iwlist so don't advance the pointer
    other_args...);
Collaborator:

Ah clever. Hard to reason about, but clever.

Collaborator Author:

To me, "Hard to reason about, but clever," is very likely to be bad code. What part(s) do you find hard to reason about? I've cleaned up some things, but I'm still looking for ways to improve this code.

Collaborator:

As I understand it, here's where you add in the ability to mix in integer indexes, right? It was hard to chase down what bit of code was doing that.

Collaborator Author:

Okay, so a bit more clarifying documentation (in this case, probably code comments) seems called for. I'll add that. If you find anything unclear after I change that, we can continue this discussion to see how to improve readability and maintainability.

Collaborator:

Sounds good. Also, I do think we should do some performance testing before you dive too deep into this. I'm still concerned about the nested closures (i.e., nested lambdas).

@BrendanKKrueger (Collaborator, Author) commented Oct 8, 2024:

After playing with it some more, I'm even more convinced that test/benchmark.cpp isn't really reporting what we think it's reporting. I modified the test to do extra work (instead of sampling just one point in each GPU kernel, it samples 3^3 points), and most of the times went down. So I think the per-kernel work is essentially negligible, and we're mostly timing random variations in something like transfer time to the GPU, kernel launch time, synchronization time of the reduction, or something along those lines.

Collaborator Author:

Okay, I cranked it up to about 5,000 evaluations per GPU kernel. I'll put the actual data behind a fold (if GitHub will let me), but the punchline is: at this point the timings are more consistent, so we may be able to say something meaningful (but I'm still not sure, so take this with a healthy dose of skepticism). In this case, the recursive method is almost always faster than the hand-coded method. Occasionally the hand-coded method ends up faster, but only by a very small factor. The hand-coded method is usually slower by anywhere from a few percent to almost a factor of two.

runtime data
bkkrueger@cn136~> /vast/home/bkkrueger/spiner/worktrees/bkk_interpToReal/build/volta-x86/test/benchmark2.bin 1000 10000 100000 1000000 
# ncoarse = 1000
# interpToReal 3D (hand-coded method)
#  nfine             time/point (ns)   L2 error       
   10000             1.308670e-07      0.000000e+00   
   100000            1.975500e-11      0.000000e+00   
   1000000           1.798200e-14      0.000000e+00   
# interpolate 3D (recursive method)
#  nfine             time/point (ns)   L2 error       
   10000             7.284700e-08      0.000000e+00   
   100000            1.908000e-11      0.000000e+00   
   1000000           1.735100e-14      0.000000e+00   
bkkrueger@cn136~> /vast/home/bkkrueger/spiner/worktrees/bkk_interpToReal/build/volta-x86/test/benchmark2.bin 1000 10000 100000 1000000 
# ncoarse = 1000
# interpToReal 3D (hand-coded method)
#  nfine             time/point (ns)   L2 error       
   10000             1.316370e-07      0.000000e+00   
   100000            1.954000e-11      0.000000e+00   
   1000000           2.921400e-14      0.000000e+00   
# interpolate 3D (recursive method)
#  nfine             time/point (ns)   L2 error       
   10000             1.002250e-07      0.000000e+00   
   100000            1.866700e-11      0.000000e+00   
   1000000           1.754400e-14      0.000000e+00   
bkkrueger@cn136~> /vast/home/bkkrueger/spiner/worktrees/bkk_interpToReal/build/volta-x86/test/benchmark2.bin 1000 10000 100000 1000000 
# ncoarse = 1000
# interpToReal 3D (hand-coded method)
#  nfine             time/point (ns)   L2 error       
   10000             1.229280e-07      0.000000e+00   
   100000            1.953800e-11      0.000000e+00   
   1000000           2.972000e-14      0.000000e+00   
# interpolate 3D (recursive method)
#  nfine             time/point (ns)   L2 error       
   10000             1.002670e-07      0.000000e+00   
   100000            2.033900e-11      0.000000e+00   
   1000000           1.838300e-14      0.000000e+00   
bkkrueger@cn136~> /vast/home/bkkrueger/spiner/worktrees/bkk_interpToReal/build/volta-x86/test/benchmark2.bin 1000 10000 100000 1000000 
# ncoarse = 1000
# interpToReal 3D (hand-coded method)
#  nfine             time/point (ns)   L2 error       
   10000             1.263160e-07      0.000000e+00   
   100000            1.975700e-11      0.000000e+00   
   1000000           1.807400e-14      0.000000e+00   
# interpolate 3D (recursive method)
#  nfine             time/point (ns)   L2 error       
   10000             1.003250e-07      0.000000e+00   
   100000            1.905700e-11      0.000000e+00   
   1000000           1.900200e-14      0.000000e+00   
bkkrueger@cn136~> /vast/home/bkkrueger/spiner/worktrees/bkk_interpToReal/build/volta-x86/test/benchmark2.bin 1000 10000 100000 1000000 
# ncoarse = 1000
# interpToReal 3D (hand-coded method)
#  nfine             time/point (ns)   L2 error       
   10000             1.279110e-07      0.000000e+00   
   100000            1.978200e-11      0.000000e+00   
   1000000           1.831300e-14      0.000000e+00   
# interpolate 3D (recursive method)
#  nfine             time/point (ns)   L2 error       
   10000             1.046090e-07      0.000000e+00   
   100000            1.957700e-11      0.000000e+00   
   1000000           1.833200e-14      0.000000e+00   
bkkrueger@cn136~> /vast/home/bkkrueger/spiner/worktrees/bkk_interpToReal/build/volta-x86/test/benchmark2.bin 1000 10000 100000 1000000 
# ncoarse = 1000
# interpToReal 3D (hand-coded method)
#  nfine             time/point (ns)   L2 error       
   10000             1.363300e-07      0.000000e+00   
   100000            1.942000e-11      0.000000e+00   
   1000000           3.694600e-14      0.000000e+00   
# interpolate 3D (recursive method)
#  nfine             time/point (ns)   L2 error       
   10000             9.183900e-08      0.000000e+00   
   100000            1.946100e-11      0.000000e+00   
   1000000           1.773400e-14      0.000000e+00   
bkkrueger@cn136~> /vast/home/bkkrueger/spiner/worktrees/bkk_interpToReal/build/volta-x86/test/benchmark2.bin 1000 10000 100000 1000000 
# ncoarse = 1000
# interpToReal 3D (hand-coded method)
#  nfine             time/point (ns)   L2 error       
   10000             1.227620e-07      0.000000e+00   
   100000            1.960800e-11      0.000000e+00   
   1000000           1.834800e-14      0.000000e+00   
# interpolate 3D (recursive method)
#  nfine             time/point (ns)   L2 error       
   10000             9.942100e-08      0.000000e+00   
   100000            1.910700e-11      0.000000e+00   
   1000000           1.716100e-14      0.000000e+00   

Collaborator:

I think I'm satisfied with this. Thanks for chasing this down. When we wrote the paper, @dholladay00 and I also used some instrumentation... @dholladay00 do you know where that code went?

Collaborator:

I don't recall this; what do you mean by instrumentation code?

Collaborator:

I thought we had a benchmarking tool... as well as nvprof. I know we had to scale it way up from the test that's in the repo. But maybe it's just that test scaled way up...

Comment on lines 219 to 220
// (2) I think it would be worth testing if DataBox works for other
// (non-real) types, such as std::complex.
Collaborator:

I agree and I think there's no reason it shouldn't work.

Collaborator Author:

My suspicion is that it might work, but it's very likely that there are improvements we could make. For example, the weights are declared to be type T, but in practice they're always real. Similarly, I think DataBox may compute the x values as type T when they're also real. That may push the design towards something like adding a template parameter for the independent variable's type and having it default to the dependent variable's type. Existing code would hopefully continue to work as-is, but people wanting to leverage newer features (like improved support for std::complex) would be able to say "the dependent variable is std::complex<double> but the independent variables are double."
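
One possible shape for that idea (purely a sketch; this is not the actual DataBox declaration, and the names are illustrative):

    #include <complex>

    // Hypothetical sketch: a second template parameter for the independent
    // variable type, defaulting to the dependent type so existing single-type
    // usage keeps working unchanged.
    template <typename T, typename IndepT = T>
    struct DataBoxSketch {
      // Coordinates and weights would use IndepT; stored values use T.
      T interpolate(IndepT /*x*/) const { return T{}; } // placeholder body
    };

    // Existing-style usage:                DataBoxSketch<double> a;
    // Complex dependent, real independent: DataBoxSketch<std::complex<double>, double> b;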

spiner/databox.hpp (review thread resolved; outdated)