Pairwise summation floats #4303

robertbindar · 2023-08-30T13:37:56Z

This adds a pairwise summation function for floating point types.

TYPE: FEATURE
DESC: Pairwise summation floats

shortcut-integration · 2023-08-30T13:37:59Z

This pull request has been linked to Shortcut Story #33201: Implement pairwise summation for floats.

eric-hughes-tiledb

It would be better if this function were extensible to allow integral types. This is an interface issue, not a request for such code in the current PR. One goal would be to remove any explicit type dependency (such as log2f vs. log2l) from the unit tests and into a template of some sort.

Overflow would need to be dealt with. You'd also need a new equality operation (not operator==) for testing; it would admit the difference between exactly equal and approximately equal.

eric-hughes-tiledb · 2023-08-30T15:14:14Z

tiledb/common/pairwise_sum.h

+namespace tiledb::common {
+
+template <class T, std::size_t Base = 128>
+requires std::floating_point<T> T pairwise_sum(const std::span<T>& x) {


Needs documentation, at minimum.

If I'm reading this correctly, this function is written in such a way as to allow the optimizer to generate vector instructions. If that's the purpose, on top of the two standard uses (1. main code and 2. unit tests) it has two additional uses: 3. Development performance testing and 4. Pre-deployment performance measurement.

Item 3: Development performance testing. This is the testing to validate that the particular algorithm actually generates optimal performance. This must always be measured, and always measured with respect to alternatives. The point about alternatives is that the interface between alternatives must be identical. This is hard to do with free functions, and is easier to do with classes. It's possible for a class to declare static operator() so that it looks just like a function.

Item 4: The size of cache lines varies between processors, and it's a parameter that a user can look up. It would be better if the Base argument were CacheLineSize or something. The maximum length of the loop can be computed as a constexpr. Doing this level of optimization will eventually need top-level configuration support, so it would be a good idea to declare a constexpr at the top of the file with the default used.

The name should first say what it does. In this case, because there need to be multiple versions for optimization, it should secondarily distinguish itself by how it works. I would call it IteratedSum_Recursive for this version in the PR at the time of this comment.

Documentation, agree, I'll do that.
I wasn't aware of Item 3 and Item 4. Let's use this comment as basis for a meeting to draw the real requirements of this work.
Being cache line aware for the base case is definitely interesting and worth exploring. The numpy team for instance chose to go the manual loop unrolling direction as the first optimization and saw somewhere they claimed a 20% benefit 🤷 Very expensive to go down any direction for such simple function without a clear direction in mind and a benchmark on the table to test variations.

eric-hughes-tiledb · 2023-08-30T15:25:16Z

tiledb/common/pairwise_sum.h

+  }
+  size_t mid = std::floor(x.size() / 2);
+  return pairwise_sum<T, Base>(x.first(mid)) +
+         pairwise_sum<T, Base>(x.subspan(mid));


Recursion is going to slow down the operation, as is running a bunch of loops of variable length.

I'd do the following as a first pass implementation. N is the line length.

Validate that N is power of two. This is to allow the use of std::align in the next step.

Split the range into three parts, based on alignment. The first part is of length in the range [0,N). The second part is of length kN for some integer k > 0 and is aligned on memory boundary aligned with N. The third part is also of length in the range [0,N).

Loop over the parts in memory order, in three stages.

Use a fixed-length loop for the second stage. Fixed-length vector operations are a bit faster that variable-length ones.

All sound good, I wouldn't go down this rabbit hole though without a benchmark in place to test the benefits of these optimizations.

robertbindar · 2023-09-01T11:54:41Z

It would be better if this function were extensible to allow integral types. This is an interface issue, not a request for such code in the current PR. One goal would be to remove any explicit type dependency (such as log2f vs. log2l) from the unit tests and into a template of some sort.

@eric-hughes-tiledb I don't really understand why I would extend this to integral types, I didn't have much of a requirements list for this function, but I definitely saw that it's a floating point summation function, please let me know if I'm missing something.

Regarding type dependency, log2f and log2l I think can go away with c++23 when log2 gets added.

Overflow would need to be dealt with. You'd also need a new equality operation (not operator==) for testing; it would admit the difference between exactly equal and approximately equal.

Do we have plans to turn this into a floating point operations library of some sorts? This function was pitched to me like "1-2 hours of work", thus I added a simple pairwise summation implementation + a test.

robertbindar · 2023-09-05T10:01:39Z

@eric-hughes-tiledb updated the code based on what we last discussed.
I have some comments/questions here:

I adjusted generated vectors from 2^24 as we discussed to 2^22 as the test was quite slow.
It's the first time I use catch2 benchmark, so I just put the small snippets in the unit test. Where are these benchmarks supposed to be placed? Do we want them to be run in the CI given they are quite slow?
static operator() overloads need to wait till c++23

pairwise summation floats

dfda50b

robertbindar requested review from ihnorton and eric-hughes-tiledb August 30, 2023 13:37

fix test

20a4958

eric-hughes-tiledb reviewed Aug 30, 2023

View reviewed changes

address eric's review

c244559

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pairwise summation floats #4303

Pairwise summation floats #4303

robertbindar commented Aug 30, 2023 •

edited

Loading

shortcut-integration bot commented Aug 30, 2023

eric-hughes-tiledb left a comment

eric-hughes-tiledb Aug 30, 2023

robertbindar Sep 1, 2023

eric-hughes-tiledb Aug 30, 2023

robertbindar Sep 1, 2023

robertbindar commented Sep 1, 2023

robertbindar commented Sep 5, 2023

Pairwise summation floats #4303

Are you sure you want to change the base?

Pairwise summation floats #4303

Conversation

robertbindar commented Aug 30, 2023 • edited Loading

shortcut-integration bot commented Aug 30, 2023

eric-hughes-tiledb left a comment

Choose a reason for hiding this comment

eric-hughes-tiledb Aug 30, 2023

Choose a reason for hiding this comment

robertbindar Sep 1, 2023

Choose a reason for hiding this comment

eric-hughes-tiledb Aug 30, 2023

Choose a reason for hiding this comment

robertbindar Sep 1, 2023

Choose a reason for hiding this comment

robertbindar commented Sep 1, 2023

robertbindar commented Sep 5, 2023

robertbindar commented Aug 30, 2023 •

edited

Loading