SplineC2C/R2R rotation with BLAS #4710

camelto2 · 2023-08-24T16:16:19Z

Please review the developer documentation
on the wiki of this project that contains help and requirements.

Proposed changes

change naive matrix multiplication in SplineR2R::applyRotation to a blas call. Speeds the applyRotation call up by quite a bit.

note that these changes should be covered by the unit test, just changing how the matrix multiplication is done

Update: also added the corresponding change needed for SplineC2C.

What type(s) of changes does this code introduce?

Delete the items that do not apply

New feature
Other (please describe):

Does this introduce a breaking change?

No

What systems has this change been tested on?

macOS

Checklist

Update the following with a yes where the items apply. If you're unsure about any of them, don't hesitate to ask. This is
simply a reminder of what we are going to look for before merging your code.

Yes. This PR is up to date with current the current state of 'develop'
Yes. Code added or changed in the PR has been clang-formatted
No. This PR adds tests to cover any new code, or to catch a bug that is being fixed
No. Documentation has been added (if appropriate)

actually pass the test

…h_blas

quantumsteve · 2023-08-24T16:31:26Z

Let's be careful changes in derived classes aren't lost as we move to templated classes in #4684

markdewing · 2023-08-24T16:52:01Z

There may be some subtleties related to precision. If a single precision spline is used, this change reduces the rotation matrix to single precision before the multiplication.

ye-luo · 2023-08-24T17:04:02Z

src/QMCWaveFunctions/BsplineFactory/SplineR2R.cpp

-    }
-  }
+       rot_mat_padded[i * Nsplines + j] = rot_mat.data()[i * OrbitalSetSize + j];
+  BLAS::gemm('N', 'N', Nsplines, BasisSetSize, Nsplines, ST(1.0), rot_mat_padded.data(), Nsplines, (*coef_copy_).data(), Nsplines, ST(0.0), spl_coefs, Nsplines);


I feel we don't need rot_mat_padded.

BLAS::gemm('N', 'N', OrbitalSetSize, BasisSetSize, OrbitalSetSize, ST(1.0), rot_mat.data(), OrbitalSetSize, (*coef_copy_).data(), Nsplines, ST(0.0), spl_coefs, Nsplines);

There's still a precision issue, unless BLAS::gemm can handle matrices with different precision types.

I was having trouble getting the BLAS call to work without padding. I am trying to figure it out now

I agree there is still a precision issue. I wasn't sure the best way to handle that either. the spline coeffs are ST type and the rotation is RealType. The BLAS::gemm doesn't like mixed precision types, so I had to do some conversion somewhere. If I changed coef_copy_ back to RealType, and the rotation is RealType, the problem then lies in spl_coefs. It is ST*, and I can't just cast it to RealType* in the case of ST=float.
Making another temporary copy would be costly memory wise

Are you intending to work with single or double precision splines? I ran into some problems with rotation and single precision splines early on, and since then have only used double precision splines.
The code might need to be split differently based spline precision - call gemm for double precision, and an explicit loop for single precision. That would at least speed up the double precision case.

The precision issue also relates to the history list vs. global rotation method. With the history list, the rotations get accumulated in the existing coefficient storage. Single precision coefficients fail at this accumulation of small changes. With the global rotation method this might not be an issue, since this accumulation of rotations happens in the rotation matrix. It might work to compute the rotation matrix in double precision, then convert to single precision, and then do the matrix multiplication. But that would need to be tested, since the matrix multiplication step might also be sensitive to precision.

I would prefer double only, but some of the unit tests use float so I was trying to get it to work in general

I would be okay with different code paths for float vs double (use the loop for float and gemm for double). Using "if constexpr" to test the type should make it straightforward.

@markdewing, added a split depending on float vs. double. this should avoid the loss in precision.

@ye-luo still trying to figure out how to avoid adding the padded rotation matrix. I thought it should be as simple as fiddling with the LDx parameters, but having trouble getting it to work

A code path split by precision should also avoid the need for the extra padded rotation matrix.

@ye-luo I think I have the blas call correct now to avoid needing to construct the padded rotation matrix. It is working correctly on some test matrices, and unit tests pass

I feel we don't need rot_mat_padded.

BLAS::gemm('N', 'N', OrbitalSetSize, BasisSetSize, OrbitalSetSize, ST(1.0), rot_mat.data(), OrbitalSetSize, (*coef_copy_).data(), Nsplines, ST(0.0), spl_coefs, Nsplines);

I should have read this in detail sooner. I didn't realize you had already told me how to it. Took me a bit, but I stumbled onto doing correctly

camelto2 · 2023-08-24T17:14:58Z

Let's be careful changes in derived classes aren't lost as we move to templated classes in #4684

Thanks for pointing this out. I had two PRs recently (#4706, #4701) that touched RotatedSPOs that doesn't look like it has been captured in #4684. I can make a PR into @williamfgc's branch with the corresponding changes to bring RotatedSPOs up to date. And if this gets merged once I sort out the review comments can do the same for this if that works

williamfgc · 2023-08-24T18:23:10Z

@camelto2 @quantumsteve thanks for bringing this up. Your changes can be accommodated into the refactoring effort after merged into develop, we just won't touch it for now. We rebase develop constantly.

…h_blas

jptowns · 2023-08-24T20:56:10Z

src/QMCWaveFunctions/BsplineFactory/SplineR2R.cpp

+               (*coef_copy_).data(), Nsplines, ST(0.0), spl_coefs, Nsplines);
+  }
+  else
+  {


This is tough: apparently everything I did had hidden type mis-matches because ST is not guaranteed to be equivalent to ValueType. Seems to me the underlying problem is the type system of the splines is kinda divorced from the rest of the code. If it's true that single precision may be problematic for rotations, then why not ditch single precision splines and keep the production code simple? Or is there some way to enforce double precision splines if rotation is added? Just trying to think of ways to avoid adding complexity to production code if not absolutely necessary.

My guess is single precision splines are still of interest for size reasons.

camelto2 · 2023-08-24T22:54:41Z

Figured out how to do C2C correctly using the BLAS call. added that here as well

src/QMCWaveFunctions/BsplineFactory/SplineR2R.cpp

…h_blas

markdewing · 2023-08-28T18:26:06Z

Test this please

src/QMCWaveFunctions/BsplineFactory/SplineC2C.cpp

ye-luo · 2023-08-28T19:12:47Z

src/QMCWaveFunctions/BsplineFactory/SplineC2C.cpp

+  else
+  {
+    // if ST is float, RealType is double and ValueType is std::complex<double> for C2C
+    // Just use naive matrix multiplication in order to avoid losing precision on rotation matrix


I kind of doubt precision loss matters.

@markdewing was mentioning that he had run into problems with precision when using the float splines and wanted the split between double and float. Maybe he can chime in with more details

There were cases where splines with float precision had trouble with orbital rotation optimization, and those problems went away when I switched to double precision splines. I haven't investigated further.
The important point for these code paths is that the type of the spline (ST) and the main code type (RealType) have to match for BLAS calls to work without additional copies. For full precision builds, that means double precision splines. For mixed precision builds, I would guess the single precision splines would follow the BLAS code path.

How it stands currently is that coefs_copy_ and spl_coefs are ST, whereas rot_mat is passed in as ValueType. There isn't a mixed precision blas call, so in order to use blas we would have to make a copy of rot_mat of ST type. That would allow BLAS to be used always, but you are losing precision when ST is float since it is passed in as ValueType (double or std::complex). I'm not entirely sure how much that precision matters, since the output spline coeffs are ST anyway. We are doing an extra copy, but it is only Norb^2 so shouldn't be a problem.

How it is currently written avoids the extra copy of rot_mat, but only benefits from BLAS when ST == RealType.

How common are runs where ST != RealType?

The NiO performance tests have the splines set with precision="single", so I think it's still an important case in general. I'm not sure how important it will be for cases involving rotation. Though my guess is someone will want to try it due to memory pressure on the size of the spline coefficients.

Let us defer the investigation of precision. It is important but we need the feature working first.

src/QMCWaveFunctions/BsplineFactory/SplineR2R.cpp

src/QMCWaveFunctions/BsplineFactory/SplineC2C.h

ye-luo · 2023-08-28T21:10:32Z

Test this please

prckent · 2023-08-28T21:16:00Z

Agreeing with Ye here - let us focus on getting the feature working.

camelto2 added 3 commits August 22, 2023 16:57

initial attempt at BLAS implementation, commented out since it doesn't

332ad01

actually pass the test

got blas call working. row major vs. column major was the problem

047b901

Merge remote-tracking branch 'upstream/develop' into spl_rotation_wit…

3474373

…h_blas

camelto2 requested review from jptowns and markdewing August 24, 2023 16:16

ye-luo reviewed Aug 24, 2023

View reviewed changes

camelto2 added 3 commits August 24, 2023 13:26

make split in implementation depending on float vs. double

53f111e

Merge remote-tracking branch 'upstream/develop' into spl_rotation_wit…

29dadab

…h_blas

removed padded rot_mat

c86e331

jptowns reviewed Aug 24, 2023

View reviewed changes

markdewing previously approved these changes Aug 24, 2023

View reviewed changes

camelto2 added 2 commits August 24, 2023 16:50

add blas for SplineC2C applyRotation

85b18c5

Merge branch 'spl_c2c_rotation_with_blas' into spl_rotation_with_blas

79e3de9

camelto2 dismissed markdewing’s stale review via 79e3de9 August 24, 2023 22:52

camelto2 changed the title ~~SplineR2R rotation with blas~~ SplineX2X rotation with BLAS Aug 24, 2023

camelto2 added 2 commits August 24, 2023 16:58

use valuetype in blas call

3259dde

clang format

2fb7ce3

quantumsteve reviewed Aug 25, 2023

View reviewed changes

src/QMCWaveFunctions/BsplineFactory/SplineR2R.cpp Outdated Show resolved Hide resolved

quantumsteve reviewed Aug 25, 2023

View reviewed changes

src/QMCWaveFunctions/BsplineFactory/SplineR2R.cpp Outdated Show resolved Hide resolved

camelto2 added 2 commits August 25, 2023 08:37

address comments

b1b093a

Merge remote-tracking branch 'upstream/develop' into spl_rotation_wit…

88e8077

…h_blas

ye-luo changed the title ~~SplineX2X rotation with BLAS~~ SplineC2C/R2R rotation with BLAS Aug 25, 2023

Merge branch 'develop' into spl_rotation_with_blas

493fe22

markdewing previously approved these changes Aug 28, 2023

View reviewed changes

Merge branch 'develop' into spl_rotation_with_blas

c86cbf8

jptowns previously approved these changes Aug 28, 2023

View reviewed changes

ye-luo reviewed Aug 28, 2023

View reviewed changes

address comments

91adaa9

camelto2 dismissed stale reviews from jptowns and markdewing via 91adaa9 August 28, 2023 20:24

ye-luo approved these changes Aug 28, 2023

View reviewed changes

ye-luo enabled auto-merge August 28, 2023 21:10

ye-luo merged commit 6c7b0d3 into QMCPACK:develop Aug 28, 2023

camelto2 deleted the spl_rotation_with_blas branch August 28, 2023 22:09

camelto2 mentioned this pull request Dec 8, 2023

Spline orbital optimization tracker #4865

Open

10 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SplineC2C/R2R rotation with BLAS #4710

SplineC2C/R2R rotation with BLAS #4710

camelto2 commented Aug 24, 2023 •

edited

Loading

quantumsteve commented Aug 24, 2023 •

edited

Loading

markdewing commented Aug 24, 2023

ye-luo Aug 24, 2023

markdewing Aug 24, 2023

camelto2 Aug 24, 2023

markdewing Aug 24, 2023

camelto2 Aug 24, 2023

markdewing Aug 24, 2023

camelto2 Aug 24, 2023

markdewing Aug 24, 2023

camelto2 Aug 24, 2023 •

edited

Loading

camelto2 Aug 24, 2023

camelto2 commented Aug 24, 2023 •

edited

Loading

williamfgc commented Aug 24, 2023

jptowns Aug 24, 2023 •

edited

Loading

markdewing Aug 24, 2023

camelto2 commented Aug 24, 2023

markdewing commented Aug 28, 2023

ye-luo Aug 28, 2023

camelto2 Aug 28, 2023 •

edited

Loading

markdewing Aug 28, 2023

camelto2 Aug 28, 2023

markdewing Aug 28, 2023

ye-luo Aug 28, 2023

ye-luo commented Aug 28, 2023

prckent commented Aug 28, 2023

SplineC2C/R2R rotation with BLAS #4710

SplineC2C/R2R rotation with BLAS #4710

Conversation

camelto2 commented Aug 24, 2023 • edited Loading

Proposed changes

What type(s) of changes does this code introduce?

Does this introduce a breaking change?

What systems has this change been tested on?

Checklist

quantumsteve commented Aug 24, 2023 • edited Loading

markdewing commented Aug 24, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

camelto2 Aug 24, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

camelto2 commented Aug 24, 2023 • edited Loading

williamfgc commented Aug 24, 2023

jptowns Aug 24, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

camelto2 commented Aug 24, 2023

markdewing commented Aug 28, 2023

Choose a reason for hiding this comment

camelto2 Aug 28, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ye-luo commented Aug 28, 2023

prckent commented Aug 28, 2023

camelto2 commented Aug 24, 2023 •

edited

Loading

quantumsteve commented Aug 24, 2023 •

edited

Loading

camelto2 Aug 24, 2023 •

edited

Loading

camelto2 commented Aug 24, 2023 •

edited

Loading

jptowns Aug 24, 2023 •

edited

Loading

camelto2 Aug 28, 2023 •

edited

Loading