SplineC2C/R2R rotation with BLAS #4710

Merged
merged 15 commits into from
Aug 28, 2023
Changes from 14 commits
55 changes: 35 additions & 20 deletions src/QMCWaveFunctions/BsplineFactory/SplineC2C.cpp
@@ -17,6 +17,7 @@
#include "spline2/MultiBsplineEval.hpp"
#include "QMCWaveFunctions/BsplineFactory/contraction_helper.hpp"
#include "CPU/math.hpp"
#include "CPU/BLAS.hpp"

namespace qmcplusplus
{
@@ -57,7 +58,7 @@ void SplineC2C<ST>::storeParamsBeforeRotation()
{
const auto spline_ptr = SplineInst->getSplinePtr();
const auto coefs_tot_size = spline_ptr->coefs_size;
coef_copy_ = std::make_shared<std::vector<RealType>>(coefs_tot_size);
coef_copy_ = std::make_shared<std::vector<ST>>(coefs_tot_size);

std::copy_n(spline_ptr->coefs, coefs_tot_size, coef_copy_->begin());
}
@@ -120,27 +121,41 @@ void SplineC2C<ST>::applyRotation(const ValueMatrix& rot_mat, bool use_stored_co
std::copy_n(spl_coefs, coefs_tot_size, coef_copy_->begin());
}

for (int i = 0; i < basis_set_size; i++)
for (int j = 0; j < OrbitalSetSize; j++)
{
// cur_elem points to the real component of the coefficient.
// Imag component is adjacent in memory.
const auto cur_elem = Nsplines * i + 2 * j;
ST newval_r{0.};
ST newval_i{0.};
for (auto k = 0; k < OrbitalSetSize; k++)
if constexpr (std::is_same_v<ST, RealType>)
{
//if ST is double, go ahead and use blas to make things faster
//Note that Nsplines needs to be divided by 2 since spl_coefs and coef_copy_ are stored as reals.
//Also casting them as ValueType so they are complex to do the correct gemm
BLAS::gemm('N', 'N', OrbitalSetSize, basis_set_size, OrbitalSetSize, ValueType(1.0, 0.0), rot_mat.data(),
OrbitalSetSize, (ValueType*)(*coef_copy_).data(), Nsplines / 2, ValueType(0.0, 0.0),
(ValueType*)spl_coefs, Nsplines / 2);
}
else
{
// if ST is float, RealType is double and ValueType is std::complex<double> for C2C
// Just use naive matrix multiplication in order to avoid losing precision on rotation matrix
Contributor

I kind of doubt precision loss matters.

Contributor Author

@camelto2 camelto2 Aug 28, 2023

@markdewing mentioned that he had run into precision problems when using the float splines and wanted the split between double and float. Maybe he can chime in with more details.

Contributor

There were cases where splines with float precision had trouble with orbital rotation optimization, and those problems went away when I switched to double precision splines. I haven't investigated further.
The important point for these code paths is that the type of the spline (ST) and the main code type (RealType) have to match for BLAS calls to work without additional copies. For full precision builds, that means double precision splines. For mixed precision builds, I would guess the single precision splines would follow the BLAS code path.
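
The type-matching requirement described above can be sketched as a compile-time branch. This is only an illustration under stated assumptions: `RealType` is fixed to `double`, and `rotate_row` and `RotationPath` are hypothetical names that do not exist in the code base. A plain loop stands in for the BLAS call in the matching-type branch.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Assumption: a full precision build, so the main code type is double.
using RealType = double;

// Hypothetical tag reporting which branch was compiled in.
enum class RotationPath { SameTypeKernel, MixedTypeLoop };

// Rotates one row of coefficients: out[j] = sum_k in[k] * rot[k * n + j].
template<typename ST>
RotationPath rotate_row(const std::vector<RealType>& rot, const std::vector<ST>& in, std::vector<ST>& out)
{
  const std::size_t n = in.size();
  assert(rot.size() == n * n && out.size() == n);
  if constexpr (std::is_same_v<ST, RealType>)
  {
    // ST matches RealType: the buffers could be passed to a BLAS routine
    // without copies. A plain loop stands in for that call here.
    for (std::size_t j = 0; j < n; j++)
    {
      ST sum{0};
      for (std::size_t k = 0; k < n; k++)
        sum += in[k] * rot[k * n + j];
      out[j] = sum;
    }
    return RotationPath::SameTypeKernel;
  }
  else
  {
    // ST differs from RealType: accumulate in RealType and round once on store.
    for (std::size_t j = 0; j < n; j++)
    {
      RealType sum{0};
      for (std::size_t k = 0; k < n; k++)
        sum += static_cast<RealType>(in[k]) * rot[k * n + j];
      out[j] = static_cast<ST>(sum);
    }
    return RotationPath::MixedTypeLoop;
  }
}
```

Calling `rotate_row` with `std::vector<double>` coefficients selects the first branch, and calling it with `std::vector<float>` selects the second. Only the selected branch is instantiated for a given `ST`.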

Contributor Author

As it stands, coef_copy_ and spl_coefs are of type ST, whereas rot_mat is passed in as ValueType. There isn't a mixed-precision BLAS call, so in order to use BLAS we would have to make a copy of rot_mat in type ST. That would allow BLAS to be used always, but it loses precision when ST is float, since rot_mat arrives as ValueType (double or std::complex<double>). I'm not entirely sure how much that precision matters, since the output spline coefficients are ST anyway. It is an extra copy, but only of size Norb^2, so it shouldn't be a problem.

The current code avoids the extra copy of rot_mat, but only benefits from BLAS when ST == RealType.

How common are runs where ST != RealType?
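
The alternative discussed above, converting rot_mat to ST so that a same-type BLAS call is always possible, might look roughly like the sketch below. It assumes a real-valued rotation matrix stored in a flat `std::vector<double>`. `convert_rotation` and `max_conversion_error` are hypothetical helpers written for this illustration, not functions from the code base.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Copies a full precision rotation matrix into the spline storage type ST.
// The extra buffer holds norb * norb elements.
template<typename ST>
std::vector<ST> convert_rotation(const std::vector<double>& rot_mat)
{
  std::vector<ST> converted(rot_mat.size());
  std::transform(rot_mat.begin(), rot_mat.end(), converted.begin(),
                 [](double x) { return static_cast<ST>(x); });
  return converted;
}

// Largest absolute rounding error introduced by the conversion.
template<typename ST>
double max_conversion_error(const std::vector<double>& rot_mat, const std::vector<ST>& converted)
{
  assert(rot_mat.size() == converted.size());
  double err = 0.0;
  for (std::size_t i = 0; i < rot_mat.size(); i++)
    err = std::max(err, std::abs(rot_mat[i] - static_cast<double>(converted[i])));
  return err;
}
```

For `ST = double` the conversion is exact. For `ST = float` each entry is rounded to single precision, which is the precision loss the comment refers to. Whether that loss matters for rotation optimization is the open question in this thread.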

Contributor

The NiO performance tests have the splines set with precision="single", so I think it's still an important case in general. I'm not sure how important it will be for cases involving rotation. Though my guess is someone will want to try it due to memory pressure on the size of the spline coefficients.

Contributor

Let us defer the investigation of precision. It is important but we need the feature working first.

for (IndexType i = 0; i < basis_set_size; i++)
for (IndexType j = 0; j < OrbitalSetSize; j++)
{
const auto index = Nsplines * i + 2 * k;
ST zr = (*coef_copy_)[index];
ST zi = (*coef_copy_)[index + 1];
ST wr = rot_mat[k][j].real();
ST wi = rot_mat[k][j].imag();
newval_r += zr * wr - zi * wi;
newval_i += zr * wi + zi * wr;
// cur_elem points to the real component of the coefficient.
// Imag component is adjacent in memory.
const auto cur_elem = Nsplines * i + 2 * j;
ST newval_r{0.};
ST newval_i{0.};
for (IndexType k = 0; k < OrbitalSetSize; k++)
{
const auto index = Nsplines * i + 2 * k;
ST zr = (*coef_copy_)[index];
ST zi = (*coef_copy_)[index + 1];
ST wr = rot_mat[k][j].real();
ST wi = rot_mat[k][j].imag();
newval_r += zr * wr - zi * wi;
newval_i += zr * wi + zi * wr;
}
spl_coefs[cur_elem] = newval_r;
spl_coefs[cur_elem + 1] = newval_i;
}
spl_coefs[cur_elem] = newval_r;
spl_coefs[cur_elem + 1] = newval_i;
}
}
}
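
The index arithmetic in the complex case (`Nsplines * i + 2 * j`, with a complex leading dimension of `Nsplines / 2`) relies on real and imaginary parts being interleaved in memory. The sketch below illustrates that layout. It is an illustration only: `real_index` and `as_interleaved_reals` are hypothetical helpers, and the sketch views complex storage as reals, which is the direction the C++ standard's array-oriented access rule for `std::complex` explicitly covers. The diff casts in the opposite direction and depends on the same layout.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Index, in the real-valued view, of the real part of complex element (i, j).
// nsplines_real is the row length counted in reals (two per complex value).
inline std::size_t real_index(std::size_t i, std::size_t j, std::size_t nsplines_real)
{
  return nsplines_real * i + 2 * j;
}

// Views an array of std::complex<double> as interleaved (real, imag) doubles.
inline const double* as_interleaved_reals(const std::vector<std::complex<double>>& z)
{
  return reinterpret_cast<const double*>(z.data());
}
```

With `ld_complex` complex values per row, the real-valued row length is `2 * ld_complex`, so a GEMM on the complex view uses a leading dimension of half the real row length.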

template<typename ST>
2 changes: 1 addition & 1 deletion src/QMCWaveFunctions/BsplineFactory/SplineC2C.h
@@ -64,7 +64,7 @@ class SplineC2C : public BsplineSet
std::shared_ptr<MultiBspline<ST>> SplineInst;

///Copy of original splines for orbital rotation
std::shared_ptr<std::vector<RealType>> coef_copy_;
std::shared_ptr<std::vector<ST>> coef_copy_;

vContainer_type mKK;
VectorSoaContainer<ST, 3> myKcart;
33 changes: 21 additions & 12 deletions src/QMCWaveFunctions/BsplineFactory/SplineR2R.cpp
@@ -17,6 +17,7 @@
#include "SplineR2R.h"
#include "spline2/MultiBsplineEval.hpp"
#include "QMCWaveFunctions/BsplineFactory/contraction_helper.hpp"
#include "Platforms/CPU/BLAS.hpp"

namespace qmcplusplus
{
@@ -56,7 +57,7 @@ void SplineR2R<ST>::storeParamsBeforeRotation()
{
const auto spline_ptr = SplineInst->getSplinePtr();
const auto coefs_tot_size = spline_ptr->coefs_size;
coef_copy_ = std::make_shared<std::vector<RealType>>(coefs_tot_size);
coef_copy_ = std::make_shared<std::vector<ST>>(coefs_tot_size);

std::copy_n(spline_ptr->coefs, coefs_tot_size, coef_copy_->begin());
}
@@ -120,20 +121,28 @@ void SplineR2R<ST>::applyRotation(const ValueMatrix& rot_mat, bool use_stored_co
std::copy_n(spl_coefs, coefs_tot_size, coef_copy_->begin());
}

// Apply rotation the dumb way b/c I can't get BLAS::gemm to work...
for (auto i = 0; i < BasisSetSize; i++)

if constexpr (std::is_same_v<ST, ValueType>)
{
for (auto j = 0; j < OrbitalSetSize; j++)
{
const auto cur_elem = Nsplines * i + j;
auto newval{0.};
for (auto k = 0; k < OrbitalSetSize; k++)
//Here, ST should be equal to ValueType, which will be double for R2R. Using BLAS to make things faster
BLAS::gemm('N', 'N', OrbitalSetSize, BasisSetSize, OrbitalSetSize, ST(1.0), rot_mat.data(), OrbitalSetSize,
(*coef_copy_).data(), Nsplines, ST(0.0), spl_coefs, Nsplines);
}
else
{
Contributor

@jptowns jptowns Aug 24, 2023

This is tough: apparently everything I did had hidden type mismatches because ST is not guaranteed to be equivalent to ValueType. Seems to me the underlying problem is that the type system of the splines is kinda divorced from the rest of the code. If it's true that single precision may be problematic for rotations, then why not ditch single precision splines and keep the production code simple? Or is there some way to enforce double precision splines if rotation is added? Just trying to think of ways to avoid adding complexity to production code if not absolutely necessary.

Contributor

My guess is single precision splines are still of interest for size reasons.

//Here, ST is float but ValueType is double for R2R. Due to issues with type conversions, just doing naive matrix multiplication in this case to not lose precision on rot_mat
for (IndexType i = 0; i < BasisSetSize; i++)
for (IndexType j = 0; j < OrbitalSetSize; j++)
{
const auto index = i * Nsplines + k;
newval += (*coef_copy_)[index] * rot_mat[k][j];
const auto cur_elem = Nsplines * i + j;
FullPrecValueType newval{0.};
for (IndexType k = 0; k < OrbitalSetSize; k++)
{
const auto index = i * Nsplines + k;
newval += (*coef_copy_)[index] * rot_mat[k][j];
}
spl_coefs[cur_elem] = newval;
}
spl_coefs[cur_elem] = newval;
}
}
}
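
One way to check that the GEMM argument order in the real case reproduces the naive loop is to compare the two on a small input. The sketch below does that under stated assumptions: `gemm_colmajor` is a hypothetical stand-in for a column-major 'N','N' GEMM with alpha = 1 and beta = 0 (it is not the `BLAS::gemm` wrapper from the code base), the rotation matrix is stored row-major as a flat array, and all values are double.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a column-major GEMM with alpha = 1, beta = 0:
// C(m x n) = A(m x k) * B(k x n), with leading dimensions lda, ldb, ldc.
inline void gemm_colmajor(int m, int n, int k, const double* a, int lda,
                          const double* b, int ldb, double* c, int ldc)
{
  for (int col = 0; col < n; col++)
    for (int row = 0; row < m; row++)
    {
      double sum = 0.0;
      for (int p = 0; p < k; p++)
        sum += a[row + p * lda] * b[p + col * ldb];
      c[row + col * ldc] = sum;
    }
}

// Naive rotation with the same loop structure as the fallback branch:
// out[i * nsplines + j] = sum_k coefs[i * nsplines + k] * rot[k * norb + j].
inline std::vector<double> rotate_naive(int basis_size, int norb, int nsplines,
                                        const std::vector<double>& rot,
                                        const std::vector<double>& coefs)
{
  assert(norb <= nsplines);
  std::vector<double> out(coefs); // padding entries keep their input values
  for (int i = 0; i < basis_size; i++)
    for (int j = 0; j < norb; j++)
    {
      double sum = 0.0;
      for (int k = 0; k < norb; k++)
        sum += coefs[i * nsplines + k] * rot[k * norb + j];
      out[i * nsplines + j] = sum;
    }
  return out;
}
```

Reading the row-major rotation matrix as column-major gives its transpose, and reading the coefficient array as column-major with leading dimension `nsplines` gives a matrix whose columns are the basis-function rows. Calling `gemm_colmajor(norb, basis_size, norb, rot, norb, coefs, nsplines, out, nsplines)` therefore computes the same sums as `rotate_naive`, and entries beyond `norb` in each row are left untouched.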

2 changes: 1 addition & 1 deletion src/QMCWaveFunctions/BsplineFactory/SplineR2R.h
@@ -59,7 +59,7 @@ class SplineR2R : public BsplineSet
std::shared_ptr<MultiBspline<ST>> SplineInst;

///Copy of original splines for orbital rotation
std::shared_ptr<std::vector<RealType>> coef_copy_;
std::shared_ptr<std::vector<ST>> coef_copy_;

///thread private ratios for reduction when using nested threading, numVP x numThread
Matrix<TT> ratios_private;