Skip to content

ECP Milestone 1.7: Kokkos contribution

Attila Cangi edited this page Sep 20, 2017 · 47 revisions

Introduction

In this report we develop and port two QMCPACK kernels of the miniqmc miniapp (spline and Jastrow evaluations) to the Kokkos C++ programming model, and at least one CPU and one GPU based architecture. We make an initial assessment of:

  1. Capability (expressiveness, ability to achieve required levels of parallelization)
  2. Performance (achieved time to solution on multiple architectures)
  3. Portability (extent of modification required for performance on multiple architectures)
  4. Suitability (deviation from desired application abstractions required to port or to obtain performance, complexity, difficulty of learning curve)
  5. Support (compiler, tools, documentation)

Desired improvements and extensions to improve the objective (1-3) and subjective (4,5) measures will be identified and communicated with the Kokkos team.

Implementation

We implement the Kokkos C++ programming model in the spline and Jastrow kernels in order to achieve portability on multiple architectures.

Spline kernel

Each Monte Carlo step requires the evaluation of single particle orbitals (SPOs). The miniqmc miniapp uses a 3D tricubic B-spline based orbital representation for the evaluation of the SPOs. The 3D tricubic B-spline basis is a highly efficient representation for SPOs requiring only 64 elements at any given point in space. The Kokkos implementation of the spline kernel requires modifications to the data structures and expression of parallelism in the mini-app.

Data structures

The spline kernel evaluates the wavefunction, its gradient (a 3D vector containing the first derivative with respect to particle positions) and its hessian (a 3D tensor containing the second derivative with respect to particle positions). We convert several data structures related to these into Views which is the Kokkos abstraction for multidimensional arrays with template parameters controlling in which memory space (host or device) the data resides and which data layout is imposed.


In src/einspline/multi_bspline_create.cpp (renamed src/einspline/multi_bspline_create.c):

void set_multi_UBspline_3d_s_d(multi_UBspline_3d_s *spline, int num, double *data)
{
  ...
  double *spline_tmp = malloc(sizeof(double) * Nx * Ny * Nz); 
  ...
}
void set_multi_UBspline_3d_s_d(multi_UBspline_3d_s *spline, int num, double *data)
{
  ...
  double *spline_tmp = (double*) malloc(sizeof(double) * Nx * Ny * Nz);
  ...
}
***

In src/einspline/multi_bspline_structs.h:

  • Add Kokkos header file

    ...
    #include <Kokkos_Core.hpp>
    ...
  • Create Kokkos views

    typedef struct
    {
      
      spline_code spcode;
      type_code tcode;
      double *restrict coefs;
      
      intptr_t x_stride, y_stride, z_stride;
      Ugrid x_grid, y_grid, z_grid;
      BCtype_d xBC, yBC, zBC;
      int num_splines;
      size_t coefs_size;
    } multi_UBspline_3d_d;
    typedef struct
    {
      typedef Kokkos::View<double****, Kokkos::LayoutRight> coefs_view_t;
      spline_code spcode;
      type_code tcode;
      double *restrict coefs;
      coefs_view_t coefs_view;
      intptr_t x_stride, y_stride, z_stride;
      Ugrid x_grid, y_grid, z_grid;
      BCtype_d xBC, yBC, zBC;
      int num_splines;
      size_t coefs_size;
    } multi_UBspline_3d_d;

In src/spline2/bspline_allocator.cpp:

  • Remove functions.

    extern "C" {    
    
    void set_multi_UBspline_3d_d(multi_UBspline_3d_d *spline, int spline_num, double *data);
    void set_multi_UBspline_3d_s(multi_UBspline_3d_s *spline, int spline_num, float *data);
    
    multi_UBspline_3d_s *einspline_create_multi_UBspline_3d_s(Ugrid x_grid, Ugrid y_grid, Ugrid z_grid, 
                                                              BCtype_s xBC, BCtype_s yBC, BCtype_s zBC, 
                                                              int num_splines);
    
    UBspline_3d_s *einspline_create_UBspline_3d_s(Ugrid x_grid, Ugrid y_grid, Ugrid z_grid, 
                                                  BCtype_s xBC, BCtype_s yBC, BCtype_s zBC, 
                                                  float *data);
    
    multi_UBspline_3d_d *einspline_create_multi_UBspline_3d_d(Ugrid x_grid, Ugrid y_grid, Ugrid z_grid, 
                                                              BCtype_d xBC, BCtype_d yBC, BCtype_d zBC, 
                                                              int num_splines);
    
    UBspline_3d_d *einspline_create_UBspline_3d_d(Ugrid x_grid, Ugrid y_grid, Ugrid z_grid, 
                                                  BCtype_d xBC, BCtype_d yBC, BCtype_d zBC, 
                                                  double *data);
    }
    extern "C" {
    
    void set_multi_UBspline_3d_d(multi_UBspline_3d_d *spline, int spline_num, double *data);
    void set_multi_UBspline_3d_s(multi_UBspline_3d_s *spline, int spline_num, float *data);
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    }
  • Changes in deallocation.

    void Allocator::set(double *indata, multi_UBspline_3d_s *spline, int i)
    {
      ...
      einspline_free(singleS->coefs);
      ...
    }
    void Allocator::set(double *indata, multi_UBspline_3d_s *spline, int i)
    {
      ...
      einspline_free((void*)singleS->coefs);
      ...
    }

In src/spline2/bspline_allocator.hpp:

  • Delete Kokkos view by setting it to an empty view and relying on reference counting of views.

    class Allocator
    {
      ...
      template <typename SplineType> void destroy(SplineType *spline)
      {
        
        einspline_free(spline->coefs);
        free(spline);
      }
      ...
    }
    class Allocator
    {
      ...
      template <typename SplineType> void destroy(SplineType *spline)
      {
        // Delete View by setting it to an empty view and rely on reference counting of views.
        spline->coefs_view = typename SplineType::coefs_view_t();
        free(spline);
      }
      ...
    }

In src/spline2/einspline_allocator.cpp (Renamed src/spline2/einspline_allocator.c):

  • Add Kokkos header file

    #include <Kokkos_Core.hpp>
  • Remove inline specifier for all einspline_alloc and einspline_free.

  • Modify multi_Bspline creation:

    multi_UBspline_3d_s *
    einspline_create_multi_UBspline_3d_s(Ugrid x_grid, Ugrid y_grid, Ugrid z_grid,
                                         BCtype_s xBC, BCtype_s yBC, BCtype_s zBC,
                                          int num_splines)
    {
       // Create new spline
       multi_UBspline_3d_s *restrict spline = malloc(sizeof(multi_UBspline_3d_s));
       ...
    }
    ...
    
    multi_UBspline_3d_d *
    einspline_create_multi_UBspline_3d_d(Ugrid x_grid, Ugrid y_grid, Ugrid z_grid,
                                         BCtype_d xBC, BCtype_d yBC, BCtype_d zBC,
                                         int num_splines)
    {
      // Create new spline
      multi_UBspline_3d_d *restrict spline = malloc(sizeof(multi_UBspline_3d_d));
      ...
    }
    ...  
    multi_UBspline_3d_s *
    einspline_create_multi_UBspline_3d_s(Ugrid x_grid, Ugrid y_grid, Ugrid z_grid,
                                         BCtype_s xBC, BCtype_s yBC, BCtype_s zBC,
                                          int num_splines)
    {
      // Create new spline
      multi_UBspline_3d_s *restrict spline = (multi_UBspline_3d_s *)malloc(sizeof(multi_UBspline_3d_s));
      ...
    }
    ...
    
    multi_UBspline_3d_d *
    einspline_create_multi_UBspline_3d_d(Ugrid x_grid, Ugrid y_grid, Ugrid z_grid,
                                         BCtype_d xBC, BCtype_d yBC, BCtype_d zBC,
                                         int num_splines)
    {
       // Create new spline
       multi_UBspline_3d_d *restrict spline = new multi_UBspline_3d_d;
    ...
    }
    ...
  • Create Kokkos view

    spline->coefs_size = (size_t)Nx * spline->x_stride;
    
    
    spline->coefs = (double *)einspline_alloc(sizeof(double) * spline->coefs_size, QMC_CLINE);    
    
    
    
    
    
    
    
    
    
    
    
    // Create KokkosView
    spline->coefs_view = multi_UBspline_3d_d::coefs_view_t("Multi_UBspline_3d_d",Nx,Ny,Nz,N);
    // Check data layout is as expected
    int strides[4];
    spline->coefs_view.stride(strides);
    if(spline->x_stride!=strides[0] ||
      spline->y_stride!=strides[1] ||
      spline->z_stride!=strides[2] ||
      1!=strides[3])
      fprintf(stderr, "Kokkos View has non-compatible strides %i %i | %i %i | %i %i\n",
          spline->x_stride,strides[0],
          spline->y_stride,strides[1],
          spline->z_stride,strides[2]
          );
    spline->coefs = spline->coefs_view.data();
  • Modify Bspline creation

    UBspline_3d_d *einspline_create_UBspline_3d_d(Ugrid x_grid, Ugrid y_grid,
                                                  Ugrid z_grid, BCtype_d xBC,
                                                  BCtype_d yBC, BCtype_d zBC,
                                                  double *data)
    {
      // Create new spline
      UBspline_3d_d *restrict spline = malloc(sizeof(UBspline_3d_d));
      ...
    }
    ...
    
    UBspline_3d_s *einspline_create_UBspline_3d_s(Ugrid x_grid, Ugrid y_grid,
                                                  Ugrid z_grid, BCtype_s xBC,
                                                  BCtype_s yBC, BCtype_s zBC,
                                                  float *data)
    {
      // Create new spline
      UBspline_3d_s *spline = malloc(sizeof(UBspline_3d_s));
      ...
    }
    ...  
    UBspline_3d_d *einspline_create_UBspline_3d_d(Ugrid x_grid, Ugrid y_grid,
                                                  Ugrid z_grid, BCtype_d xBC,
                                                  BCtype_d yBC, BCtype_d zBC,
                                                  double *data)
    {
      // Create new spline
      UBspline_3d_d *restrict spline = (UBspline_3d_d *)malloc(sizeof(UBspline_3d_d));
      ...
    }
    ...
    
    UBspline_3d_s *einspline_create_UBspline_3d_s(Ugrid x_grid, Ugrid y_grid,
                                                  Ugrid z_grid, BCtype_s xBC,
                                                  BCtype_s yBC, BCtype_s zBC,
                                                  float *data)
    {
      // Create new spline
      UBspline_3d_s *spline = (UBspline_3d_s *)malloc(sizeof(UBspline_3d_s));
      ...
    }
    ...

In src/spline2/einspline_allocator.h:

  • Add multi_Bspline and Bspline functions

    extern "C" {
      #endif
      
      void *einspline_alloc(size_t size, size_t alignment);
      
      void einspline_free(void *ptr); 
      
      
    
    
    
    
      
      
      
      
      
      
      
      
      
      
      #ifdef __cplusplus
    }  
    extern "C" {
      
       
      void *einspline_alloc(size_t size, size_t alignment);
      
      void einspline_free(void *ptr);
    
      multi_UBspline_3d_s *einspline_create_multi_UBspline_3d_s(Ugrid x_grid, Ugrid y_grid, Ugrid z_grid, 
                                                                BCtype_s xBC, BCtype_s yBC, BCtype_s zBC, 
                                                                int num_splines);
      
      multi_UBspline_3d_d *einspline_create_multi_UBspline_3d_d(Ugrid x_grid, Ugrid y_grid, Ugrid z_grid, 
                                                                BCtype_d xBC, BCtype_d yBC, BCtype_d zBC, 
                                                                int num_splines);
      
      UBspline_3d_s *einspline_create_UBspline_3d_s(Ugrid x_grid, Ugrid y_grid, Ugrid z_grid, 
                                                    BCtype_s xBC, BCtype_s yBC, BCtype_s zBC, 
                                                    float *data);
      
      UBspline_3d_d *einspline_create_UBspline_3d_d(Ugrid x_grid, Ugrid y_grid, Ugrid z_grid, 
                                                    BCtype_d xBC, BCtype_d yBC, BCtype_d zBC, 
                                                    double *data);
      
    }

In src/spline2/einspline_spo.hpp:

  • Modify data structures

    struct einspline_spo
    {
      struct EvaluateVGHTag {};
      struct EvaluateVTag {};
      typedef Kokkos::TeamPolicy<EvaluateVGHTag> policy_vgh_t;
      typedef Kokkos::TeamPolicy<EvaluateVTag> policy_v_t;
      
      typedef typename Kokkos::TeamPolicy<>::member_type team_t;
      
      // using spline_type=MultiBspline<T>;
      // define the einspline data object type
      using spline_type = typename bspline_traits<T, 3>::SplineType;
      using pos_type        = TinyVector<T, 3>;
      using vContainer_type = aligned_vector<T>;
      using gContainer_type = VectorSoaContainer<T, 3>;
      using hContainer_type = VectorSoaContainer<T, 6>;
      ...
      // mark instance as copy
      bool is_copy;
       
      aligned_vector<spline_type *> einsplines;
      aligned_vector<vContainer_type *> psi;
      aligned_vector<gContainer_type *> grad;
      aligned_vector<hContainer_type *> hess;
      ...
      
      
      
      
      
      
       
      // using spline_type=MultiBspline<T>;
      using pos_type        = TinyVector<T, 3>;
      using vContainer_type = aligned_vector<T>;
      ...
      
      lattice_type Lattice;
      
      
      
      
      
      
      
      aligned_vector<spline_type *> einsplines;
      ...
      // default constructor
      einspline_spo()
          : nBlocks(0), nSplines(0), firstBlock(0), lastBlock(0), Owner(false)
      {
      }
      // disable copy constructor
      einspline_spo(const einspline_spo &in) = delete;
      // disable copy operator
      einspline_spo &operator=(const einspline_spo &in) = delete;
      ...
      einspline_spo(einspline_spo &in, int ncrews, int crewID)
          : Owner(false), Lattice(in.Lattice)
      {
        nSplines         = in.nSplines;
        nSplinesPerBlock = in.nSplinesPerBlock;
        nBlocks          = (in.nBlocks + ncrews - 1) / ncrews;
        firstBlock       = nBlocks * crewID;
        lastBlock        = std::min(in.nBlocks, nBlocks * (crewID + 1));
        nBlocks          = lastBlock - firstBlock;
        einsplines.resize(nBlocks);
        for (int i = 0, t = firstBlock; i < nBlocks; ++i, ++t)
          einsplines[i] = in.einsplines[t];
        resize();
      }
      ...
      einspline_spo(einspline_spo_ref<T,compute_engine_type> &in, int ncrews, int crewID)
           : Owner(false), Lattice(in.Lattice), is_copy(false)
      {
        nSplines         = in.nSplines;
        nSplinesPerBlock = in.nSplinesPerBlock;
        nBlocks          = (in.nBlocks + ncrews - 1) / ncrews;
        firstBlock       = nBlocks * crewID;
        lastBlock        = std::min(in.nBlocks, nBlocks * (crewID + 1));
        nBlocks          = lastBlock - firstBlock;
        einsplines.resize(nBlocks);
          for (int i = 0, t = firstBlock; i < nBlocks; ++i, ++t)
        einsplines[i] = in.einsplines[t];
        resize();
      }
      ...
      // destructors
      ~einspline_spo()
      {
        
        if (psi.size()) clean();
        if (Owner)
          for (int i = 0; i < nBlocks; ++i)
            delete einsplines[i];
      }
      void clean()
      {
        const int n = psi.size();
        for (int i = 0; i < n; ++i)
        {
          delete psi[i];
          delete grad[i];
          delete hess[i];
        }
      }
      // resize the containers
      void resize()
      {
        if (nBlocks > psi.size())
        {
          clean();
          psi.resize(nBlocks);
          grad.resize(nBlocks);
          hess.resize(nBlocks);
    #pragma omp parallel for
          for (int i = 0; i < nBlocks; ++i)
          {
            psi[i]  = new vContainer_type(nSplinesPerBlock);
            grad[i] = new gContainer_type(nSplinesPerBlock);
            hess[i] = new hContainer_type(nSplinesPerBlock);
          }
        }
      }
      // fix for general num_splines
      void set(int nx, int ny, int nz, int num_splines, int nblocks,
               bool init_random = true)
      {
        nSplines         = num_splines;
        nBlocks          = nblocks;
        nSplinesPerBlock = num_splines / nblocks;
        firstBlock       = 0;
        lastBlock        = nBlocks;
        if (einsplines.empty())
        {
          Owner = true;
          TinyVector<int, 3> ng(nx, ny, nz);
          pos_type start(0);
          pos_type end(1);
          einsplines.resize(nBlocks);
          RandomGenerator<T> myrandom(11);
          Array<T, 3> data(nx, ny, nz);
          std::fill(data.begin(), data.end(), T());
          myrandom.generate_uniform(data.data(), data.size());
            for (int i = 0; i < nBlocks; ++i)
            {
              einsplines[i] = myAllocator.createMultiBspline(T(0), start, end, ng, PERIODIC, nSplinesPerBlock);
              if (init_random)
                for (int j = 0; j < nSplinesPerBlock; ++j)
                  myAllocator.set(data.data(), einsplines[i], j);
            }
          }
          resize();
        }
       ...
      }
    struct einspline_spo
    {
    struct EvaluateVGHTag {};
    struct EvaluateVTag {};
    typedef Kokkos::TeamPolicy<EvaluateVGHTag> policy_vgh_t;
    typedef Kokkos::TeamPolicy<EvaluateVTag> policy_v_t;
    
    typedef typename Kokkos::TeamPolicy<>::member_type team_t;
    
    // using spline_type=MultiBspline<T>;
    // define the einspline data object type
    using spline_type = typename bspline_traits<T, 3>::SplineType;
    using pos_type        = TinyVector<T, 3>;
    using vContainer_type = Kokkos::View<T**>;
    using gContainer_type = Kokkos::View<T***,Kokkos::LayoutRight>;
    using hContainer_type = Kokkos::View<T***,Kokkos::LayoutRight>;
    ...
    // mark instance as copy
    bool is_copy;
     
    Kokkos::View<spline_type *> einsplines;
    vContainer_type psi;
    gContainer_type grad;
    hContainer_type hess;
    ...
    struct EvaluateVTag {};
    struct EvaluateVGHTag {};
    typedef Kokkos::TeamPolicy<EvaluateVTag> policy_v_t;
    typedef Kokkos::TeamPolicy<EvaluateVGHTag> policy_vgh_t;
    
    typedef typename Kokkos::TeamPolicy<>::member_type team_t;
    
    // using spline_type=MultiBspline<T>;
    using pos_type        = TinyVector<T, 3>;
    using vContainer_type = aligned_vector<T>;
    ...
    
    lattice_type Lattice;
    
    // Needed to be stored in class since we use it as the functor
    pos_type pos;
    
    // mark instance as copy
    bool is_copy;
     
    aligned_vector<spline_type *> einsplines;
    ...
    // default constructor
    einspline_spo()
        : nBlocks(0), nSplines(0), firstBlock(0), lastBlock(0), Owner(false), is_copy(false)
    {
    }
    // disable copy constructor
    // einspline_spo(const einspline_spo &in) = delete;
    // disable copy operator
    einspline_spo &operator=(const einspline_spo &in) = delete;
    ...
    einspline_spo(einspline_spo &in, int ncrews, int crewID)
        : Owner(false), Lattice(in.Lattice)
    {
      nSplines         = in.nSplines;
      nSplinesPerBlock = in.nSplinesPerBlock;
      nBlocks          = (in.nBlocks + ncrews - 1) / ncrews;
      firstBlock       = nBlocks * crewID;
      lastBlock        = std::min(in.nBlocks, nBlocks * (crewID + 1));
      nBlocks          = lastBlock - firstBlock;
      einsplines       = Kokkos::View<spline_type*>("einsplines",nBlocks);
      for (int i = 0, t = firstBlock; i < nBlocks; ++i, ++t)
        einsplines[i] = in.einsplines[t];
      resize();
    }
    ...
    einspline_spo(einspline_spo_ref<T,compute_engine_type> &in, int ncrews, int crewID)
         : Owner(false), Lattice(in.Lattice), is_copy(false)
    {
      nSplines         = in.nSplines;
      nSplinesPerBlock = in.nSplinesPerBlock;
      nBlocks          = (in.nBlocks + ncrews - 1) / ncrews;
      firstBlock       = nBlocks * crewID;
      lastBlock        = std::min(in.nBlocks, nBlocks * (crewID + 1));
      nBlocks          = lastBlock - firstBlock;
      einsplines       = Kokkos::View<spline_type*>("einsplines",nBlocks);
        for (int i = 0, t = firstBlock; i < nBlocks; ++i, ++t)
      einsplines[i] = in.einsplines[t];
      resize();
    }
    ...
    // destructors
    ~einspline_spo()
    {
      if(! is_copy) {
      if (psi.extent(0)) clean();
      einsplines = Kokkos::View<spline_type *>() ;
      
      }
    }
    void clean()
    
    
    
    {
      psi = vContainer_type()  ;
      grad = gContainer_type()  ;
      hess = hContainer_type() ;
      
    }
    // resize the containers
    void resize()
    {
      if (nBlocks > psi.size())
      {
        clean();
        
        
        
        
        
        
        
        psi = vContainer_type("Psi",nBlocks,nSplinesPerBlock) ;
        grad = gContainer_type("Grad",nBlocks,3,nSplinesPerBlock) ;
        hess = hContainer_type("Hess",nBlocks,6,nSplinesPerBlock);
      }
    }
    // fix for general num_splines
    void set(int nx, int ny, int nz, int num_splines, int nblocks,
             bool init_random = true)
    {
      nSplines         = num_splines;
      nBlocks          = nblocks;
      nSplinesPerBlock = num_splines / nblocks;
      firstBlock       = 0;
      lastBlock        = nBlocks;
      if (einsplines.extent(0)==0)
      {
        Owner = true;
        TinyVector<int, 3> ng(nx, ny, nz);
        pos_type start(0);
        pos_type end(1);
        einsplines       = Kokkos::View<spline_type*>("einsplines",nBlocks);
        RandomGenerator<T> myrandom(11);
        Array<T, 3> data(nx, ny, nz);
        std::fill(data.begin(), data.end(), T());
        myrandom.generate_uniform(data.data(), data.size());
        for (int i = 0; i < nBlocks; ++i)
        {
          einsplines[i] = *myAllocator.createMultiBspline(T(0), start, end, ng, PERIODIC, nSplinesPerBlock);
          if (init_random)
            for (int j = 0; j < nSplinesPerBlock; ++j)
              myAllocator.set(data.data(), &einsplines[i], j);
        }
      }
      resize();
    }
    ...
    }

Parallel execution

In src/QMCWaveFunctions/einspline_spo.hpp:

  • Add Kokkos::parallel_for to evaluate_v, evaluate_vgl and tidy up data structures in evaluate_v, evaluate_vgh, evaluate_vgl

    struct einspline_spo
    {
      ...
      /** evaluate psi */
      inline void evaluate_v(const pos_type &p)
      {
        auto u = Lattice.toUnit(p);
        for (int i = 0; i < nBlocks; ++i)
          compute_engine.evaluate_v(einsplines[i], u[0], u[1], u[2], psi[i]->data(), psi[i]->size());
          
          
          
          
          
          
          
          
      }
      /** evaluate psi */
      inline void evaluate_v_pfor(const pos_type &p)
      {
        auto u = Lattice.toUnit(p);
        #pragma omp for nowait
        for (int i = 0; i < nBlocks; ++i)
          compute_engine.evaluate_v(einsplines[i], u[0], u[1], u[2], psi[i]->data(), psi[i]->size());
      }
      /** evaluate psi, grad and lap */
      inline void evaluate_vgl(const pos_type &p)
      {
        auto u = Lattice.toUnit(p);
        for (int i = 0; i < nBlocks; ++i)
          compute_engine.evaluate_vgl(einsplines[i], u[0], u[1], u[2],
                                      psi[i]->data(), grad[i]->data(), hess[i]->data(),
                                      psi[i]->size());
      }
      /** evaluate psi, grad and lap */
      inline void evaluate_vgl_pfor(const pos_type &p)
      {
        auto u = Lattice.toUnit(p);
        #pragma omp for nowait
        for (int i = 0; i < nBlocks; ++i)
          compute_engine.evaluate_vgl(einsplines[i], u[0], u[1], u[2],
                                      psi[i]->data(), grad[i]->data(), hess[i]->data(),
                                      psi[i]->size());
      }    
      ...
      /** evaluate psi, grad and hess */
      inline void evaluate_vgh(const pos_type &p)
      {
        auto u = Lattice.toUnit(p);
        <br>
        for (int i = 0; i < nBlocks; ++i)
          einsplines[i]->evaluate_vgh(u, *psi[i], *grad[i], *hess[i]);
        
        
        
        
        
        
        
        
        
      }
      /** evaluate psi, grad and hess */
      inline void evaluate_vgh_pfor(const pos_type &p)
      {
        auto u = Lattice.toUnit(p);
        #pragma omp for nowait
        for (int i = 0; i < nBlocks; ++i)
          compute_engine.evaluate_vgh(einsplines[i], u[0], u[1], u[2],
                                      psi[i]->data(), grad[i]->data(), hess[i]->data(),
                                      psi[i]->size());
      }
      ...
    }
    struct einspline_spo
    {
      ...
      /** evaluate psi */
      inline void evaluate_v(const pos_type &p)
      {
        pos = p;
        is_copy = true;
        Kokkos::parallel_for(policy_v_t(nBlocks,1),*this);
        is_copy = false;
      }
      KOKKOS_INLINE_FUNCTION
      void operator() (const EvaluateVTag&, const team_t& team ) const {
        int block = team.league_rank();
        auto u = Lattice.toUnit(pos);
        compute_engine.evaluate_v(&einsplines[block], u[0], u[1], u[2],
                                  &psi(block,0), psi.extent(1));
      }
      /** evaluate psi */
      inline void evaluate_v_pfor(const pos_type &p)
      {
        auto u = Lattice.toUnit(p);
        #pragma omp for nowait
        for (int i = 0; i < nBlocks; ++i)
          compute_engine.evaluate_v(&einsplines[i], u[0], u[1], u[2], &psi(i,0), psi.extent(0));
      }
      /** evaluate psi, grad and lap */
      inline void evaluate_vgl(const pos_type &p)
      {
        auto u = Lattice.toUnit(p);
        for (int i = 0; i < nBlocks; ++i)
          compute_engine.evaluate_vgl(&einsplines[i], u[0], u[1], u[2],
                                      &psi(i,0), &grad(i,0,0), &hess(i,0,0),
                                      psi.extent(1));
      }
      /** evaluate psi, grad and lap */
      inline void evaluate_vgl_pfor(const pos_type &p)
      {
        auto u = Lattice.toUnit(p);
        #pragma omp for nowait
        for (int i = 0; i < nBlocks; ++i)
          compute_engine.evaluate_vgl(&einsplines[i], u[0], u[1], u[2],
                                      &psi(i,0), &grad(i,0,0), &hess(i,0,0),
                                       psi.extent(1));
      }    
      ...    
      /** evaluate psi, grad and hess */
      inline void evaluate_vgh(const pos_type &p)
      {
        pos = p;
        is_copy = true;
        Kokkos::parallel_for(policy_vgh_t(nBlocks,1),*this);
        is_copy = false;
      }
      <br>
      KOKKOS_INLINE_FUNCTION
      void operator() (const EvaluateVGHTag&, const team_t& team ) const {
        int block = team.league_rank();
        auto u = Lattice.toUnit(pos);
        compute_engine.evaluate_vgh(&einsplines[block], u[0], u[1], u[2],
                                    &psi(block,0), &grad(block,0,0), &hess(block,0,0),
                                    psi.extent(1));
      }
      /** evaluate psi, grad and hess */
      inline void evaluate_vgh_pfor(const pos_type &p)
      {
        auto u = Lattice.toUnit(p);
        #pragma omp for nowait
        for (int i = 0; i < nBlocks; ++i)
          compute_engine.evaluate_vgh(&einsplines[i], u[0], u[1], u[2],
                                      &psi(i,0), &grad(i,0,0), &hess(i,0,0),
                                       psi.extent(1));
      }
      ...
    }

Jastrow kernel

Assessment

We assess our Kokkos implementation of the spline and Jastrow kernels on the following architectures.

  • Many-core architecture: Intel Xeon Phi (KNL)
  • GPU: Nvidia K40

Spline kernel

Jastrow kernel

Conclusions and Future Work

Acknowledgements

This research was supported by the Exascale Computing Project (ECP), Project Number: 17-SC-20-SC, a collaborative effort of two DOE organizations—the Office of Science and the National Nuclear Security Administration responsible for the planning and preparation of a capable exascale ecosystem including software, applications, hardware, advanced system engineering, and early testbed platforms to support the nation’s exascale computing imperative.

Clone this wiki locally