PackInfo code generation #4994

jngrad · 2024-09-20T17:41:56Z

Description of changes:

reduce volume of the LB communication during the streaming step
avoid superfluous LB ghost communication outside the streaming step
use AVX streaming kernels

Use the AVX streaming kernels. Adjust pre-conditions of LB tests. Fix CMake bug in the benchmarks. Tweak benchmark summary information.
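The first item, reducing the communication volume, can be illustrated with a simplified count of D3Q19 populations per boundary cell. This is a sketch, not a description of the generated PackInfoStreaming code. It assumes that a streaming-only exchange sends just the populations whose lattice velocity points into the neighboring block, while a full ghost-layer exchange sends all 19. The names `Vec3`, `crosses` and `populations_to_send` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Lattice velocity of one D3Q19 population.
struct Vec3 {
  int x, y, z;
};

// D3Q19 velocity set: 1 rest vector, 6 face vectors, 12 edge vectors.
// The ordering is irrelevant for the counts below.
constexpr std::array<Vec3, 19> d3q19 = {{
    {0, 0, 0},
    {0, 1, 0}, {0, -1, 0}, {-1, 0, 0}, {1, 0, 0}, {0, 0, 1}, {0, 0, -1},
    {-1, 1, 0}, {1, 1, 0}, {-1, -1, 0}, {1, -1, 0},
    {0, 1, 1}, {0, -1, 1}, {-1, 0, 1}, {1, 0, 1},
    {0, 1, -1}, {0, -1, -1}, {-1, 0, -1}, {1, 0, -1},
}};

// Hypothetical rule: a population with velocity c streams into the neighbor
// block in direction d if c agrees with d on every axis where d is nonzero.
constexpr bool crosses(Vec3 c, Vec3 d) {
  return (d.x == 0 || c.x == d.x) && (d.y == 0 || c.y == d.y) &&
         (d.z == 0 || c.z == d.z);
}

// Populations per boundary cell that a streaming-only exchange would send
// towards neighbor direction d (a full ghost-layer exchange sends all 19).
constexpr std::size_t populations_to_send(Vec3 d) {
  std::size_t n = 0;
  for (std::size_t i = 0; i < d3q19.size(); ++i) {
    if (crosses(d3q19[i], d)) {
      ++n;
    }
  }
  return n;
}

static_assert(populations_to_send({1, 0, 0}) == 5, "face: 5 of 19");
static_assert(populations_to_send({1, 1, 0}) == 1, "edge: 1 of 19");
```

Under this assumption, a face exchange carries 5 of the 19 populations per cell and an edge exchange only 1. That would account for most of the reduction in communication volume.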

jngrad · 2024-09-20T20:14:05Z

testsuite/python/lb.py

@@ -674,7 +674,7 @@ def test_tracers_coupling_rounding(self):
 self.system.thermostat.set_lb(LB_fluid=lbf, seed=3, gamma=self.gamma)
 rtol = self.rtol
 if lbf.single_precision:
- rtol *= 100.
+ rtol *= 200.


I had to increase this tolerance because the single-precision kernels are very sensitive to precision loss. If you regenerate the kernels enough times, you will eventually get a collide kernel whose order of operations makes this test fail. Doubling the tolerance avoids these failures.
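The sensitivity to the order of operations is a generic property of floating-point arithmetic: addition is not associative. Below is a minimal single-precision illustration that is not taken from the LB kernels. It assumes IEEE 754 semantics on x86-64, with floats evaluated in float and no `-ffast-math`. `within_rtol` is a hypothetical relative-tolerance check, not the comparison used in lb.py.

```cpp
#include <cassert>
#include <cmath>

// 1e8f is exactly representable, but neighboring floats near 1e8 are 8 apart,
// so adding 1.0f to it is rounded away.
float add_small_first(float big, float small) {
  return (big + small) - big;  // small is lost before the cancellation
}

float cancel_first(float big, float small) {
  return (big - big) + small;  // cancellation happens first, small survives
}

// Hypothetical relative-tolerance check in the spirit of an `rtol` comparison.
bool within_rtol(double actual, double expected, double rtol) {
  return std::abs(actual - expected) <= rtol * std::abs(expected);
}
```

With `big = 1e8f` and `small = 1.0f`, the first ordering yields 0 and the second yields 1. This is why a regenerated kernel with a different operation order can shift single-precision results enough to cross a tight tolerance.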

What is really peculiar is that with 2 or more MPI ranks, the initial LB populations are not bitwise identical in the 8 nodes sampled by the tracer. I would have expected them to be identical, since we don't use a thermalized kernel. I spent two days on this oddity and still don't understand the discrepancy. It was not introduced by this PR; it has been there for a while.

jngrad · 2024-09-20T20:27:23Z

src/walberla_bridge/src/lattice_boltzmann/generated_kernels/PackInfoPdfDoublePrecision.cpp

+void PackInfoPdfDoublePrecision::pack(Direction dir, unsigned char *byte_buffer, IBlock *block) const {
+ byte_buffer += sizeof(double) - (reinterpret_cast<std::size_t>(byte_buffer) - (reinterpret_cast<std::size_t>(byte_buffer) / sizeof(double)) * sizeof(double));
+ double *buffer = reinterpret_cast<double *>(byte_buffer);


For some reason, byte_buffer is not always properly aligned. I don't understand why, and the MPI code in waLBerla is really complex to analyze; I couldn't find where the memory allocation takes place. You can investigate this issue in GDB by removing lines 524 and 899 and setting a breakpoint on the symbol __ubsan::Diag::~Diag. When we enter this function, the buffer may already contain some information, such as the field dimensions, but those values are 8 bytes long (int64_t). This cannot explain why we sometimes enter this function with 6 bytes already in the buffer, unless some function pushes six 1-byte char flags into it. This bug doesn't manifest itself with the default waLBerla PackInfo class. The full UBSAN report is reproduced below.

With this manual bugfix, the code passes on both AMD and Intel chips. One should be able to re-apply it with git cherry-pick 960c2ade when regenerating the kernels. I'll try to come up with a codegen MWE and submit a bug report on the waLBerla issue tracker.
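The pointer arithmetic in the fix computes `addr % sizeof(double)` without the modulo operator. It then advances the buffer to the next 8-byte boundary. The sketch below reproduces that arithmetic.

Note that the step is always between 1 and 8 bytes, so an already aligned buffer is still advanced by 8 bytes. Presumably the packed size reserves that slack; we have not checked the generated size() method. The sketch also shows an alternative technique that is not what the PR does: copying through `std::memcpy`, which sidesteps the alignment requirement. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Same arithmetic as the generated fix: `addr - (addr / 8) * 8` is addr % 8,
// and the buffer is advanced by 8 minus that remainder. The step lies in
// [1, sizeof(double)], so an aligned address still moves by a full 8 bytes.
std::size_t alignment_step(std::uintptr_t addr) {
  std::uintptr_t const remainder =
      addr - (addr / sizeof(double)) * sizeof(double);
  return sizeof(double) - static_cast<std::size_t>(remainder);
}

// Alternative technique (not used in the PR): std::memcpy has no alignment
// requirement, so values can be packed at any byte offset without UB.
void store_unaligned(unsigned char *dst, double value) {
  std::memcpy(dst, &value, sizeof(value));
}

double load_unaligned(unsigned char const *src) {
  double value;
  std::memcpy(&value, src, sizeof(value));
  return value;
}

// Usage: round-trip a double through a slot 6 bytes past an 8-byte boundary.
void demo_unaligned_round_trip() {
  alignas(double) unsigned char buffer[16] = {};
  store_unaligned(buffer + 6, 3.5);
  assert(load_unaligned(buffer + 6) == 3.5);
}
```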

UBSAN report

114/130 Test #103: LBWalberlaImpl_unit_tests ................***Failed 2.73 sec
Running 29 test cases...
Running 29 test cases...
pos [5.4, -0.6, -0.6], node [4, -2, -2], weight 0.001
pos [-0.6, -0.6, -0.6], node [-2, -2, -2], weight 0.001
pos [5.4, -0.6, -0.6], node [4, -2, -2], weight 0.001
pos [-0.6, -0.6, -0.6], node [-2, -2, -2], weight 0.001
pos [5.4, -0.6, -0.6], node [4, -2, -2], weight 0.001
pos [-0.6, -0.6, -0.6], node [-2, -2, -2], weight 0.001
pos [5.4, -0.6, -0.6], node [4, -2, -2], weight 0.001
pos [-0.6, -0.6, -0.6], node [-2, -2, -2], weight 0.001
/builds/espressomd/espresso/src/walberla_bridge/src/lattice_boltzmann/generated_kernels/PackInfoPdfDoublePrecision.cpp:70:9: runtime error: store to misaligned address 0x59034fc9c776 for type 'double', which requires 8 byte alignment
/builds/espressomd/espresso/src/walberla_bridge/src/lattice_boltzmann/generated_kernels/PackInfoPdfDoublePrecision.cpp:70:9: runtime error: store to misaligned address 0x6352aa97ee66 for type 'double', which requires 8 byte alignment
0x59034fc9c776: note: pointer points here
0x6352aa97ee66: note: pointer points here
03 00 00 00 00 00 a0 52 97 b7 cd 74 00 00 60 c7 c9 4f 03 59 00 00 60 c7 c9 4f 03 59 00 00 dd dd ^
03 00 00 00 00 00 a0 52 17 f8 01 79 00 00 50 ee 97 aa 52 63 00 00 50 ee 97 aa 52 63 00 00 dd dd ^
#0 0x74cdb93aeaa1 in walberla::pystencils::internal_pack_W::pack_W(double*, double*, long, long, long, long, long, long, long) /builds/espressomd/espresso/src/walberla_bridge/src/lattice_boltzmann/generated_kernels/PackInfoPdfDoublePrecision.cpp:70:102
#1 0x74cdb93aeaa1 in walberla::pystencils::PackInfoPdfDoublePrecision::pack(walberla::stencil::Direction, unsigned char*, walberla::domain_decomposition::IBlock*) const /builds/espressomd/espresso/src/walberla_bridge/src/lattice_boltzmann/generated_kernels/PackInfoPdfDoublePrecision.cpp:586:5
#2 0x59034e6f447b in walberla::communication::UniformPackInfo::packData(walberla::domain_decomposition::IBlock const*, walberla::stencil::Direction, walberla::mpi::GenericSendBuffer<unsigned char, walberla::mpi::OptimalGrowth>&) const /builds/espressomd/espresso/build/_deps/walberla-src/src/communication/UniformPackInfo.h:168:4
#3 0x59034e716b4c in std::function<void (walberla::mpi::GenericSendBuffer<unsigned char, walberla::mpi::OptimalGrowth>&)>::operator()(walberla::mpi::GenericSendBuffer<unsigned char, walberla::mpi::OptimalGrowth>&) const /usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/std_function.h:591:9
#4 0x59034e716b4c in walberla::blockforest::communication::UniformBufferedScheme<walberla::stencil::internal::D3Q19<int>>::send(walberla::mpi::GenericSendBuffer<unsigned char, walberla::mpi::OptimalGrowth>&, std::vector<std::function<void (walberla::mpi::GenericSendBuffer<unsigned char, walberla::mpi::OptimalGrowth>&)>, std::allocator<std::function<void (walberla::mpi::GenericSendBuffer<unsigned char, walberla::mpi::OptimalGrowth>&)>>>&) /builds/espressomd/espresso/build/_deps/walberla-src/src/blockforest/communication/UniformBufferedScheme.h:481:7
#5 0x59034e702fd5 in std::function<void (walberla::mpi::GenericSendBuffer<unsigned char, walberla::mpi::OptimalGrowth>&)>::operator()(walberla::mpi::GenericSendBuffer<unsigned char, walberla::mpi::OptimalGrowth>&) const /usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/std_function.h:591:9
#6 0x59034e702fd5 in walberla::mpi::GenericOpenMPBufferSystem<walberla::mpi::GenericRecvBuffer<unsigned char>, walberla::mpi::GenericSendBuffer<unsigned char, walberla::mpi::OptimalGrowth>>::startCommunicationOpenMP() /builds/espressomd/espresso/build/_deps/walberla-src/src/core/mpi/OpenMPBufferSystem.impl.h:137:7
#7 0x59034e714e05 in walberla::mpi::GenericOpenMPBufferSystem<walberla::mpi::GenericRecvBuffer<unsigned char>, walberla::mpi::GenericSendBuffer<unsigned char, walberla::mpi::OptimalGrowth>>::startCommunication() /builds/espressomd/espresso/build/_deps/walberla-src/src/core/mpi/OpenMPBufferSystem.impl.h:96:7
#8 0x59034e714e05 in walberla::blockforest::communication::UniformBufferedScheme<walberla::stencil::internal::D3Q19<int>>::startCommunication() /builds/espressomd/espresso/build/_deps/walberla-src/src/blockforest/communication/UniformBufferedScheme.h:379:18
#9 0x59034e6e12b3 in walberla::blockforest::communication::UniformBufferedScheme<walberla::stencil::internal::D3Q19<int>>::communicate() /builds/espressomd/espresso/build/_deps/walberla-src/src/blockforest/communication/UniformBufferedScheme.h:224:4
#10 0x59034e6e12b3 in walberla::LBWalberlaImpl<double, (lbmpy::Arch)0>::integrate_push_scheme() /builds/espressomd/espresso/src/walberla_bridge/tests/../src/lattice_boltzmann/LBWalberlaImpl.hpp:581:35
#11 0x59034e6ab912 in walberla::LBWalberlaImpl<double, (lbmpy::Arch)0>::integrate() /builds/espressomd/espresso/src/walberla_bridge/tests/../src/lattice_boltzmann/LBWalberlaImpl.hpp:632:7
#12 0x59034e7d9391 in void forces_book_keepingcase::_impl<std::function<std::shared_ptr<LBWalberlaBase> (LBTestParameters const&)>>(std::function<std::shared_ptr<LBWalberlaBase> (LBTestParameters const&)> const&) /builds/espressomd/espresso/src/walberla_bridge/tests/LBWalberlaImpl_unit_tests.cpp:489:9
#13 0x59034e7d864c in void forces_book_keepingcase::test_method<std::function<std::shared_ptr<LBWalberlaBase> (LBTestParameters const&)> const&>(std::function<std::shared_ptr<LBWalberlaBase> (LBTestParameters const&)> const& const&) /builds/espressomd/espresso/src/walberla_bridge/tests/LBWalberlaImpl_unit_tests.cpp:466:1
#14 0x74cdb9c5dc15 (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.83.0+0x28c15) (BuildId: 2c61160182c484a21dab791006ebbb07ebbfff3e)
#15 0x74cdb9c63cd4 in boost::execution_monitor::catch_signals(boost::function<int ()> const&) (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.83.0+0x2ecd4) (BuildId: 2c61160182c484a21dab791006ebbb07ebbfff3e)
#16 0x74cdb9c6414e in boost::execution_monitor::execute(boost::function<int ()> const&) (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.83.0+0x2f14e) (BuildId: 2c61160182c484a21dab791006ebbb07ebbfff3e)
#17 0x74cdb9c64237 in boost::execution_monitor::vexecute(boost::function<void ()> const&) (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.83.0+0x2f237) (BuildId: 2c61160182c484a21dab791006ebbb07ebbfff3e)
#18 0x74cdb9c81e4e in boost::unit_test::unit_test_monitor_t::execute_and_translate(boost::function<void ()> const&, unsigned long) (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.83.0+0x4ce4e) (BuildId: 2c61160182c484a21dab791006ebbb07ebbfff3e)
#19 0x74cdb9ca59c9 (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.83.0+0x709c9) (BuildId: 2c61160182c484a21dab791006ebbb07ebbfff3e)
#20 0x74cdb9ca5d7a (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.83.0+0x70d7a) (BuildId: 2c61160182c484a21dab791006ebbb07ebbfff3e)
#21 0x74cdb9ca5d7a (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.83.0+0x70d7a) (BuildId: 2c61160182c484a21dab791006ebbb07ebbfff3e)
#22 0x74cdb9c6e768 in boost::unit_test::framework::run(unsigned long, bool) (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.83.0+0x39768) (BuildId: 2c61160182c484a21dab791006ebbb07ebbfff3e)
#23 0x74cdb9c81783 in boost::unit_test::unit_test_main(bool (*)(), int, char**) (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.83.0+0x4c783) (BuildId: 2c61160182c484a21dab791006ebbb07ebbfff3e)
#24 0x59034e69f9a9 in main /builds/espressomd/espresso/src/walberla_bridge/tests/LBWalberlaImpl_unit_tests.cpp:636:20
#25 0x74cdb779b1c9 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 4d9090d61bf70e6b3225d583f0f08193f54670b2)
#26 0x74cdb779b28a in __libc_start_main (/usr/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 4d9090d61bf70e6b3225d583f0f08193f54670b2)
#27 0x59034e670d34 in _start (/builds/espressomd/espresso/build/src/walberla_bridge/tests/LBWalberlaImpl_unit_tests+0x10ed34) (BuildId: 33de3e955fa8eda7e1f426d1400f93adff5ef8a4)
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /builds/espressomd/espresso/src/walberla_bridge/src/lattice_boltzmann/generated_kernels/PackInfoPdfDoublePrecision.cpp:70:9
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
#0 0x7901f99aeaa1 in walberla::pystencils::internal_pack_W::pack_W(double*, double*, long, long, long, long, long, long, long) /builds/espressomd/espresso/src/walberla_bridge/src/lattice_boltzmann/generated_kernels/PackInfoPdfDoublePrecision.cpp:70:102
#1 0x7901f99aeaa1 in walberla::pystencils::PackInfoPdfDoublePrecision::pack(walberla::stencil::Direction, unsigned char*, walberla::domain_decomposition::IBlock*) const /builds/espressomd/espresso/src/walberla_bridge/src/lattice_boltzmann/generated_kernels/PackInfoPdfDoublePrecision.cpp:586:5
#2 0x6352a84e647b in walberla::communication::UniformPackInfo::packData(walberla::domain_decomposition::IBlock const*, walberla::stencil::Direction, walberla::mpi::GenericSendBuffer<unsigned char, walberla::mpi::OptimalGrowth>&) const /builds/espressomd/espresso/build/_deps/walberla-src/src/communication/UniformPackInfo.h:168:4
#3 0x6352a8508b4c in std::function<void (walberla::mpi::GenericSendBuffer<unsigned char, walberla::mpi::OptimalGrowth>&)>::operator()(walberla::mpi::GenericSendBuffer<unsigned char, walberla::mpi::OptimalGrowth>&) const /usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/std_function.h:591:9
#4 0x6352a8508b4c in walberla::blockforest::communication::UniformBufferedScheme<walberla::stencil::internal::D3Q19<int>>::send(walberla::mpi::GenericSendBuffer<unsigned char, walberla::mpi::OptimalGrowth>&, std::vector<std::function<void (walberla::mpi::GenericSendBuffer<unsigned char, walberla::mpi::OptimalGrowth>&)>, std::allocator<std::function<void (walberla::mpi::GenericSendBuffer<unsigned char, walberla::mpi::OptimalGrowth>&)>>>&) /builds/espressomd/espresso/build/_deps/walberla-src/src/blockforest/communication/UniformBufferedScheme.h:481:7
#5 0x6352a84f4fd5 in std::function<void (walberla::mpi::GenericSendBuffer<unsigned char, walberla::mpi::OptimalGrowth>&)>::operator()(walberla::mpi::GenericSendBuffer<unsigned char, walberla::mpi::OptimalGrowth>&) const /usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/std_function.h:591:9
#6 0x6352a84f4fd5 in walberla::mpi::GenericOpenMPBufferSystem<walberla::mpi::GenericRecvBuffer<unsigned char>, walberla::mpi::GenericSendBuffer<unsigned char, walberla::mpi::OptimalGrowth>>::startCommunicationOpenMP() /builds/espressomd/espresso/build/_deps/walberla-src/src/core/mpi/OpenMPBufferSystem.impl.h:137:7
#7 0x6352a8506e05 in walberla::mpi::GenericOpenMPBufferSystem<walberla::mpi::GenericRecvBuffer<unsigned char>, walberla::mpi::GenericSendBuffer<unsigned char, walberla::mpi::OptimalGrowth>>::startCommunication() /builds/espressomd/espresso/build/_deps/walberla-src/src/core/mpi/OpenMPBufferSystem.impl.h:96:7
#8 0x6352a8506e05 in walberla::blockforest::communication::UniformBufferedScheme<walberla::stencil::internal::D3Q19<int>>::startCommunication() /builds/espressomd/espresso/build/_deps/walberla-src/src/blockforest/communication/UniformBufferedScheme.h:379:18
#9 0x6352a84d32b3 in walberla::blockforest::communication::UniformBufferedScheme<walberla::stencil::internal::D3Q19<int>>::communicate() /builds/espressomd/espresso/build/_deps/walberla-src/src/blockforest/communication/UniformBufferedScheme.h:224:4
#10 0x6352a84d32b3 in walberla::LBWalberlaImpl<double, (lbmpy::Arch)0>::integrate_push_scheme() /builds/espressomd/espresso/src/walberla_bridge/tests/../src/lattice_boltzmann/LBWalberlaImpl.hpp:581:35
#11 0x6352a849d912 in walberla::LBWalberlaImpl<double, (lbmpy::Arch)0>::integrate() /builds/espressomd/espresso/src/walberla_bridge/tests/../src/lattice_boltzmann/LBWalberlaImpl.hpp:632:7
#12 0x6352a85cb391 in void forces_book_keepingcase::_impl<std::function<std::shared_ptr<LBWalberlaBase> (LBTestParameters const&)>>(std::function<std::shared_ptr<LBWalberlaBase> (LBTestParameters const&)> const&) /builds/espressomd/espresso/src/walberla_bridge/tests/LBWalberlaImpl_unit_tests.cpp:489:9
#13 0x6352a85ca64c in void forces_book_keepingcase::test_method<std::function<std::shared_ptr<LBWalberlaBase> (LBTestParameters const&)> const&>(std::function<std::shared_ptr<LBWalberlaBase> (LBTestParameters const&)> const& const&) /builds/espressomd/espresso/src/walberla_bridge/tests/LBWalberlaImpl_unit_tests.cpp:466:1
#14 0x7901fa37ec15 (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.83.0+0x28c15) (BuildId: 2c61160182c484a21dab791006ebbb07ebbfff3e)
#15 0x7901fa384cd4 in boost::execution_monitor::catch_signals(boost::function<int ()> const&) (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.83.0+0x2ecd4) (BuildId: 2c61160182c484a21dab791006ebbb07ebbfff3e)
#16 0x7901fa38514e in boost::execution_monitor::execute(boost::function<int ()> const&) (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.83.0+0x2f14e) (BuildId: 2c61160182c484a21dab791006ebbb07ebbfff3e)
#17 0x7901fa385237 in boost::execution_monitor::vexecute(boost::function<void ()> const&) (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.83.0+0x2f237) (BuildId: 2c61160182c484a21dab791006ebbb07ebbfff3e)
#18 0x7901fa3a2e4e in boost::unit_test::unit_test_monitor_t::execute_and_translate(boost::function<void ()> const&, unsigned long) (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.83.0+0x4ce4e) (BuildId: 2c61160182c484a21dab791006ebbb07ebbfff3e)
#19 0x7901fa3c69c9 (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.83.0+0x709c9) (BuildId: 2c61160182c484a21dab791006ebbb07ebbfff3e)
#20 0x7901fa3c6d7a (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.83.0+0x70d7a) (BuildId: 2c61160182c484a21dab791006ebbb07ebbfff3e)
#21 0x7901fa3c6d7a (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.83.0+0x70d7a) (BuildId: 2c61160182c484a21dab791006ebbb07ebbfff3e)
#22 0x7901fa38f768 in boost::unit_test::framework::run(unsigned long, bool) (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.83.0+0x39768) (BuildId: 2c61160182c484a21dab791006ebbb07ebbfff3e)
#23 0x7901fa3a2783 in boost::unit_test::unit_test_main(bool (*)(), int, char**) (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.83.0+0x4c783) (BuildId: 2c61160182c484a21dab791006ebbb07ebbfff3e)
#24 0x6352a84919a9 in main /builds/espressomd/espresso/src/walberla_bridge/tests/LBWalberlaImpl_unit_tests.cpp:636:20
#25 0x7901f7f9b1c9 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 4d9090d61bf70e6b3225d583f0f08193f54670b2)
#26 0x7901f7f9b28a in __libc_start_main (/usr/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 4d9090d61bf70e6b3225d583f0f08193f54670b2)
#27 0x6352a8462d34 in _start (/builds/espressomd/espresso/build/src/walberla_bridge/tests/LBWalberlaImpl_unit_tests+0x10ed34) (BuildId: 33de3e955fa8eda7e1f426d1400f93adff5ef8a4)
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /builds/espressomd/espresso/src/walberla_bridge/src/lattice_boltzmann/generated_kernels/PackInfoPdfDoublePrecision.cpp:70:9
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing the job to be terminated. The first process to do so was:
Process name: [[16478,1],1]
Exit code: 1
--------------------------------------------------------------------------
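Both faulting addresses in the report sit exactly 6 bytes past an 8-byte boundary, which matches the 6-byte offset discussed above. The quick check below assumes x86-64, where `alignof(double)` is 8, as the UBSAN messages themselves state. The two addresses presumably come from the two MPI processes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Distance of an address past the previous alignof(double) boundary.
constexpr std::size_t misalignment(std::uintptr_t addr) {
  return static_cast<std::size_t>(addr % alignof(double));
}

// The two faulting store addresses copied from the report.
constexpr std::uintptr_t rank_a_addr = 0x59034fc9c776;
constexpr std::uintptr_t rank_b_addr = 0x6352aa97ee66;

static_assert(misalignment(rank_a_addr) == 6, "6 bytes past an 8-byte boundary");
static_assert(misalignment(rank_b_addr) == 6, "6 bytes past an 8-byte boundary");
```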

Using the waLBerla codegen tutorial, with minor tweaks to generate sweeps for 3D fields, I am unable to reproduce this byte buffer issue. I also still don't know which type of information would require only 1 byte: most of the data we use are 4 bytes (int32_t, float) or 8 bytes (int64_t, double). I'll need more time to get to the bottom of this.

jngrad · 2024-09-20T20:29:49Z

src/walberla_bridge/src/lattice_boltzmann/lb_kernels.hpp

@@ -53,14 +59,17 @@ template <typename FT = double, Arch AT = Arch::CPU> struct KernelTrait {
 pystencils::CollideSweepDoublePrecisionThermalizedAVX;
 using CollisionModelLeesEdwards =
 pystencils::CollideSweepDoublePrecisionLeesEdwardsAVX;
+ using StreamSweep = pystencils::StreamSweepDoublePrecisionAVX;


The AVX stream sweeps only improve performance in single precision. I'm including the double-precision kernel for consistency.
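A 256-bit AVX register holds 8 floats but only 4 doubles. That may be one reason the vectorized stream sweep pays off mainly in single precision, though the comment does not state the cause. The sketch mirrors how a trait like KernelTrait can select the stream sweep per precision. The `*Stub` types and `StreamTraitSketch` are hypothetical stand-ins for the generated classes.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-ins for the generated AVX stream sweep classes.
struct StreamSweepSinglePrecisionAVXStub {};
struct StreamSweepDoublePrecisionAVXStub {};

// Trait in the spirit of KernelTrait: the floating-point type selects the sweep.
template <typename FT> struct StreamTraitSketch;

template <> struct StreamTraitSketch<float> {
  using StreamSweep = StreamSweepSinglePrecisionAVXStub;
};

template <> struct StreamTraitSketch<double> {
  using StreamSweep = StreamSweepDoublePrecisionAVXStub;
};

// Number of values of type FT that fit in one 256-bit (32-byte) AVX register.
template <typename FT> constexpr std::size_t avx_lanes = 32 / sizeof(FT);

static_assert(avx_lanes<float> == 8, "8 floats per AVX register");
static_assert(avx_lanes<double> == 4, "4 doubles per AVX register");
static_assert(std::is_same<StreamTraitSketch<double>::StreamSweep,
                           StreamSweepDoublePrecisionAVXStub>::value,
              "double precision selects the double-precision sweep");
```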

jngrad · 2024-09-20T20:35:57Z

src/walberla_bridge/src/lattice_boltzmann/LBWalberlaImpl.hpp

+ if (m_has_boundaries or (m_collision_model and has_lees_edwards_bc())) {
+ setup.template operator()<PackInfo<PdfField>>();
+ } else {
+ setup.template operator()<PackInfoStreaming<PdfField>>();
+ }


The code-generated PackInfo does not work with UBB (velocity bounce-back) boundaries. After two days of investigation, I still don't understand why extra fluid density flows into LB fluid cells that are in contact with LB flag cells that have a slip velocity. To reproduce the bug, replace PackInfo by PackInfoStreaming at line 441 and run ./pypresso ../testsuite/python/lb_boundary_velocity.py LBBoundaryVelocityTest.test_wall_slip_parallel. Edit the collide and stream kernels to print the values streamed along directions 5 and 6 (top and bottom, respectively) in the first fluid cell in contact with a flag cell. Use a box size of [8, 1, 1] in units of agrid for simplicity.
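The `setup.template operator()<...>()` calls in the LBWalberlaImpl.hpp hunk invoke a lambda with an explicit template parameter list, a C++20 feature. Inside a class template such as LBWalberlaImpl, the `template` keyword is presumably needed so that `<` is parsed as the start of a template argument list. The self-contained sketch below reproduces the selection logic with hypothetical stub types, treating `m_collision_model` as a plain boolean. It is an illustration, not the ESPResSo code.

```cpp
#include <cassert>
#include <string>

// Hypothetical stubs standing in for the PDF field and the two PackInfo classes.
struct PdfFieldStub {};

template <class Field> struct PackInfoStub {
  static constexpr const char *name = "PackInfo";
};

template <class Field> struct PackInfoStreamingStub {
  static constexpr const char *name = "PackInfoStreaming";
};

// Requires C++20 (lambda with an explicit template parameter list).
// Mirrors the branch in the hunk: boundaries, or Lees-Edwards together with a
// collision model, fall back to the full PackInfo.
std::string select_pack_info(bool has_boundaries, bool has_collision_model,
                             bool has_lees_edwards_bc) {
  std::string chosen;
  auto setup = [&]<class PackInfoT>() { chosen = PackInfoT::name; };
  if (has_boundaries or (has_collision_model and has_lees_edwards_bc)) {
    setup.template operator()<PackInfoStub<PdfFieldStub>>();
  } else {
    setup.template operator()<PackInfoStreamingStub<PdfFieldStub>>();
  }
  return chosen;
}
```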

RudolfWeeber · 2024-09-27T15:24:28Z

On my system, the LB benchmark with 1k particles/core is now 30% faster for 1 and 8 cores, and on GPU. That's a significant improvement!

For grids below ~40 cubed, the CPU version is now close to the GPU one (about 30% slower). At some point, we might look into that some more. Long term, we might

split the particle coupling to hide the latency of getting the interpolated velocities from the GPU
and/or move the entire particle coupling to the GPU
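The first idea, hiding the latency of fetching interpolated velocities, amounts to overlapping an asynchronous transfer with independent CPU work. Below is a generic sketch with `std::async`. The two stand-in functions are hypothetical, no GPU is involved, and this is not how ESPResSo currently couples particles.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a slow transfer of interpolated LB velocities
// (e.g. device to host); here it simply returns a constant field.
std::vector<double> fetch_interpolated_velocities(std::size_t n_particles) {
  return std::vector<double>(n_particles, 0.5);
}

// Hypothetical stand-in for CPU work that does not need the LB velocities.
double compute_other_forces(std::vector<double> const &positions) {
  return std::accumulate(positions.begin(), positions.end(), 0.0);
}

// Launch the fetch, overlap it with independent work, then wait for it.
double coupled_step(std::vector<double> const &positions) {
  std::future<std::vector<double>> velocities = std::async(
      std::launch::async, fetch_interpolated_velocities, positions.size());
  double const other = compute_other_forces(positions); // overlaps the fetch
  std::vector<double> const v = velocities.get(); // blocks only if not ready
  return other + std::accumulate(v.begin(), v.end(), 0.0);
}
```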

jngrad added 4 commits September 17, 2024 18:36

General LB maintenance

3c0ccf2

Use the AVX streaming kernels. Adjust pre-conditions of LB tests. Fix CMake bug in the benchmarks. Tweak benchmark summary information.

Split LB communicators

5e39e84

Generate PackInfo

8ea7111

PackInfo MPI buffer alignment bugfix

960c2ad

jngrad commented Sep 20, 2024


jngrad added the waLBerla (Issues regarding waLBerla integration) and Performance labels Sep 20, 2024

jngrad requested a review from reinaual September 20, 2024 20:48

RudolfWeeber approved these changes Sep 27, 2024


jngrad added the automerge (Merge with kodiak) label Sep 27, 2024

kodiakhq bot merged commit 03033c4 into espressomd:python Sep 27, 2024
10 checks passed

jngrad deleted the packinfo branch September 27, 2024 15:05
