PackInfo code generation #4994

Merged
merged 4 commits into from
Sep 27, 2024

@jngrad (Member) commented Sep 20, 2024

Fixes #4921

Description of changes:

  • reduce the volume of LB communication during the streaming step
  • avoid superfluous LB ghost communication outside the streaming step
  • use AVX streaming kernels
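Here is a rough sketch of why the streaming step needs less data than a full ghost-layer exchange. It assumes the standard D3Q19 velocity set and a push-style selection rule: across communication direction d, a cell sends only the populations whose velocity c agrees with d on every non-zero axis of d. This is a plain-Python illustration of the idea, not the pack-info code that lbmpy/pystencils generates.

```python
from itertools import product

# Standard D3Q19 velocity set: 1 rest, 6 face and 12 edge velocities (no corners).
D3Q19 = [c for c in product((-1, 0, 1), repeat=3) if sum(map(abs, c)) <= 2]


def populations_to_send(d, velocities=D3Q19):
    """Velocities that cross the block boundary in communication direction d.

    Assumed selection rule: c is sent across d when c equals d on every axis
    where d is non-zero (e.g. across +x, every c with c_x == +1).
    """
    return [c for c in velocities
            if all(ci == di for ci, di in zip(c, d) if di != 0)]


for d in [(1, 0, 0), (1, 1, 0)]:
    sent = populations_to_send(d)
    print(d, f"{len(sent)} of {len(D3Q19)} populations per cell:", sent)
```

Under these assumptions, a face exchange carries 5 of the 19 populations per boundary cell and an edge exchange carries only 1, whereas a pack info that copies the whole ghost layer sends all 19.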

Use the AVX streaming kernels. Adjust pre-conditions of LB tests.
Fix CMake bug in the benchmarks. Tweak benchmark summary information.
@@ -674,7 +674,7 @@ def test_tracers_coupling_rounding(self):
         self.system.thermostat.set_lb(LB_fluid=lbf, seed=3, gamma=self.gamma)
         rtol = self.rtol
         if lbf.single_precision:
-            rtol *= 100.
+            rtol *= 200.
Member Author

I had to double this tolerance because the single-precision kernels are very sensitive to loss of precision. In fact, if you regenerate the kernels enough times, you will eventually get a collide kernel whose order of operations makes this test fail.
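The sensitivity to the order of operations is a generic property of floating-point arithmetic, not something specific to these kernels. Here is a minimal illustration in plain Python. It emulates IEEE single precision by rounding every intermediate result through `struct`, which stands in for the generated float kernels; the terms are made up for the example.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_f32(values):
    """Left-to-right sum with every intermediate result rounded to float32.

    Each addition is done in double and then rounded; for a single + this gives
    the same result as a native float32 addition.
    """
    acc = 0.0
    for v in values:
        acc = f32(acc + f32(v))
    return acc


terms = [1e8, 1.0, -1e8, 1.0]
reordered = [1e8, -1e8, 1.0, 1.0]  # same terms, different order

print(sum_f32(terms))      # 1.0: the first 1.0 is absorbed by 1e8 (float32 spacing there is 8)
print(sum_f32(reordered))  # 2.0
print(sum(terms))          # 2.0: double precision has enough digits here
```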

What is really peculiar is that, with 2 or more MPI ranks, the initial LB populations in the 8 nodes sampled by the tracer are not bitwise identical. I would have expected them to be identical, since we don't use a thermalized kernel. I spent 2 days on this oddity and still don't understand the discrepancy. It was not introduced by this PR; it has been there for a while.

Comment on lines +523 to +525
void PackInfoPdfDoublePrecision::pack(Direction dir, unsigned char *byte_buffer, IBlock *block) const {
byte_buffer += sizeof(double) - (reinterpret_cast<std::size_t>(byte_buffer) - (reinterpret_cast<std::size_t>(byte_buffer) / sizeof(double)) * sizeof(double));
double *buffer = reinterpret_cast<double *>(byte_buffer);
Member Author

For some reason, the byte_buffer is not always properly aligned. I don't understand why, and the MPI code in waLBerla is really complex to analyze; I couldn't find where the memory allocation takes place. You can investigate this issue in GDB by removing lines 524 and 899 and breaking on the symbol __ubsan::Diag::~Diag. When we enter this function, the buffer may already contain some information, such as the field dimensions, but those values are 8 bytes long (int64_t). This cannot explain why we sometimes enter this function with 6 bytes already in the buffer, unless some function pushes 6 char flags to it. This bug doesn't manifest itself with the default waLBerla PackInfo class. The full UBSAN report is reproduced below.

With this manual bugfix, the code passes on both AMD and Intel chips. One should be able to re-apply it with git cherry-pick 960c2ade when regenerating the kernels. I'll try to come up with a codegen MWE and submit a bug report on the waLBerla issue tracker.
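The pointer adjustment in the patch reduces to `addr += 8 - addr % 8` when `sizeof(double) == 8`. The sketch below reproduces that arithmetic with plain Python integers standing in for the C++ pointer values. It applies the formula to the two misaligned store addresses from the UBSAN report, both of which are off by 6 bytes, and compares it with the conventional round-up `(-addr) % 8`. The helper names are hypothetical.

```python
DOUBLE_SIZE = 8  # sizeof(double) on the x86-64 machines in the CI log


def patched_offset(addr: int, size: int = DOUBLE_SIZE) -> int:
    """Bytes skipped by the patch: size - (addr - (addr / size) * size)."""
    return size - (addr - (addr // size) * size)


def round_up_offset(addr: int, size: int = DOUBLE_SIZE) -> int:
    """Conventional padding to the next multiple of size (0 if already aligned)."""
    return (-addr) % size


# Misaligned store addresses reported by UBSAN (one per MPI rank).
for addr in (0x59034FC9C776, 0x6352AA97EE66):
    print(hex(addr), "misaligned by", addr % DOUBLE_SIZE,
          "-> patched offset", patched_offset(addr),
          "-> round-up offset", round_up_offset(addr))

# Already aligned address: the two formulas differ.
print(patched_offset(0x1000), round_up_offset(0x1000))  # 8 0
```

Both formulas yield an 8-byte aligned pointer. The patched expression, however, always advances by 1 to 8 bytes, including a full 8 bytes when the buffer is already aligned, whereas the conventional form advances by 0 to 7.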

UBSAN report
114/130 Test #103: LBWalberlaImpl_unit_tests ................***Failed    2.73 sec
Running 29 test cases...
Running 29 test cases...
pos [5.4, -0.6, -0.6], node [4, -2, -2], weight 0.001
pos [-0.6, -0.6, -0.6], node [-2, -2, -2], weight 0.001
pos [5.4, -0.6, -0.6], node [4, -2, -2], weight 0.001
pos [-0.6, -0.6, -0.6], node [-2, -2, -2], weight 0.001
pos [5.4, -0.6, -0.6], node [4, -2, -2], weight 0.001
pos [-0.6, -0.6, -0.6], node [-2, -2, -2], weight 0.001
pos [5.4, -0.6, -0.6], node [4, -2, -2], weight 0.001
pos [-0.6, -0.6, -0.6], node [-2, -2, -2], weight 0.001
/builds/espressomd/espresso/src/walberla_bridge/src/lattice_boltzmann/generated_kernels/PackInfoPdfDoublePrecision.cpp:70:9: runtime error: store to misaligned address 0x59034fc9c776 for type 'double', which requires 8 byte alignment
/builds/espressomd/espresso/src/walberla_bridge/src/lattice_boltzmann/generated_kernels/PackInfoPdfDoublePrecision.cpp:70:9: runtime error: store to misaligned address 0x6352aa97ee66 for type 'double', which requires 8 byte alignment
0x59034fc9c776: note: pointer points here
0x6352aa97ee66: note: pointer points here
 03 00 00 00 00 00  a0 52 97 b7 cd 74 00 00  60 c7 c9 4f 03 59 00 00  60 c7 c9 4f 03 59 00 00  dd dd
             ^ 
 03 00 00 00 00 00  a0 52 17 f8 01 79 00 00  50 ee 97 aa 52 63 00 00  50 ee 97 aa 52 63 00 00  dd dd
             ^ 
    #0 0x74cdb93aeaa1 in walberla::pystencils::internal_pack_W::pack_W(double*, double*, long, long, long, long, long, long, long) /builds/espressomd/espresso/src/walberla_bridge/src/lattice_boltzmann/generated_kernels/PackInfoPdfDoublePrecision.cpp:70:102
    #1 0x74cdb93aeaa1 in walberla::pystencils::PackInfoPdfDoublePrecision::pack(walberla::stencil::Direction, unsigned char*, walberla::domain_decomposition::IBlock*) const /builds/espressomd/espresso/src/walberla_bridge/src/lattice_boltzmann/generated_kernels/PackInfoPdfDoublePrecision.cpp:586:5
    #2 0x59034e6f447b in walberla::communication::UniformPackInfo::packData(walberla::domain_decomposition::IBlock const*, walberla::stencil::Direction, walberla::mpi::GenericSendBuffer<unsigned char, walberla::mpi::OptimalGrowth>&) const /builds/espressomd/espresso/build/_deps/walberla-src/src/communication/UniformPackInfo.h:168:4
    #3 0x59034e716b4c in std::function<void (walberla::mpi::GenericSendBuffer<unsigned char, walberla::mpi::OptimalGrowth>&)>::operator()(walberla::mpi::GenericSendBuffer<unsigned char, walberla::mpi::OptimalGrowth>&) const /usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/std_function.h:591:9
    #4 0x59034e716b4c in walberla::blockforest::communication::UniformBufferedScheme<walberla::stencil::internal::D3Q19<int>>::send(walberla::mpi::GenericSendBuffer<unsigned char, walberla::mpi::OptimalGrowth>&, std::vector<std::function<void (walberla::mpi::GenericSendBuffer<unsigned char, walberla::mpi::OptimalGrowth>&)>, std::allocator<std::function<void (walberla::mpi::GenericSendBuffer<unsigned char, walberla::mpi::OptimalGrowth>&)>>>&) /builds/espressomd/espresso/build/_deps/walberla-src/src/blockforest/communication/UniformBufferedScheme.h:481:7
    #5 0x59034e702fd5 in std::function<void (walberla::mpi::GenericSendBuffer<unsigned char, walberla::mpi::OptimalGrowth>&)>::operator()(walberla::mpi::GenericSendBuffer<unsigned char, walberla::mpi::OptimalGrowth>&) const /usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/std_function.h:591:9
    #6 0x59034e702fd5 in walberla::mpi::GenericOpenMPBufferSystem<walberla::mpi::GenericRecvBuffer<unsigned char>, walberla::mpi::GenericSendBuffer<unsigned char, walberla::mpi::OptimalGrowth>>::startCommunicationOpenMP() /builds/espressomd/espresso/build/_deps/walberla-src/src/core/mpi/OpenMPBufferSystem.impl.h:137:7
    #7 0x59034e714e05 in walberla::mpi::GenericOpenMPBufferSystem<walberla::mpi::GenericRecvBuffer<unsigned char>, walberla::mpi::GenericSendBuffer<unsigned char, walberla::mpi::OptimalGrowth>>::startCommunication() /builds/espressomd/espresso/build/_deps/walberla-src/src/core/mpi/OpenMPBufferSystem.impl.h:96:7
    #8 0x59034e714e05 in walberla::blockforest::communication::UniformBufferedScheme<walberla::stencil::internal::D3Q19<int>>::startCommunication() /builds/espressomd/espresso/build/_deps/walberla-src/src/blockforest/communication/UniformBufferedScheme.h:379:18
    #9 0x59034e6e12b3 in walberla::blockforest::communication::UniformBufferedScheme<walberla::stencil::internal::D3Q19<int>>::communicate() /builds/espressomd/espresso/build/_deps/walberla-src/src/blockforest/communication/UniformBufferedScheme.h:224:4
    #10 0x59034e6e12b3 in walberla::LBWalberlaImpl<double, (lbmpy::Arch)0>::integrate_push_scheme() /builds/espressomd/espresso/src/walberla_bridge/tests/../src/lattice_boltzmann/LBWalberlaImpl.hpp:581:35
    #11 0x59034e6ab912 in walberla::LBWalberlaImpl<double, (lbmpy::Arch)0>::integrate() /builds/espressomd/espresso/src/walberla_bridge/tests/../src/lattice_boltzmann/LBWalberlaImpl.hpp:632:7
    #12 0x59034e7d9391 in void forces_book_keepingcase::_impl<std::function<std::shared_ptr<LBWalberlaBase> (LBTestParameters const&)>>(std::function<std::shared_ptr<LBWalberlaBase> (LBTestParameters const&)> const&) /builds/espressomd/espresso/src/walberla_bridge/tests/LBWalberlaImpl_unit_tests.cpp:489:9
    #13 0x59034e7d864c in void forces_book_keepingcase::test_method<std::function<std::shared_ptr<LBWalberlaBase> (LBTestParameters const&)> const&>(std::function<std::shared_ptr<LBWalberlaBase> (LBTestParameters const&)> const& const&) /builds/espressomd/espresso/src/walberla_bridge/tests/LBWalberlaImpl_unit_tests.cpp:466:1
    #14 0x74cdb9c5dc15  (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.83.0+0x28c15) (BuildId: 2c61160182c484a21dab791006ebbb07ebbfff3e)
    #15 0x74cdb9c63cd4 in boost::execution_monitor::catch_signals(boost::function<int ()> const&) (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.83.0+0x2ecd4) (BuildId: 2c61160182c484a21dab791006ebbb07ebbfff3e)
    #16 0x74cdb9c6414e in boost::execution_monitor::execute(boost::function<int ()> const&) (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.83.0+0x2f14e) (BuildId: 2c61160182c484a21dab791006ebbb07ebbfff3e)
    #17 0x74cdb9c64237 in boost::execution_monitor::vexecute(boost::function<void ()> const&) (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.83.0+0x2f237) (BuildId: 2c61160182c484a21dab791006ebbb07ebbfff3e)
    #18 0x74cdb9c81e4e in boost::unit_test::unit_test_monitor_t::execute_and_translate(boost::function<void ()> const&, unsigned long) (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.83.0+0x4ce4e) (BuildId: 2c61160182c484a21dab791006ebbb07ebbfff3e)
    #19 0x74cdb9ca59c9  (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.83.0+0x709c9) (BuildId: 2c61160182c484a21dab791006ebbb07ebbfff3e)
    #20 0x74cdb9ca5d7a  (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.83.0+0x70d7a) (BuildId: 2c61160182c484a21dab791006ebbb07ebbfff3e)
    #21 0x74cdb9ca5d7a  (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.83.0+0x70d7a) (BuildId: 2c61160182c484a21dab791006ebbb07ebbfff3e)
    #22 0x74cdb9c6e768 in boost::unit_test::framework::run(unsigned long, bool) (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.83.0+0x39768) (BuildId: 2c61160182c484a21dab791006ebbb07ebbfff3e)
    #23 0x74cdb9c81783 in boost::unit_test::unit_test_main(bool (*)(), int, char**) (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.83.0+0x4c783) (BuildId: 2c61160182c484a21dab791006ebbb07ebbfff3e)
    #24 0x59034e69f9a9 in main /builds/espressomd/espresso/src/walberla_bridge/tests/LBWalberlaImpl_unit_tests.cpp:636:20
    #25 0x74cdb779b1c9  (/usr/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 4d9090d61bf70e6b3225d583f0f08193f54670b2)
    #26 0x74cdb779b28a in __libc_start_main (/usr/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 4d9090d61bf70e6b3225d583f0f08193f54670b2)
    #27 0x59034e670d34 in _start (/builds/espressomd/espresso/build/src/walberla_bridge/tests/LBWalberlaImpl_unit_tests+0x10ed34) (BuildId: 33de3e955fa8eda7e1f426d1400f93adff5ef8a4)
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /builds/espressomd/espresso/src/walberla_bridge/src/lattice_boltzmann/generated_kernels/PackInfoPdfDoublePrecision.cpp:70:9 
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
    #0 0x7901f99aeaa1 in walberla::pystencils::internal_pack_W::pack_W(double*, double*, long, long, long, long, long, long, long) /builds/espressomd/espresso/src/walberla_bridge/src/lattice_boltzmann/generated_kernels/PackInfoPdfDoublePrecision.cpp:70:102
    #1 0x7901f99aeaa1 in walberla::pystencils::PackInfoPdfDoublePrecision::pack(walberla::stencil::Direction, unsigned char*, walberla::domain_decomposition::IBlock*) const /builds/espressomd/espresso/src/walberla_bridge/src/lattice_boltzmann/generated_kernels/PackInfoPdfDoublePrecision.cpp:586:5
    #2 0x6352a84e647b in walberla::communication::UniformPackInfo::packData(walberla::domain_decomposition::IBlock const*, walberla::stencil::Direction, walberla::mpi::GenericSendBuffer<unsigned char, walberla::mpi::OptimalGrowth>&) const /builds/espressomd/espresso/build/_deps/walberla-src/src/communication/UniformPackInfo.h:168:4
    #3 0x6352a8508b4c in std::function<void (walberla::mpi::GenericSendBuffer<unsigned char, walberla::mpi::OptimalGrowth>&)>::operator()(walberla::mpi::GenericSendBuffer<unsigned char, walberla::mpi::OptimalGrowth>&) const /usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/std_function.h:591:9
    #4 0x6352a8508b4c in walberla::blockforest::communication::UniformBufferedScheme<walberla::stencil::internal::D3Q19<int>>::send(walberla::mpi::GenericSendBuffer<unsigned char, walberla::mpi::OptimalGrowth>&, std::vector<std::function<void (walberla::mpi::GenericSendBuffer<unsigned char, walberla::mpi::OptimalGrowth>&)>, std::allocator<std::function<void (walberla::mpi::GenericSendBuffer<unsigned char, walberla::mpi::OptimalGrowth>&)>>>&) /builds/espressomd/espresso/build/_deps/walberla-src/src/blockforest/communication/UniformBufferedScheme.h:481:7
    #5 0x6352a84f4fd5 in std::function<void (walberla::mpi::GenericSendBuffer<unsigned char, walberla::mpi::OptimalGrowth>&)>::operator()(walberla::mpi::GenericSendBuffer<unsigned char, walberla::mpi::OptimalGrowth>&) const /usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/std_function.h:591:9
    #6 0x6352a84f4fd5 in walberla::mpi::GenericOpenMPBufferSystem<walberla::mpi::GenericRecvBuffer<unsigned char>, walberla::mpi::GenericSendBuffer<unsigned char, walberla::mpi::OptimalGrowth>>::startCommunicationOpenMP() /builds/espressomd/espresso/build/_deps/walberla-src/src/core/mpi/OpenMPBufferSystem.impl.h:137:7
    #7 0x6352a8506e05 in walberla::mpi::GenericOpenMPBufferSystem<walberla::mpi::GenericRecvBuffer<unsigned char>, walberla::mpi::GenericSendBuffer<unsigned char, walberla::mpi::OptimalGrowth>>::startCommunication() /builds/espressomd/espresso/build/_deps/walberla-src/src/core/mpi/OpenMPBufferSystem.impl.h:96:7
    #8 0x6352a8506e05 in walberla::blockforest::communication::UniformBufferedScheme<walberla::stencil::internal::D3Q19<int>>::startCommunication() /builds/espressomd/espresso/build/_deps/walberla-src/src/blockforest/communication/UniformBufferedScheme.h:379:18
    #9 0x6352a84d32b3 in walberla::blockforest::communication::UniformBufferedScheme<walberla::stencil::internal::D3Q19<int>>::communicate() /builds/espressomd/espresso/build/_deps/walberla-src/src/blockforest/communication/UniformBufferedScheme.h:224:4
    #10 0x6352a84d32b3 in walberla::LBWalberlaImpl<double, (lbmpy::Arch)0>::integrate_push_scheme() /builds/espressomd/espresso/src/walberla_bridge/tests/../src/lattice_boltzmann/LBWalberlaImpl.hpp:581:35
    #11 0x6352a849d912 in walberla::LBWalberlaImpl<double, (lbmpy::Arch)0>::integrate() /builds/espressomd/espresso/src/walberla_bridge/tests/../src/lattice_boltzmann/LBWalberlaImpl.hpp:632:7
    #12 0x6352a85cb391 in void forces_book_keepingcase::_impl<std::function<std::shared_ptr<LBWalberlaBase> (LBTestParameters const&)>>(std::function<std::shared_ptr<LBWalberlaBase> (LBTestParameters const&)> const&) /builds/espressomd/espresso/src/walberla_bridge/tests/LBWalberlaImpl_unit_tests.cpp:489:9
    #13 0x6352a85ca64c in void forces_book_keepingcase::test_method<std::function<std::shared_ptr<LBWalberlaBase> (LBTestParameters const&)> const&>(std::function<std::shared_ptr<LBWalberlaBase> (LBTestParameters const&)> const& const&) /builds/espressomd/espresso/src/walberla_bridge/tests/LBWalberlaImpl_unit_tests.cpp:466:1
    #14 0x7901fa37ec15  (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.83.0+0x28c15) (BuildId: 2c61160182c484a21dab791006ebbb07ebbfff3e)
    #15 0x7901fa384cd4 in boost::execution_monitor::catch_signals(boost::function<int ()> const&) (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.83.0+0x2ecd4) (BuildId: 2c61160182c484a21dab791006ebbb07ebbfff3e)
    #16 0x7901fa38514e in boost::execution_monitor::execute(boost::function<int ()> const&) (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.83.0+0x2f14e) (BuildId: 2c61160182c484a21dab791006ebbb07ebbfff3e)
    #17 0x7901fa385237 in boost::execution_monitor::vexecute(boost::function<void ()> const&) (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.83.0+0x2f237) (BuildId: 2c61160182c484a21dab791006ebbb07ebbfff3e)
    #18 0x7901fa3a2e4e in boost::unit_test::unit_test_monitor_t::execute_and_translate(boost::function<void ()> const&, unsigned long) (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.83.0+0x4ce4e) (BuildId: 2c61160182c484a21dab791006ebbb07ebbfff3e)
    #19 0x7901fa3c69c9  (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.83.0+0x709c9) (BuildId: 2c61160182c484a21dab791006ebbb07ebbfff3e)
    #20 0x7901fa3c6d7a  (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.83.0+0x70d7a) (BuildId: 2c61160182c484a21dab791006ebbb07ebbfff3e)
    #21 0x7901fa3c6d7a  (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.83.0+0x70d7a) (BuildId: 2c61160182c484a21dab791006ebbb07ebbfff3e)
    #22 0x7901fa38f768 in boost::unit_test::framework::run(unsigned long, bool) (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.83.0+0x39768) (BuildId: 2c61160182c484a21dab791006ebbb07ebbfff3e)
    #23 0x7901fa3a2783 in boost::unit_test::unit_test_main(bool (*)(), int, char**) (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.83.0+0x4c783) (BuildId: 2c61160182c484a21dab791006ebbb07ebbfff3e)
    #24 0x6352a84919a9 in main /builds/espressomd/espresso/src/walberla_bridge/tests/LBWalberlaImpl_unit_tests.cpp:636:20
    #25 0x7901f7f9b1c9  (/usr/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 4d9090d61bf70e6b3225d583f0f08193f54670b2)
    #26 0x7901f7f9b28a in __libc_start_main (/usr/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 4d9090d61bf70e6b3225d583f0f08193f54670b2)
    #27 0x6352a8462d34 in _start (/builds/espressomd/espresso/build/src/walberla_bridge/tests/LBWalberlaImpl_unit_tests+0x10ed34) (BuildId: 33de3e955fa8eda7e1f426d1400f93adff5ef8a4)
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /builds/espressomd/espresso/src/walberla_bridge/src/lattice_boltzmann/generated_kernels/PackInfoPdfDoublePrecision.cpp:70:9 
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
  Process name: [[16478,1],1]
  Exit code:    1
--------------------------------------------------------------------------

Member Author

Using the waLBerla codegen tutorial, with minor tweaks to generate sweeps for 3D fields, I am unable to reproduce this byte buffer issue. I also still don't know which type of information would occupy only 1 byte. Most of the data we use are 4 bytes (int32_t, float) or 8 bytes (int64_t, double). I'll need more time to get to the bottom of this.
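The reasoning about the 6-byte offset can be made concrete with a little modular arithmetic. Assume the send buffer starts 8-byte aligned and items are appended back to back without padding; this is an assumption about waLBerla's byte-stream buffers, not verified here. The offset at which `pack` starts is then the sum of the sizes already written, modulo 8. The hypothetical helper below enumerates which offsets a given set of item sizes can produce.

```python
def reachable_offsets(item_sizes, alignment=8):
    """Offsets (mod alignment) reachable by appending any sequence of items.

    Graph search over residues: start at offset 0 and repeatedly add an item size.
    """
    reached = {0}
    stack = [0]
    while stack:
        offset = stack.pop()
        for size in item_sizes:
            nxt = (offset + size) % alignment
            if nxt not in reached:
                reached.add(nxt)
                stack.append(nxt)
    return sorted(reached)


print(reachable_offsets([4, 8]))     # [0, 4]
print(reachable_offsets([2, 4, 8]))  # [0, 2, 4, 6]
print(reachable_offsets([1, 4, 8]))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Under these assumptions, 4- and 8-byte items alone can only leave the buffer at offset 0 or 4. An offset of 6 requires at least one smaller item, such as the 1-byte char flags hypothesized earlier (2-byte items would also suffice).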

@@ -53,14 +59,17 @@ template <typename FT = double, Arch AT = Arch::CPU> struct KernelTrait {
pystencils::CollideSweepDoublePrecisionThermalizedAVX;
using CollisionModelLeesEdwards =
pystencils::CollideSweepDoublePrecisionLeesEdwardsAVX;
using StreamSweep = pystencils::StreamSweepDoublePrecisionAVX;
Member Author

The AVX stream sweeps only improve performance in single precision. I'm including the double-precision kernel for consistency.

Comment on lines +440 to +444
if (m_has_boundaries or (m_collision_model and has_lees_edwards_bc())) {
setup.template operator()<PackInfo<PdfField>>();
} else {
setup.template operator()<PackInfoStreaming<PdfField>>();
}
Member Author

The code-generated PackInfo classes do not work with UBB (velocity bounce-back) boundaries. After two days of investigation, I still don't understand why extra fluid density flows into LB fluid cells that are in contact with LB flag cells that have a slip velocity. To reproduce the bug, replace PackInfo by PackInfoStreaming at line 441 and run ./pypresso ../testsuite/python/lb_boundary_velocity.py LBBoundaryVelocityTest.test_wall_slip_parallel. Edit the collide and stream kernels to print the values streamed along directions 5 and 6 (top and bottom, respectively) in the first fluid cell in contact with a flag cell. Use a box size of [8, 1, 1] in units of agrid for simplicity.

@jngrad jngrad added waLBerla Issues regarding waLBerla integration Performance labels Sep 20, 2024
@jngrad jngrad added the automerge Merge with kodiak label Sep 27, 2024
@kodiakhq kodiakhq bot merged commit 03033c4 into espressomd:python Sep 27, 2024
10 checks passed
@jngrad jngrad deleted the packinfo branch September 27, 2024 15:05
@RudolfWeeber (Contributor)

On my system, the LB benchmark with 1k particles/core is now 30% faster on 1 and 8 cores and on GPU. That's a significant improvement!

For grids below ~40 cubed, the CPU version is now close to the GPU one (about 30% slower). At some point, we might look into that some more. Long term, we might

  • split the particle coupling to hide the latency of getting the interpolated velocities from the GPU
  • and/or move the entire particle coupling to the GPU

Development

Successfully merging this pull request may close these issues.

LB: avoid double communication
2 participants