
I/O Increases drastically in long simulations #2878

Open
bshanahan opened this issue Mar 7, 2024 · 11 comments

Comments

@bshanahan
Contributor
I've been running the 2D drift-plane turbulence example in Hermes-3 for many timesteps, and I/O is responsible for about 70% of the computational time per timestep by the 3000th output (compared to 1-2% in the initial timesteps, which have fewer RHS evals).

@bendudson
Contributor

Interesting! @pnorbert may have some insight here. Could you please post some lines from the log file, near the start and near the end?

In the next branch there is also an ADIOS2 output option: Compiling with -DBOUT_DOWNLOAD_ADIOS=ON and setting output:type=adios should save an ADIOS2 dataset. See https://bout-dev.readthedocs.io/en/latest/user_docs/adios2.html
It would be good to know if this suffered from the same problem, or if it's a NetCDF thing.
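
For concreteness, a minimal sketch of the input-file side of this, assuming the usual BOUT.inp section syntax (equivalent to passing output:type=adios on the command line):

# In BOUT.inp
[output]
type = adios   # write an ADIOS2 dataset instead of the default NetCDF output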

@bshanahan
Contributor Author

Here are some lines early on in the simulation:

2.500e+04       1289       8.91e-01    67.3    7.1    4.4    3.8   17.4
2.510e+04       1130       7.86e-01    66.9    7.0    4.5    4.3   17.4
2.520e+04       1105       7.72e-01    66.8    7.0    4.4    4.4   17.4
2.530e+04       1027       7.27e-01    65.7    7.1    4.9    4.7   17.7
2.540e+04       1021       7.19e-01    66.0    7.0    4.7    4.7   17.6
2.550e+04       1149       7.99e-01    66.9    7.0    4.6    4.3   17.2

...and some at the end:

3.996e+05       1165       2.78e+00    20.0    2.3    1.8   68.3    7.6
3.997e+05       1015       2.62e+00    18.2    2.0    1.5   71.6    6.7
3.998e+05       1192       2.75e+00    20.2    2.3    1.7   68.9    7.0
3.999e+05       1058       2.63e+00    18.7    2.1    1.5   71.4    6.4
4.000e+05       1157       2.71e+00    20.0    2.2    1.5   69.2    7.2

I/O is the second-to-last column.

@ZedThree
Member

If you restart the simulation, does the IO time "reset"?

@dschwoerer
Contributor

dschwoerer commented Apr 10, 2024 via email

@ZedThree
Member

Ok, that's a little reassuring, I guess. The next question is whether this appears in non-Hermes-3 models. I can probably check this with something like blob2d or conduction, as they're pretty fast to run.

@dschwoerer
Contributor

I just noticed that a hermes-2 simulation got OOM-killed after 15 time steps. This is probably indirectly related to #2900, as that increases the memory usage, but it also seems to imply that the memory consumption (slightly) increases over time.

@ZedThree
Member

This can be seen even in conduction over 500 timesteps:

[plot: I/O time per output step for the conduction example over 500 timesteps]

These are absolute timings of I/O (in seconds), and the problem is present in both master and next.

@ZedThree
Member

I haven't managed to work out if there's a memory leak, but the increase in IO time does appear to be mostly coming from the following lines:

bout::utils::visit(NcPutVarVisitor(var), child.value);

bout::utils::visit(NcPutVarCountVisitor(var, start_index, count_index), child.value);

The top line is for variables without a time dimension, and the bottom one is for those with one. It turns out we are writing a lot of time-independent variables on every timestep, and this both takes a decent fraction of the total IO time and increases in cost over the course of the run.
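
For context, the difference between those two write paths in plain netcdf-cxx4 terms is roughly the following (a self-contained sketch with made-up variable names, not the actual BOUT++ code):

#include <cstddef>
#include <netcdf>
#include <vector>

using namespace netCDF;

int main() {
  NcFile file{"sketch.nc", NcFile::replace};
  auto t_dim = file.addDim("t");    // unlimited (time) dimension
  auto x_dim = file.addDim("x", 4);

  // Variable without a time dimension: written out in full each time
  NcVar static_var = file.addVar("static_var", ncInt, x_dim);
  const std::vector<int> static_data{1, 2, 3, 4};
  static_var.putVar(static_data.data());

  // Variable with a time dimension: one record written per output,
  // at record offset `start` with extent `count`
  NcVar evolving_var = file.addVar("evolving_var", ncInt, {t_dim, x_dim});
  const std::vector<std::size_t> start{0, 0};
  const std::vector<std::size_t> count{1, 4};
  evolving_var.putVar(start, count, static_data.data());

  return 0;
}

Both kinds of call currently happen on every output, even though the time-independent data never changes after initialization.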

I'll try switching to ADIOS tomorrow and see if that makes a difference.

The reason we're writing all these time-independent things every time step is that we reuse PhysicsModel::output_options for the whole run. We could maybe clear that variable after the initial write?

@ZedThree
Member

Well, here's an easy speed up:

modified   src/physics/physicsmodel.cxx
@@ -253,6 +253,7 @@ int PhysicsModel::PhysicsModelMonitor::call(Solver* solver, BoutReal simtime,
   solver->outputVars(model->output_options, true);
   model->outputVars(model->output_options);
   model->writeOutputFile();
+  model->output_options = Options{};
 
   // Call user output monitor
   return model->outputMonitor(simtime, iteration, nout);

This halves the IO time for conduction but doesn't get rid of the slowdown over time. It's plausible there's an issue with netCDF here.

@ZedThree
Member

> In the next branch there is also an ADIOS2 output option: Compiling with -DBOUT_DOWNLOAD_ADIOS=ON and setting output:type=adios should save an ADIOS2 dataset. See https://bout-dev.readthedocs.io/en/latest/user_docs/adios2.html
> It would be good to know if this suffered from the same problem, or if it's a NetCDF thing.

I've checked this now, and it is just netCDF -- in fact, ADIOS2 is about a factor of 10 faster for the conduction example.

All my profiling points towards putVar, or at least a very small region around calls to that, but I can't reproduce the slowdown outside of BOUT++, and I can't see how any of our code would be causing this issue.

@ZedThree
Member

Ok, I finally have an MVCE in pure netcdf:

#include <array>
#include <chrono>
#include <cstddef>
#include <format>
#include <iostream>
#include <netcdf>
#include <vector>

// Prefer the high resolution clock only if it is steady
using clock_type =
  typename std::conditional<std::chrono::high_resolution_clock::is_steady,
                            std::chrono::high_resolution_clock,
                            std::chrono::steady_clock>::type;
using seconds = std::chrono::duration<double, std::chrono::seconds::period>;

using namespace netCDF;

constexpr auto loops = 1000; // outer (timed) iterations
constexpr auto repeats = 10; // writes per timed iteration

constexpr std::size_t nx = 5;
constexpr std::size_t ny = 5;

constexpr auto filename = "test.nc";

int main() {

  NcFile file{filename, NcFile::replace};

  // Small array of dummy data to write each iteration
  const std::array<int, nx * ny> arr {
    nx, nx, nx, nx, nx,
    nx, nx, nx, nx, nx,
    nx, nx, nx, nx, nx,
    nx, nx, nx, nx, nx,
    nx, nx, nx, nx, nx
  };

  // One unlimited (time) dimension plus two fixed spatial dimensions
  auto time_dim = file.addDim("t");
  auto x_dim = file.addDim("x", nx);
  auto y_dim = file.addDim("y", ny);
  NcVar arr_var = file.addVar("arr_var", ncInt, {time_dim, x_dim, y_dim});

  std::size_t time = 0;

  for (auto loop = 0; loop < loops; loop++) {
    const std::vector<std::size_t> start {time++};
    const auto started = clock_type::now();

    // Write and sync on every iteration, mimicking a flush-per-timestep output
    for (auto repeat = 0; repeat < repeats; repeat++) {
      arr_var.putVar(start, {1, nx, ny}, arr.data());
      file.sync();
    }

    // file.sync();  // <-- syncing here instead removes most of the slowdown

    const auto finished = clock_type::now();
    const auto elapsed = finished - started;

    std::cout << std::format("{:3}: {:8.6e}", loop, seconds{elapsed}.count()) << "\n";
  }

  return 0;
}

[plot: per-iteration write time with file.sync() inside the inner loop]

(y axis in seconds here, x axis is outer loop iteration)

Moving the file.sync() out of the inner loop massively reduces the slowdown:

[plot: per-iteration write time with file.sync() moved out of the inner loop]
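
Concretely, the timed loop becomes the following (a drop-in replacement for the loop in the MVCE above; only the placement of file.sync() changes):

  for (auto loop = 0; loop < loops; loop++) {
    const std::vector<std::size_t> start {time++};
    const auto started = clock_type::now();

    for (auto repeat = 0; repeat < repeats; repeat++) {
      arr_var.putVar(start, {1, nx, ny}, arr.data());
    }
    file.sync(); // sync once per outer iteration instead of after every putVar

    const auto finished = clock_type::now();
    const auto elapsed = finished - started;

    std::cout << std::format("{:3}: {:8.6e}", loop, seconds{elapsed}.count()) << "\n";
  }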

Much faster overall. The slowdown is still present when run over a longer period, but it does appear to saturate (the figure here is averaged over a 100-step rolling window):

[plot: per-iteration write time over a longer run, averaged over a 100-step rolling window]

The improved overall performance makes sense: we're hitting the disk less often. I still don't understand the cause of the slowdown, but it does appear to be inside netCDF.


The solution for BOUT++, then, is maybe to reintroduce the periodic-flush argument instead of flushing every single timestep. This increases the risk of data loss, but should improve performance. We could also try periodically closing and re-opening the file, but that carries its own overheads.
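
As a rough illustration (the helper name and the flush_every parameter are hypothetical, not an actual BOUT++ option or API), a periodic flush in the netCDF-cxx4 terms used above could look like:

#include <netcdf>

// Hypothetical helper: only sync the file every `flush_every` outputs
// instead of after every write. flush_every = 1 reproduces the current
// flush-per-timestep behaviour; larger values trade a bigger window of
// potential data loss for less time spent hitting the disk.
void maybe_sync(netCDF::NcFile& file, int iteration, int flush_every) {
  if (flush_every > 0 && (iteration % flush_every) == 0) {
    file.sync();
  }
}

In the MVCE's timing loop, this would replace the unconditional file.sync() call, e.g. maybe_sync(file, loop, 10).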

Alternatively, switching to ADIOS2 also gives significant performance improvements.
