I/O Increases drastically in long simulations #2878
Interesting! @pnorbert may have some insight here. Could you please post some lines from the log file, near the start and near the end?
Here are some lines early on in the simulation:
...and some at the end:

I/O is the second-to-last column.
If you restart the simulation, does the IO time "reset"?
Yes, it does, as far as I remember. So it is no major issue, as restarting resolves it.
Ok, that's a little reassuring, I guess. The next question is: does this appear in non-Hermes-3 models? I can probably check this with something like blob2d or conduction, as they're pretty fast to run.
I just noticed that a Hermes-2 simulation got OOM-killed after 15 time steps. This is probably indirectly related to #2900, as that increases the memory usage, but it also seems to imply that the memory consumption (slightly) increases over time.
I haven't managed to work out if there's a memory leak, but the increase in IO time does appear to be mostly coming from the following lines:

- BOUT-dev/src/sys/options/options_netcdf.cxx, line 517 (commit 1d52b38)
- BOUT-dev/src/sys/options/options_netcdf.cxx, line 537 (commit 1d52b38)
The first (line 517) is for variables without a time dimension, and the second (line 537) is for those with one. It turns out we are writing a lot of time-independent variables on every timestep, and this both takes a decent fraction of the total IO time and increases in cost over the course of the run. I'll try switching to ADIOS tomorrow and see if that makes a difference. The reason we're writing all these time-independent things every timestep is that we reuse the same output `Options` object between outputs (see the patch in the next comment).
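To illustrate the distinction, here is a minimal standalone sketch (not the BOUT-dev code at the lines above; the file and variable names are made up): a variable without a time dimension gets rewritten in full on every output, whereas a variable with a time dimension only has one new record appended along the unlimited `t` dimension.

```cpp
// Illustrative only (not the BOUT-dev code): re-writing a time-independent
// variable in full every output vs appending a single record to a
// time-dependent one with the netCDF-C++ API.
#include <cstddef>
#include <netcdf>
#include <vector>

using namespace netCDF;

int main() {
  constexpr std::size_t nx = 5;

  NcFile file{"sketch.nc", NcFile::replace};
  auto t_dim = file.addDim("t");  // unlimited (record) dimension
  auto x_dim = file.addDim("x", nx);

  NcVar metadata = file.addVar("metadata", ncInt, x_dim);        // no time dimension
  NcVar field = file.addVar("field", ncDouble, {t_dim, x_dim});  // has time dimension

  const std::vector<int> meta(nx, 1);
  const std::vector<double> values(nx, 2.0);

  for (std::size_t t = 0; t < 3; ++t) {
    // Time-independent: the whole variable is rewritten on every output,
    // even though its contents never change.
    metadata.putVar(meta.data());

    // Time-dependent: only one new record is appended along "t".
    field.putVar({t, 0}, {1, nx}, values.data());
  }
  return 0;
}
```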
Well, here's an easy speed-up, in src/physics/physicsmodel.cxx:

```diff
@@ -253,6 +253,7 @@ int PhysicsModel::PhysicsModelMonitor::call(Solver* solver, BoutReal simtime,
   solver->outputVars(model->output_options, true);
   model->outputVars(model->output_options);
   model->writeOutputFile();
+  model->output_options = Options{};

   // Call user output monitor
   return model->outputMonitor(simtime, iteration, nout);
```

This halves the IO time.
I've checked this now, and it is just netCDF -- ADIOS2 is about a factor of 10 faster. All my profiling points towards the netCDF write calls themselves.
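For reference, a minimal sketch of the stepped-write pattern with the ADIOS2 C++ API (serial, non-MPI; the IO name, file name, and variable name are made up, and this is not the BOUT++ ADIOS2 backend):

```cpp
// Illustrative ADIOS2 analogue of the netCDF test below: append one 2D
// record per step using BeginStep/EndStep; the engine decides when data
// actually hits disk rather than syncing on every write.
#include <adios2.h>
#include <cstddef>
#include <vector>

int main() {
  constexpr std::size_t nx = 5;
  constexpr std::size_t ny = 5;
  const std::vector<int> arr(nx * ny, static_cast<int>(nx));

  adios2::ADIOS adios;
  adios2::IO io = adios.DeclareIO("test");
  // A {nx, ny} variable; each step contributes one new record.
  auto var = io.DefineVariable<int>("arr_var", {nx, ny}, {0, 0}, {nx, ny});

  adios2::Engine writer = io.Open("test.bp", adios2::Mode::Write);
  for (int step = 0; step < 1000; ++step) {
    writer.BeginStep();
    writer.Put(var, arr.data());
    writer.EndStep();  // marks the step complete; buffering is up to the engine
  }
  writer.Close();
  return 0;
}
```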
Ok, I finally have an MVCE in pure netCDF:

```cpp
#include <array>
#include <chrono>
#include <cstddef>
#include <format>
#include <iostream>
#include <netcdf>
#include <type_traits>
#include <vector>

// Prefer high_resolution_clock if it is steady, otherwise use steady_clock
using clock_type =
    typename std::conditional<std::chrono::high_resolution_clock::is_steady,
                              std::chrono::high_resolution_clock,
                              std::chrono::steady_clock>::type;
using seconds = std::chrono::duration<double, std::chrono::seconds::period>;

using namespace netCDF;

constexpr auto loops = 1000;
constexpr auto repeats = 10;
constexpr std::size_t nx = 5;
constexpr std::size_t ny = 5;
constexpr auto filename = "test.nc";

int main() {
  NcFile file{filename, NcFile::replace};

  const std::array<int, nx * ny> arr{
      nx, nx, nx, nx, nx,
      nx, nx, nx, nx, nx,
      nx, nx, nx, nx, nx,
      nx, nx, nx, nx, nx,
      nx, nx, nx, nx, nx,
  };

  auto time_dim = file.addDim("t");  // unlimited (record) dimension
  auto x_dim = file.addDim("x", nx);
  auto y_dim = file.addDim("y", ny);
  NcVar arr_var = file.addVar("arr_var", ncInt, {time_dim, x_dim, y_dim});

  std::size_t time = 0;
  for (auto loop = 0; loop < loops; loop++) {
    const std::vector<std::size_t> start{time++};
    const auto started = clock_type::now();
    for (auto repeat = 0; repeat < repeats; repeat++) {
      // Write one record and flush to disk on every single write
      arr_var.putVar(start, {1, nx, ny}, arr.data());
      file.sync();
    }
    // file.sync();  // moving the sync out here instead is much faster
    const auto finished = clock_type::now();
    const auto elapsed = finished - started;
    std::cout << std::format("{:3}: {:8.6e}", loop, seconds{elapsed}.count()) << "\n";
  }
  return 0;
}
```

[Plot of write time per outer-loop iteration: y axis in seconds, x axis is the outer loop iteration]

Moving the `file.sync()` outside the inner loop (the commented-out line above) is much faster overall. The slowdown is still present when run over a longer period, but it does appear to saturate (the figure here is averaged over a 100-step rolling window).

The improved overall performance makes sense: we're hitting disk less often. I still don't understand the cause of the slowdown, but it's apparent it's inside netCDF.

The solution for BOUT++ then is maybe to reintroduce the periodic-flush argument, instead of flushing every single timestep. This increases the risk of data loss, but should improve performance. We could also try periodically closing and re-opening the file, though that attracts its own overheads. Alternatively, switching to ADIOS2 also gives significant performance improvements.
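On the periodic-flush idea, here is a minimal standalone sketch of what that could look like, modelled on the MVCE above (the `flush_interval` name and value are hypothetical, not an existing BOUT++ option):

```cpp
// Illustrative sketch: flush the netCDF file only every `flush_interval`
// writes instead of after every write. Between syncs the data sits in
// library/OS buffers, so a crash can lose up to flush_interval records.
#include <array>
#include <cstddef>
#include <netcdf>
#include <vector>

using namespace netCDF;

int main() {
  constexpr std::size_t nx = 5;
  constexpr std::size_t ny = 5;
  constexpr int steps = 1000;
  constexpr int flush_interval = 10;  // hypothetical knob: sync every 10 records

  NcFile file{"periodic_flush.nc", NcFile::replace};
  auto t_dim = file.addDim("t");
  auto x_dim = file.addDim("x", nx);
  auto y_dim = file.addDim("y", ny);
  NcVar var = file.addVar("arr_var", ncInt, {t_dim, x_dim, y_dim});

  std::array<int, nx * ny> arr{};
  arr.fill(1);

  for (int step = 0; step < steps; ++step) {
    const std::vector<std::size_t> start{static_cast<std::size_t>(step)};
    var.putVar(start, {1, nx, ny}, arr.data());

    // Only hit the disk periodically, and once more at the very end.
    if (step % flush_interval == flush_interval - 1 || step == steps - 1) {
      file.sync();
    }
  }
  return 0;
}
```

This trades a bounded amount of potential data loss for far fewer sync calls, which is where the MVCE above spends most of its time.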
I've been running the 2D drift-plane turbulence example in Hermes-3 for many timesteps, and I/O is responsible for about 70% of the computational time per timestep by the 3000th output (compared to 1-2% in the initial timesteps, which have fewer RHS evals).