When dfilemaker is run on a single node allocated by Slurm, I can trigger an OOM while it is writing content to files, and dfilemaker is killed. I'm not sure whether this is a flaw in dfilemaker or a flaw in something else (e.g. the Slurm config).
The run below was on mutt, on a node allocated with "salloc -N1"; the target file system was Lustre.
bash-4.4$ srun -n32 ~/projects/mfu-install/bin/dfilemaker --fill=alternate --depth=1-30 -nitems=10000-$((10*1000*1000)) --verbose
[2024-11-25T17:04:49] Creating 1429103 directories
[2024-11-25T17:04:59] Created 144290 directories (10%) in 10.056 secs (14348.591 dirs/sec) 90 secs left ...
[2024-11-25T17:05:09] Created 293797 directories (21%) in 20.114 secs (14606.766 dirs/sec) 78 secs left ...
...
[2024-11-25T17:07:41] Created 1278791 items (89%) in 70.013 secs (18264.982 items/sec) 8 secs left ...
[2024-11-25T17:07:51] Created 1425534 items (100%) in 80.010 secs (17817.046 items/sec) 0 secs left ...
[2024-11-25T17:07:52] Created 1429953 items (100%) in 81.207 secs (17608.759 items/sec) done
[2024-11-25T17:07:52] Writing content to files.
slurmstepd: error: Detected 1 oom_kill event in StepId=60053.4. Some of the step tasks have been OOM Killed.
srun: error: mutt11: task 10: Out Of Memory
srun: First task exited 30s ago
srun: StepId=60053.4 tasks 0-9,11-24,26-31: running
srun: StepId=60053.4 tasks 10,25: exited abnormally
srun: Terminating StepId=60053.4
srun: Job step aborted: Waiting up to 62 seconds for job step to finish.
slurmstepd: error: *** STEP 60053.4 ON mutt11 CANCELLED AT 2024-11-25T17:12:22 ***
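For anyone triaging a log like the one above, it can help to pull out exactly which ranks srun reported as having exited abnormally (here tasks 10 and 25, although only task 10 is named in the "Out Of Memory" line). The following is a minimal sketch, not part of mpifileutils or Slurm; the helper names `expand_task_list` and `abnormal_tasks` are hypothetical, and the regex assumes the srun message format shown in this log.

```python
import re

def expand_task_list(spec):
    """Expand a task list such as '0-9,11-24,26-31' into a list of ints."""
    tasks = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            tasks.extend(range(int(lo), int(hi) + 1))
        else:
            tasks.append(int(part))
    return tasks

def abnormal_tasks(log_text):
    """Return the ranks that srun listed as 'exited abnormally'.

    Assumes lines shaped like the ones in this report, e.g.
    'srun: StepId=60053.4 tasks 10,25: exited abnormally'.
    """
    found = []
    for m in re.finditer(r"tasks? ([\d,\-]+): exited abnormally", log_text):
        found.extend(expand_task_list(m.group(1)))
    return found

# Two lines copied from the log in this report.
log = """srun: StepId=60053.4 tasks 0-9,11-24,26-31: running
srun: StepId=60053.4 tasks 10,25: exited abnormally"""

print(abnormal_tasks(log))
```

Comparing that list against the ranks named in "Out Of Memory" lines is one way to see whether every failed rank was itself OOM-killed or whether some exited for another reason.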
mpifileutils version was this:
and command run was this