Lustre DataIn Data Movements need a way to set striping #220

Closed
bdevcich opened this issue Oct 28, 2024 · 11 comments
@bdevcich
Contributor

Rabbit Lustre filesystems default to a stripe count of 1. When creating Lustre filesystems with multiple OSTs, DataIn Data Movements will only be able to transfer data up to the size of a single OST. This causes dcp to error out once that OST fills up:

Copied 16.462 TiB (59%) in 2001.381 secs (8.423 GiB/s) 1403 secs left ...
ABORT: rank X on HOST: Failed to write file /mnt/nnf/ab40f16f-cc36-4f2d-9d78-0c53495bfb1b-0/testfile errno=28 (No space left on device) @ /deps/mpifileutils/src/common/mfu_io.c:1051

We need a way to set the stripe count appropriately prior to Data Movement performed in the DataIn state.
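
For context, here is a hedged sketch of how the problem shows up on the destination (the mount point below is a placeholder, not taken from this issue): with the default layout the directory reports a stripe count of 1, and `lfs df` shows a single OST filling while the others stay mostly empty.

```sh
# Show the default layout inherited by new files under the DataIn destination
lfs getstripe -d /mnt/nnf/<fileset>

# Per-OST usage; with a one-stripe layout only one OST fills before ENOSPC
lfs df -h /mnt/nnf/<fileset>
```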

@bdevcich
Contributor Author

Setting a stripe count of -1 prior to data movement in DataIn resolves the issue.

The initial plan is to make this part of the NnfDataMovementProfile so that a command (or commands) can be run prior to data movement.
We need to determine whether this is something we want to do before every data movement or just for DataIn. In DataOut situations, the compute application/script can essentially do whatever it needs to do on the target filesystem to prepare for DataOut, but there is no parallel for that in DataIn.

I think we also need to consider whether doing something like setstripe -c -1 just to suit the needs of data movement in DataIn is going to cause any issues for the running compute node application if that application has its own striping requirements.

Does this mean that users will want to use a different DM profile for DataIn? Or do we just limit these commands (if supplied) to run only during DataIn?
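
A minimal sketch of the workaround described above, assuming the destination directory is known before the transfer starts (paths and MPI launch parameters are placeholders): widen the directory's default layout, then run the DataIn copy.

```sh
# Stripe new files under the destination across all available OSTs
lfs setstripe -c -1 /mnt/nnf/<fileset>

# Then run the DataIn transfer as usual with mpifileutils dcp
mpirun -np 8 dcp /lus/global/<source-dir> /mnt/nnf/<fileset>
```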

@weloewe

weloewe commented Oct 28, 2024

Another consideration might be to use Progressive File Layout (PFL) to avoid the situation of filling a single OST. In that case, very small files would go to the MDT, and then as a file's extents grow it would stripe across more and more OSTs:

lfs setstripe -E 64K -L mdt -E 16m -c 1 -S 16m -E 1G -c 2 -E 4G -c 4 -E 16G -c 8 -E 64G -c 16 -E -1 -c -1

In any case, the specifics of the striping could be determined as needed, while maintaining a default (PFL) that avoids filling an individual OST.
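
As a hedged usage note (the directory path is a placeholder; this simply restates the layout above applied to a destination directory), the PFL default is set once on the directory and inherited by files created under it, and `lfs getstripe -d` confirms the components that were applied:

```sh
# Apply the PFL default to the DataIn destination and verify it
lfs setstripe -E 64K -L mdt -E 16m -c 1 -S 16m -E 1G -c 2 -E 4G -c 4 \
    -E 16G -c 8 -E 64G -c 16 -E -1 -c -1 /mnt/nnf/<fileset>
lfs getstripe -d /mnt/nnf/<fileset>
```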

@behlendorf
Collaborator

Setting a PFL layout here would be a nice way to handle this, since then files created by users of the filesystem would also benefit from reasonable striping defaults. We should be able to use the new postActivate functionality for this, or we could add a new dedicated field to the NnfStorageProfile.

@bdevcich
Contributor Author

Would this be the default for all Rabbit Lustre filesystems, or do we need to do something different for DataIn situations?

@bdevcich
Contributor Author

We should be able to use the new postActivate functionality for this, or we could add a new dedicated field to the NnfStorageProfile.

@behlendorf Unfortunately not. postActivate commands are run on the Lustre server side, so we're going to have to add a new field to the NnfStorageProfile and some new behavior to mount the filesystem and run commands from the Lustre client.

The postActivate field already does this for XFS/GFS2, but not for Lustre.

@behlendorf
Collaborator

We'd likely want to set this for all Rabbit Lustre filesystems, although perhaps slightly differently depending on how many Rabbits are part of the filesystem. You're right, I forgot this was a client-side thing. It does seem like we'll need some more machinery for that.

@bdevcich
Contributor Author

bdevcich commented Nov 5, 2024

PostMount and PreUnmount commands are being added to the NnfStorageProfiles to support this.

Additionally, for XFS and GFS2, the existing PostActivate and PreDeactivate commands will be renamed to PostMount and PreUnmount, since those actions already take place on the client side.

For Lustre, all four will be supported.

perhaps slightly differently depending on how many rabbits are part of the filesystem

Would something like $NUM_RABBITS in the commands work or does it need to be more dynamic than that?
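
For illustration only (the variable name and $MOUNT_PATH are assumptions about how the substitution might look, not the final NnfStorageProfile contract), a PostMount command could scale the layout with the expanded value:

```sh
# Hypothetical PostMount command after variable expansion:
# set the stripe count to the number of Rabbits backing the filesystem
lfs setstripe -c $NUM_RABBITS $MOUNT_PATH
```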

@bdevcich self-assigned this Nov 5, 2024
@bdevcich moved this from 📋 Open to 🏗 In progress in Issues Dashboard Nov 5, 2024
@behlendorf
Collaborator

behlendorf commented Nov 7, 2024

Would something like $NUM_RABBITS in the commands work or does it need to be more dynamic than that?

We should also add $NUM_MDTS and $NUM_OSTS since they're relevant when deciding on a layout.
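
As a hedged example of why those counts matter when deciding on a layout (variable names and paths remain illustrative assumptions), $NUM_OSTS could size the widest PFL component and $NUM_MDTS could drive directory striping:

```sh
# Hypothetical PostMount commands using the proposed variables
lfs setstripe -E 1G -c 2 -E -1 -c $NUM_OSTS $MOUNT_PATH   # widest component spans all OSTs
lfs setdirstripe -D -c $NUM_MDTS $MOUNT_PATH              # default MDT striping for new subdirectories
```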

@bdevcich
Contributor Author

Addressed via NearNodeFlash/nnf-sos#416

We should also add $NUM_MDTS and $NUM_OSTS since they're relevant when deciding on a layout.

This will be addressed in a subsequent PR.

@bdevcich
Contributor Author

bdevcich commented Dec 4, 2024

PR to add $NUM_MDTS and $NUM_OSTS here: NearNodeFlash/nnf-sos#424

@bdevcich moved this from 🏗 In progress to 👀 In review in Issues Dashboard Dec 4, 2024
@bdevcich
Contributor Author

bdevcich commented Dec 9, 2024

@bdevcich closed this as completed Dec 9, 2024
@github-project-automation bot moved this from 👀 In review to ✅ Closed in Issues Dashboard Dec 9, 2024