zfs truncate to quickly erase the entire contents of a dataset #16765

Open
Haravikk opened this issue Nov 15, 2024 · 2 comments
Labels
Type: Feature Feature request or new feature

Comments

Haravikk commented Nov 15, 2024

Describe the feature you would like to see added to OpenZFS

A new command zfs truncate [-rn] <dataset>... would remove all data from one or more datasets without destroying them, ideally far more efficiently than deleting the contents at the file-system level (via rm -R or similar). It would also leave all snapshots and bookmarks intact, so taking a snapshot beforehand would provide a point to roll back to.

The optional -r flag would allow this to be done recursively on child datasets, and -n would allow a dry run (to confirm which datasets would be truncated without actually truncating them).
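To make the intended semantics of -r and -n concrete, here is a minimal Python sketch of how target selection for a dry run might work. It is purely hypothetical: it only manipulates dataset names (for example, as printed one per line by `zfs list -H -o name`) and does not call into ZFS at all. The function names `select_targets` and `dry_run` are made up for illustration. Snapshot and bookmark names (containing `@` or `#`) are skipped, since the proposal leaves them intact.

```python
def select_targets(all_datasets, requested, recursive=False):
    """Hypothetical: pick the datasets a `zfs truncate` call would act on.

    all_datasets: dataset names, e.g. one per line from `zfs list -H -o name`.
    requested: dataset names given on the command line.
    recursive: models the proposed -r flag (include child datasets).
    """
    targets = []
    for name in requested:
        if name not in all_datasets:
            raise ValueError(f"dataset does not exist: {name}")
        for ds in all_datasets:
            # Snapshots (@) and bookmarks (#) are left intact by the proposal.
            if "@" in ds or "#" in ds:
                continue
            # A child must start with "<name>/", so "zdata/a" does not match "zdata/ab".
            is_child = ds.startswith(name + "/")
            if (ds == name or (recursive and is_child)) and ds not in targets:
                targets.append(ds)
    return targets


def dry_run(all_datasets, requested, recursive=False):
    """Hypothetical -n behaviour: report the targets without touching any data."""
    targets = select_targets(all_datasets, requested, recursive)
    for ds in targets:
        print(f"would truncate {ds}")
    return targets


if __name__ == "__main__":
    datasets = [
        "zdata",
        "zdata/haravikk",
        "zdata/haravikk/Movies",
        "zdata/haravikk/Music",
        "zdata/haravikk@before-truncate",
    ]
    dry_run(datasets, ["zdata/haravikk"], recursive=True)
```

Without `recursive=True`, only `zdata/haravikk` itself would be selected, which matches the example below of clearing a parent dataset while keeping its Movies and Music children.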

Due to the complexities of an in-use file system, it may be necessary to disallow this operation on a mounted file system. In that case, a -u flag should also be implemented, letting the user specify whether a mounted dataset (which would have to be unmounted first) should be mounted again once the operation is complete.

The permission for this command should be either the same as for zfs destroy, or a new permission "under" it (for use cases where a user might need to clear a dataset themselves without destroying it).

How will this feature improve OpenZFS?

It will make it easier to clear a dataset for reuse, or simply to empty a dataset without destroying it, especially where destroying and recreating the dataset is either undesirable or impossible (e.g. because it is a parent dataset).

For example, I used to use a setup like the following for a user folder:

zdata/haravikk
zdata/haravikk/Movies
zdata/haravikk/Music

Although I am still using the Movies and Music datasets, I am no longer using the parent dataset to store the user folder, and no longer mounting it at all. This means that the contents of zdata/haravikk are no longer required, but it is still needed as a parent for organisational purposes, and to act as an encryption root.

To clear it, however, I had to juggle mount points so that I could mount it somewhere temporarily (while leaving the child datasets mounted where they were) and then essentially run rm -R to delete its entire contents. Not the end of the world, but it made the operation more complicated, and I'm not convinced it cleared the dataset properly, as it still reports around 150 MB referenced, despite there being no files or folders (hidden or otherwise) in the file system.

Another use case is datasets used for volatile data such as caches: this would make them easier to clear fully without having to destroy and recreate the datasets along with their properties etc.

Haravikk added the Type: Feature (Feature request or new feature) label Nov 15, 2024
IvanVolosyuk commented

That's a very destructive command proposal, with a name reminiscent of systemd's 'systemd-tmpfiles --purge' fiasco.

Haravikk commented Nov 17, 2024

The name is 100% negotiable; truncate seemed the most logical to me, but I might be biased as someone who works with SQL databases a lot. Perhaps empty would also be suitable? I originally considered clear, but that could be confused with zpool clear, which has a very different meaning (it clears/resets errors).

As for being destructive, it's no more destructive than zfs destroy, except that it wouldn't discard the dataset itself or its properties, would leave bookmarks and snapshots intact, and would only get rid of the data records for the live dataset. I assume it would be given a permission level similar to destroy; I've added a note to that effect.

It's really just the same as running rm -rf /path/to/mounted/dataset, but ideally cleaner (no chance of leftover references) and without requiring the dataset to be mounted.
