Currently, we do not have any precise procedure or tooling around cleanup of ZIMs.
There are many topics that should be considered:
we sometimes need to remove files from production (ZIM no longer allowed to be published, reorganization of ZIM names / splitting of content around ZIM files, Zimfarm configuration error, ...):
these files should probably be moved to a temporary trash and kept there for a few days before final deletion (errors also happen when deleting content)
for a few weeks now I have been moving them to .hidden/to_delete, with one folder per month
but there too we will probably have to delete some apps which are no longer published or have been renamed
we build a lot of ZIMs into .hidden/dev and most of them have no reason to be kept in the long term
but we want to keep some of them (i.e. we cannot say "delete everything which is more than 1 month old")
we build files for some projects (bsf, endless)
but here again we are probably only interested in keeping the last (or last two) versions of each book
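The monthly trash folder mentioned above could be handled by a helper along these lines. This is only a sketch: the function name `move_to_trash` and the `YYYY-MM` folder format are assumptions, the issue only says "one folder per month".

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Optional


def move_to_trash(zim: Path, trash_root: Path, today: Optional[date] = None) -> Path:
    """Move `zim` into <trash_root>/<YYYY-MM>/ and return its new path."""
    today = today or date.today()
    # One sub-folder per month, e.g. .hidden/to_delete/2024-03 (assumed format).
    month_dir = trash_root / today.strftime("%Y-%m")
    month_dir.mkdir(parents=True, exist_ok=True)
    target = month_dir / zim.name
    if target.exists():
        # Refuse to overwrite a file of the same name trashed earlier this month.
        raise FileExistsError(target)
    shutil.move(str(zim), str(target))
    return target
```

Moving instead of deleting keeps the operation reversible: restoring a ZIM is just a move in the other direction.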
I would propose to:
make official the decision to never directly delete a ZIM from production but move it to a temporary trash
build a small tool which would:
contain rules about what has to be kept / deleted
every day:
list files in watched directories
mark files that should be deleted according to the rules
unmark files that should not be deleted anymore according to the rules (which have probably been updated)
delete files that have been marked for more than 7 days
report actions in a Slack channel
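The daily mark / unmark / delete cycle described above could look roughly like the following. It is a sketch under assumptions: `daily_pass` is a hypothetical name, the rules are assumed to have already produced the set of candidate paths, and the function only decides what to do. Actually deleting the files and posting the report to Slack is left to the caller.

```python
from datetime import datetime, timedelta

GRACE = timedelta(days=7)  # files are deleted once marked for more than 7 days


def daily_pass(candidates, marks, now, grace=GRACE):
    """Run one daily pass.

    candidates: set of paths the rules currently want deleted
    marks: dict mapping path -> datetime at which it was first marked
    Returns (new_marks, to_delete, report_lines).
    """
    new_marks, to_delete, report = {}, [], []
    for path in sorted(candidates):
        if path not in marks:
            report.append(f"marked {path}")
        marked_at = marks.get(path, now)
        if now - marked_at > grace:
            to_delete.append(path)
            report.append(f"deleted {path}")
        else:
            new_marks[path] = marked_at
    # Paths that were marked but no longer match the rules get unmarked.
    for path in sorted(set(marks) - set(candidates)):
        report.append(f"unmarked {path}")
    return new_marks, to_delete, report
```

Keeping the decision logic free of side effects makes it easy to run the tool in a dry-run mode that only reports.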
The idea of marking files comes from the fact that:
it seems preferable to process things "on-the-fly" (rather than doing it only once a month) to keep storage usage flat and avoid situations where cleanup takes a long time
in any case, the machine needs a persistent list of what is to be cleaned up (e.g. we cannot list files on the 1st of the month and clean up on the 7th without such a list, because files that only started matching the rules in between would otherwise be deleted, the machine not knowing that they were not there on the 1st)
It has some drawbacks:
we need to keep a list of marked files (but it is not very important data, we can rebuild it)
there will probably be a kind of "fatigue" with new files marked every day, and people will begin to pay less attention to it
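Since the list of marked files is described above as rebuildable, it could live in a plain JSON file. The sketch below assumes a mapping of path to ISO timestamp; the function names and the file layout are hypothetical. A missing or corrupt file is treated as "nothing marked yet", which simply restarts the waiting period.

```python
import json
from datetime import datetime
from pathlib import Path


def load_marks(path: Path) -> dict:
    """Return {file path: datetime first marked}; empty if the file is unusable."""
    try:
        raw = json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return {}
    return {name: datetime.fromisoformat(ts) for name, ts in raw.items()}


def save_marks(path: Path, marks: dict) -> None:
    """Write marks via a temporary file so a crash cannot leave a half-written list."""
    tmp = path.with_name(path.name + ".tmp")
    payload = {name: ts.isoformat() for name, ts in marks.items()}
    tmp.write_text(json.dumps(payload, indent=2, sort_keys=True))
    tmp.replace(path)
```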
Proposal of rules (in TOML, because it is a config file format meant for humans and I expect to write the tool in Python, whose ecosystem leans heavily on TOML, but in fact I don't really care), with the following meanings:
[delete_rules.xxx]: this is the configuration of the deletion rule xxx (I imagine the tool will be able to do other stuff in the future)
folder: path to process for cleanup
delete_rule: how to decide what has to be cleaned
file_older_than_days: delete files older than a given number of days
all_but_last_book: delete files which are not the latest version of their book (based on the ZIM naming convention)
last_folder_older_than_days: delete folders if they are older than a given number of days AND are the last folder in the tree (i.e. they do not contain another folder)
delete_threshold: the threshold for the deletion rule
force_delete: a list of files to force to delete
force_keep: a list of files to force to keep
I think that this tool will be used for other cleanup duties as well.
LGTM; I can't find the other discussion but found this (don't look at the rest of the ticket), which is a bit similar. I find your approach better in several ways: a commit to mark the stuff we want to keep ~forever (so we'll get a commit message) and a short delay before deletion (otherwise there's a risk of postponing it and then missing the deadline).
Besides .hidden/to_delete (one folder per month) and .hidden/dev, .hidden/custom_apps also needs cleanup: there we want to keep only the latest version of each ZIM (see "Older zim files are not deleted in /custom_apps", zimfarm#905).
Among the other cleanup duties for this tool: trash_rules, to trash production ZIMs.
WDYT?