-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clear data script #442
Clear data script #442
Conversation
2ff8199
to
27012fd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Jack, very basic doc or one main docstring explaining yaml vs cli usage and a few inline comments/questions but I've approved.
scripts/clear_old_data.py
Outdated
delete_core_dumps_after_days: int = 365 | ||
delete_logs_after_days: int = 365 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to set these to infinity (or like 5 years) or back them up somewhere until we resolve more of the frequent daq-discussion issues.
delete_logs_after_days: int = 365 | ||
verbose: bool = True | ||
|
||
@classmethod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please give an example cfg file and usage from cli w/ argparse in some docs.
) | ||
for f in os.listdir(log_dir): | ||
path = os.path.join(log_dir, f) | ||
ts = os.path.getmtime(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you using getmtime
here and getctime
in the core dumps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is not a huge difference, but I chose the mod-time here because the log-path is a directory, and I wanted the timestamp to be the time at which the last file was created or modified, instead of the creation time of the directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh gotcha, makes sense.
scripts/clear_old_data.py
Outdated
print(f"{len(files_to_delete)} files to delete:") | ||
for f in files_to_delete: | ||
days_old = (now - f.dt).days | ||
print(f' - {f.path} ({days_old} days old)') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we log these to watch in Loki?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really want to add loki support... it adds a requirement of network connectivity and making sure the hostnames are set correctly that I think would be a pain to deal with across servers. How about if we allow logging to a file in addition to stdout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup to file sounds good! Would just like to find a record somewhere of what we deleted when.
Thanks for the review @msilvafe! I made the requested changes and tested. |
This PR implements a clear-data script to scan directories and delete data from smurf server according to the data deletion proposal.
It will scan directories from which we want to delete data, and print directories to be deleted before asking the user if it should continue. For testing, it can be run in "dry" mode to see which removal commands will be run, like:
To actually remove the files, run without the
--dry
argument.This has been tested on smurf-so3 and smurf-srv20.