Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clear data script #442

Merged
merged 3 commits into from
Oct 8, 2024
Merged

Clear data script #442

merged 3 commits into from
Oct 8, 2024

Conversation

jlashner
Copy link
Collaborator

This PR implements a clear-data script to scan directories and delete data from smurf server according to the data deletion proposal.

It will scan directories from which we want to delete data, and print directories to be deleted before asking the user if it should continue. For testing, it can be run in "dry" mode to see which removal commands will be run, like:

python3 clear_old_data.py --dry

To actually remove the files, run without the --dry argument.

This has been tested on smurf-so3 and smurf-srv20.

@jlashner jlashner requested a review from msilvafe September 24, 2024 16:49
Copy link
Contributor

@msilvafe msilvafe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Jack, very basic doc or one main docstring explaining yaml vs cli usage and a few inline comments/questions but I've approved.

Comment on lines 16 to 17
delete_core_dumps_after_days: int = 365
delete_logs_after_days: int = 365
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to set these to infinity (or like 5 years) or back them up somewhere until we resolve more of the frequent daq-discussion issues.

delete_logs_after_days: int = 365
verbose: bool = True

@classmethod
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please give an example cfg file and usage from cli w/ argparse in some docs.

)
for f in os.listdir(log_dir):
path = os.path.join(log_dir, f)
ts = os.path.getmtime(path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you using getmtime here and getctime in the core dumps?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is not a huge difference, but I chose the mod-time here because the log-path is a directory, and I wanted the timestamp to be the time at which the last file was created or modified, instead of the creation time of the directory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh gotcha, makes sense.

print(f"{len(files_to_delete)} files to delete:")
for f in files_to_delete:
days_old = (now - f.dt).days
print(f' - {f.path} ({days_old} days old)')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we log these to watch in Loki?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really want to add loki support... it adds a requirement of network connectivity and making sure the hostnames are set correctly that I think would be a pain to deal with across servers. How about if we allow logging to a file in addition to stdout?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup to file sounds good! Would just like to find a record somewhere of what we deleted when.

@jlashner
Copy link
Collaborator Author

jlashner commented Oct 8, 2024

Thanks for the review @msilvafe! I made the requested changes and tested.

@jlashner jlashner merged commit 911a1d9 into master Oct 8, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants