-
Notifications
You must be signed in to change notification settings - Fork 220
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes #36729 - Implement pruning outdated TFTP files #873
base: develop
Are you sure you want to change the base?
Conversation
This does deserve tests. |
This presents on the surface as self-healing activity the smart-proxy could just do either triggered via a cron/systemd timer or through some internal process on the smart-proxy host. Is that approach a possibility? |
This is about replacing foreman_maintain's check_tftp_storage[1] with a native implementation. It is implemented by a /tftp/prune API endpoint which accepts the duration as a parameter. There is no plugin capability defined since Foreman can just call the endpoint and process HTTP 404 as Not Implemented. [1]: https://github.com/theforeman/foreman_maintain/blob/1e20bbd9f32a4f47193c4833fbd4012772b34dc4/definitions/checks/foreman_proxy/check_tftp_storage.rb
1107720
to
e19f7f4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While working on this I did realize a potential flaw in foreman_maintain's logic: one host can download the TFTP files. The token can then expire, but another host may still use the same files.
After some digging I found that we define the prefix here:
https://github.com/theforeman/foreman/blob/b9986c9648b55c0d8fae4ab5ff5845763ddbea1f/app/models/operatingsystem.rb#L189-L196
So it's shared on the unique ID of the medium provider.
We do appear to always call curl (via HttpDownload) so perhaps that touches the file. If not, implementing that would keep it "fresh".
This presents on the surface as self-healing activity the smart-proxy could just do either triggered via a cron/systemd timer or through some internal process on the smart-proxy host. Is that approach a possibility?
theforeman/foreman#9825 implements a rake task. Completely untested now, but it shows the end-to-end idea. We can have a discussion there about other implementations (API, UI action, tasks, etc).
What is |
Thanks, the setting name is rather generic sounding for such a specific use case. I am guessing cause this dates back a loooong time. |
A quick look tells me theforeman/foreman@81159d4 introduced the setting. 2012 is indeed a loooong time ago. |
👍 With tests, I'd be glad to see this merged and move forward with the Foreman part. |
While looking at this further I noticed that it does keep the mtime from the original mirror, not the time of writing. So in practice this will always remove all files. Not just the ones that expired. That means that you can also easily purge all files that are also in use. That really makes me question the foreman-maintain version. While the system can recover (move a host out of build mode and back in build mode), it can be frustrating for users. This makes me question what makes sense here. Should Foreman perhaps calculate a list of acceptable filenames (IIRC it can do so) and any other file is removed. |
def self.outdated_files(age) | ||
delete_before = Time.now - age | ||
Dir.glob(File.join(Proxy::TFTP::Plugin.settings.tftproot, "boot", "*-{vmlinuz,initrd.img}")).filter do |file| | ||
File.file?(file) && File.mtime(file) < delete_before |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could ctime
be used? Assuming the file is written once and then never changed, would ctime ever change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, but then you still have the problem where:
- Host X provisions RHEL 9
- Host X's token expires
- Host Y enters build mode for RHEL 9
- Pruning happens
- Host Y attempts to retrieve TFTP files
Note that's also a problem with the current foreman_maintain implementation. Perhaps few users hit this because they may use external Smart Proxies and never run the code on them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume that Host Y enters build mode for RHEL 9
does a check if the files already exist on the smart-proxy, and if they do no operation is performed. And in theory, we would need to "touch" the files somehow to indicate. Which I think then is where you were getting at Foreman (or maybe the smart-proxy object?) having a canonical list of needed files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When a host enters build mode (which can also be rebuilding an existing host), it immediately retrieves the files. Then the host still needs to boot. A BIOS (or UEFI) can take a significant amount of time on server class hardware (a minute is often on the low end). So if the pruning happens between those 2 events (Smart Proxy fetches TFTP files and host fetches TFTP files from Smart Proxy) you have a problem.
I think this requires a bit more thought on what the best way is, but bringing it back to the original issue: I'd recommend removing this from a health check and instead make it a separate step.
This is about replacing foreman_maintain's check_tftp_storage with a native implementation. It is implemented by a /tftp/prune API endpoint which accepts the duration as a parameter.
There is no plugin capability defined since Foreman can just call the endpoint and process HTTP 404 as Not Implemented.
A follow up in Foreman would implement a way to call this endpoint for all Smart Proxies with the TFTP feature present. Foreman does not depend on foreman-tasks natively, but it could be implemented in some cron-style way to make the system handle it automatically instead of putting the burden on the admin.