-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Validate checksum after successful import #627
base: main
Are you sure you want to change the base?
Conversation
adab6da
to
4b8a8a3
Compare
With the use of the web-download mechanism of Glance images are usually not downloaded by the image-manager but by Glance itself. I'd refrain from adding any code that would download images and requiring enough bandwidth or even disk space to hold them temporarily. @gndrmnn did you see my comment at #340 (comment) ? in short: Glance does already produce hash values, it's just that this is done with a single fixed hash algo for all images and it cannot be chosen per image. I suggest to try and extend the import call of the image API (https://docs.openstack.org/api-ref/image/v2/index.html?expanded=import-an-image-detail#import-an-image) to optionally give it a hash algo to use for the checksum. Then you get the image downloaded by Glance and the hash created by Glance and can simply compare it by fetching the image details. |
@frittentheke Hi,
As far as I understand, the
Yes, I read this. In fact, we compare the (in the future existing) SHA-512 contained in yaml files with the hash that glance computes when importing the image (i think that is how it works?). The
Yes, that would also be possible. It was discussed with @berendt that we simply recompute the hash values for all image yml files at "image update time". But let's keep the PR on hold until @berendt is back from vacation and gives some input. |
@frittentheke This was discussed with @berendt and it was decided that we implement this PR as is. The hash values in the yaml files will be pre-computed by the GitHub/Zuul Pipeline and not on the user side. The main advantage of this method is that we can implement it right now. However, your suggestion to enhance the upstream API can still be applied later as a separate feature. That would allow Glance to throw away any corrupt image before it even "reaches" the backend. I will open a separate issue for that. See #643 |
8717c5a
to
ff74ed5
Compare
I'm wondering whether the file should be downloaded AFTER the filename is determined (after the whole HEREBE DRAGONS stuff). Or am I missing something? Also, I would actually keep track of the original hash as well, so that the file need not be downloaded every time the Github action runs. Naturally, I'm volunteering to make the changes I propose, and also to rebase to the outcome of #647 :) |
If I understand you correctly, it should make no meaningful difference at which of those two points in time the file is downloaded and hashed.
Right, that - or some other mechanism - would make sense to safe some bandwidth. We will consider how to best implement that.
No, it's fine we got this. We might come back to ask some questions about #647 if need be later on though :) |
Maybe I got it wrong myself, but in my understanding, the filename is not even known in advance for certain distros. The |
Ah right, yes. That indeed needs to be taken care of. Thanks for pointing that out. |
2b561dd
to
513edbc
Compare
@mbuechse I would kindly ask you to review this Merge Request as you are most familiar with the refactored update.py :) The current iteration of this MR adds an additional field to the yaml schema which contains the sha512 of the actual image. Therefore, we can use the other existing hash to check if a new version was released. Also the filenames are resolved first now. I'd say it makes sense to review the commits individually. |
@gndrmnn I've been ill. I will certainly have a look one of the next few days. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By and large good, but with minor formatting issues. Also, I think the additional code path for hashing archives should be revisited if possible.
else: | ||
assert download_filename not in ["", ".", " ", "/", ".."] | ||
logger.info("Decompressing '%s'" % download_filename) | ||
patoolib.extract_archive(download_filename, outdir=".") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a pity that we need a dedicated code path for the archives. There are several aspects to be considered
- It is probably "more correct" to hash the extracted file instead of the archive, even though the archive shouldn't change if the content doesn't. The question is if the this "added correctness" is worth the price of the additional code path.
- We could at least extract the file on-the-fly while downloading it; this is not too hard to achieve with bz2, gzip, and xz; for zip, this might be harder (maybe bordering on impossible). The question is whether zip is really needed here.
- It should be noted that
patool
is licensed under GPLv3, which might be a problem; see Check licenses of dependencies #663. If we extract on the fly as suggested above, then patool would be out of the picture anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- It is probably "more correct" to hash the extracted file instead of the archive, even though the archive shouldn't change if the content doesn't. The question is if the this "added correctness" is worth the price of the additional code path.
We can only compare hashes that the glance backend provides us, which is afaik the extracted hash. So hashing the archive would be pointless
- We could at least extract the file on-the-fly while downloading it; this is not too hard to achieve with bz2, gzip, and xz; for zip, this might be harder (maybe bordering on impossible). The question is whether zip is really needed here.
Alright i will investigate that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad. Of course you are right!
openstack_image_manager/update.py
Outdated
logger.info(f"Image {name} change detected. Downloading Image...") | ||
|
||
headers, verify_checksum, extracted_file = download_and_hash(current_url) | ||
if verify_checksum == None or extracted_file in ["", ".", " ", "/", ".."]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PEP-8: Do comparisons with None
using is
or is not
instead of ==
or !=
, respectively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides, the second condition (extracted_file in ["", ".", " ", "/", ".."]
) is really smelly (there must be a more certain way to rule out failure), and I'm also pretty sure that it wouldn't occur in this way anyway, because download_and_hash
uses this name to open a file, and that should hopefully fail if the name were any of the things listed).
Use the 'verify_checksum' hash value in the yaml files to verify the image integrity after it has been successfully imported. Show a warning, if either the hash algorithm or the hash value does not match the expected fields. Fixes osism#340 Signed-off-by: Gondermann <[email protected]>
Signed-off-by: Gondermann <[email protected]>
Since we want to check the hash against the openstack backend, we need to have the SHA512. Sadly, most images creators do not provide us with that hash pre-computed. That means we will compute the SHA512 for every new image update in the CI worker by downloading the image. Signed-off-by: Gondermann <[email protected]>
Why on earth did the PEP8 issues not trigger the Flake8 test?? 🤔 |
Maybe your fork somehow didn't include the CI triggers? |
Nope, the flake8 test in osism/zuul-jobs did not have the Working on a fix fo osism/zuul-jobs now... |
recheck |
Use the new 'verify_checksum' hash value in the yaml files to verify the image integrity after it has been successfully imported. Show a warning, if either the hash algorithm or the hash value does not match the expected fields.
The SHA512 values used here are computed by the CI pipeline if and only if the image is updated.
Fixes #340