Validate checksum after successful import #627

gndrmnn · 2023-08-10T09:08:28Z

Use the new 'verify_checksum' hash value in the yaml files to verify the image integrity after it has been successfully imported. Show a warning, if either the hash algorithm or the hash value does not match the expected fields.

Deactivate broken image / deactivate cycle process
unit test

The SHA512 values used here are computed by the CI pipeline if and only if the image is updated.

Fixes #340

frittentheke · 2023-08-11T08:29:41Z

With the use of the web-download mechanism of Glance images are usually not downloaded by the image-manager but by Glance itself. I'd refrain from adding any code that would download images and requiring enough bandwidth or even disk space to hold them temporarily.

@gndrmnn did you see my comment at #340 (comment) ? in short: Glance does already produce hash values, it's just that this is done with a single fixed hash algo for all images and it cannot be chosen per image.

I suggest to try and extend the import call of the image API (https://docs.openstack.org/api-ref/image/v2/index.html?expanded=import-an-image-detail#import-an-image) to optionally give it a hash algo to use for the checksum. Then you get the image downloaded by Glance and the hash created by Glance and can simply compare it by fetching the image details.

gndrmnn · 2023-08-14T08:50:28Z

@frittentheke Hi,

With the use of the web-download mechanism of Glance images are usually not downloaded by the image-manager but by Glance itself. I'd refrain from adding any code that would download images and requiring enough bandwidth or even disk space to hold them temporarily.

As far as I understand, the update.py and the corresponding call will only ever be run by a workflow here on GitHub. The change in that file will result in all ymls containing a sha-512 going forward, but they will be "pre-computed" for the average user.

@gndrmnn did you see my comment at #340 (comment) ? in short: Glance does already produce hash values, it's just that this is done with a single fixed hash algo for all images and it cannot be chosen per image.

Yes, I read this. In fact, we compare the (in the future existing) SHA-512 contained in yaml files with the hash that glance computes when importing the image (i think that is how it works?). The imported_image in this call is already the representation of the Image object in openstack, which contains the hash that glance presumably calculated.

I suggest to try and extend the import call of the image API (https://docs.openstack.org/api-ref/image/v2/index.html?expanded=import-an-image-detail#import-an-image) to optionally give it a hash algo to use for the checksum. Then you get the image downloaded by Glance and the hash created by Glance and can simply compare it by fetching the image details.

Yes, that would also be possible. It was discussed with @berendt that we simply recompute the hash values for all image yml files at "image update time". But let's keep the PR on hold until @berendt is back from vacation and gives some input.

gndrmnn · 2023-08-30T09:21:06Z

@frittentheke This was discussed with @berendt and it was decided that we implement this PR as is. The hash values in the yaml files will be pre-computed by the GitHub/Zuul Pipeline and not on the user side. The main advantage of this method is that we can implement it right now.

However, your suggestion to enhance the upstream API can still be applied later as a separate feature. That would allow Glance to throw away any corrupt image before it even "reaches" the backend. I will open a separate issue for that.

See #643

mbuechse · 2023-09-07T11:26:30Z

I'm wondering whether the file should be downloaded AFTER the filename is determined (after the whole HEREBE DRAGONS stuff). Or am I missing something? Also, I would actually keep track of the original hash as well, so that the file need not be downloaded every time the Github action runs. Naturally, I'm volunteering to make the changes I propose, and also to rebase to the outcome of #647 :)

gndrmnn · 2023-09-07T12:24:40Z

I'm wondering whether the file should be downloaded AFTER the filename is determined (after the whole HEREBE DRAGONS stuff). Or am I missing something?

If I understand you correctly, it should make no meaningful difference at which of those two points in time the file is downloaded and hashed.

Also, I would actually keep track of the original hash as well, so that the file need not be downloaded every time the Github action runs

Right, that - or some other mechanism - would make sense to safe some bandwidth. We will consider how to best implement that.

Naturally, I'm volunteering to make the changes I propose, and also to rebase to the outcome of #647 :)

No, it's fine we got this. We might come back to ask some questions about #647 if need be later on though :)

mbuechse · 2023-09-07T13:09:32Z

@gndrmnn

If I understand you correctly, it should make no meaningful difference at which of those two points in time the file is downloaded and hashed.

Maybe I got it wrong myself, but in my understanding, the filename is not even known in advance for certain distros.

The latest_url for CentOS Stream 8, for instance, is given as https://cloud.centos.org/centos/8-stream/x86_64/images/CentOS-Stream-GenericCloud-8-HEREBE\d+\.\dDRAGONS.x86_64.qcow2. When I try do download that, I get a HTTP response code 404. The actual filename has to be determined using the checksums file first to arrive at https://cloud.centos.org/centos/8-stream/x86_64/images/CentOS-Stream-GenericCloud-8-20230904.0.x86_64.qcow2, which can then be downloaded.

gndrmnn · 2023-09-07T13:21:20Z

@gndrmnn

If I understand you correctly, it should make no meaningful difference at which of those two points in time the file is downloaded and hashed.

Maybe I got it wrong myself, but in my understanding, the filename is not even known in advance for certain distros.

The latest_url for CentOS Stream 8, for instance, is given as https://cloud.centos.org/centos/8-stream/x86_64/images/CentOS-Stream-GenericCloud-8-HEREBE\d+.\dDRAGONS.x86_64.qcow2. When I try do download that, I get a HTTP response code 404. The actual filename has to be determined using the checksums file first to arrive at https://cloud.centos.org/centos/8-stream/x86_64/images/CentOS-Stream-GenericCloud-8-20230904.0.x86_64.qcow2, which can then be downloaded.

Ah right, yes. That indeed needs to be taken care of. Thanks for pointing that out.

gndrmnn · 2023-09-22T13:49:17Z

@mbuechse I would kindly ask you to review this Merge Request as you are most familiar with the refactored update.py :)

The current iteration of this MR adds an additional field to the yaml schema which contains the sha512 of the actual image. Therefore, we can use the other existing hash to check if a new version was released. Also the filenames are resolved first now. I'd say it makes sense to review the commits individually.

mbuechse · 2023-09-25T18:20:09Z

@gndrmnn I've been ill. I will certainly have a look one of the next few days.

mbuechse

By and large good, but with minor formatting issues. Also, I think the additional code path for hashing archives should be revisited if possible.

openstack_image_manager/manage.py

openstack_image_manager/update.py

mbuechse · 2023-09-26T08:57:52Z

openstack_image_manager/update.py

+    else:
+        assert download_filename not in ["", ".", " ", "/", ".."]
+        logger.info("Decompressing '%s'" % download_filename)
+        patoolib.extract_archive(download_filename, outdir=".")


It's a pity that we need a dedicated code path for the archives. There are several aspects to be considered

It is probably "more correct" to hash the extracted file instead of the archive, even though the archive shouldn't change if the content doesn't. The question is if the this "added correctness" is worth the price of the additional code path.

We could at least extract the file on-the-fly while downloading it; this is not too hard to achieve with bz2, gzip, and xz; for zip, this might be harder (maybe bordering on impossible). The question is whether zip is really needed here.

It should be noted that patool is licensed under GPLv3, which might be a problem; see Check licenses of dependencies #663. If we extract on the fly as suggested above, then patool would be out of the picture anyway.

It is probably "more correct" to hash the extracted file instead of the archive, even though the archive shouldn't change if the content doesn't. The question is if the this "added correctness" is worth the price of the additional code path.

We can only compare hashes that the glance backend provides us, which is afaik the extracted hash. So hashing the archive would be pointless

We could at least extract the file on-the-fly while downloading it; this is not too hard to achieve with bz2, gzip, and xz; for zip, this might be harder (maybe bordering on impossible). The question is whether zip is really needed here.

Alright i will investigate that

My bad. Of course you are right!

openstack_image_manager/update.py

mbuechse · 2023-09-26T09:01:05Z

openstack_image_manager/update.py

+    logger.info(f"Image {name} change detected. Downloading Image...")
+
+    headers, verify_checksum, extracted_file = download_and_hash(current_url)
+    if verify_checksum == None or extracted_file in ["", ".", " ", "/", ".."]:


PEP-8: Do comparisons with None using is or is not instead of == or !=, respectively.

Besides, the second condition (extracted_file in ["", ".", " ", "/", ".."]) is really smelly (there must be a more certain way to rule out failure), and I'm also pretty sure that it wouldn't occur in this way anyway, because download_and_hash uses this name to open a file, and that should hopefully fail if the name were any of the things listed).

Use the 'verify_checksum' hash value in the yaml files to verify the image integrity after it has been successfully imported. Show a warning, if either the hash algorithm or the hash value does not match the expected fields. Fixes osism#340 Signed-off-by: Gondermann <[email protected]>

Signed-off-by: Gondermann <[email protected]>

Since we want to check the hash against the openstack backend, we need to have the SHA512. Sadly, most images creators do not provide us with that hash pre-computed. That means we will compute the SHA512 for every new image update in the CI worker by downloading the image. Signed-off-by: Gondermann <[email protected]>

gndrmnn · 2023-09-26T09:32:57Z

Why on earth did the PEP8 issues not trigger the Flake8 test?? 🤔

mbuechse · 2023-09-26T09:47:55Z

Why on earth did the PEP8 issues not trigger the Flake8 test?? 🤔

Maybe your fork somehow didn't include the CI triggers?

gndrmnn · 2023-09-26T09:50:13Z

Why on earth did the PEP8 issues not trigger the Flake8 test?? 🤔

Maybe your fork somehow didn't include the CI triggers?

Nope, the flake8 test in osism/zuul-jobs did not have the recurse flag enabled for the "Find .py files" job. Which means that in all osism repos only py files located in the repo root are being tested with flake8 :D

Working on a fix fo osism/zuul-jobs now...

fkr · 2023-09-26T10:25:26Z

Thanks @mbuechse and thanks @gndrmnn for following up to find why this was not caught by CI. With that result this is a very useful finding.

gndrmnn · 2023-10-05T11:53:53Z

recheck

gndrmnn force-pushed the fix_340 branch 3 times, most recently from adab6da to 4b8a8a3 Compare August 10, 2023 10:00

gndrmnn marked this pull request as draft August 10, 2023 13:01

gndrmnn force-pushed the fix_340 branch 4 times, most recently from 8717c5a to ff74ed5 Compare August 31, 2023 12:08

gndrmnn mentioned this pull request Sep 6, 2023

Automatically resolve latest/current URLs for Debian/Ubuntu #647

Merged

gndrmnn force-pushed the fix_340 branch 8 times, most recently from 2b561dd to 513edbc Compare September 21, 2023 10:44

gndrmnn changed the title ~~Validate provided checksum after successful import~~ Validate checksum after successful import Sep 21, 2023

gndrmnn force-pushed the fix_340 branch from 513edbc to 2d96948 Compare September 22, 2023 08:47

gndrmnn requested a review from mbuechse September 22, 2023 13:47

mbuechse reviewed Sep 26, 2023

View reviewed changes

gndrmnn added 3 commits September 26, 2023 11:31

Add verification sha512 sums to images

5bc0135

Signed-off-by: Gondermann <[email protected]>

gndrmnn force-pushed the fix_340 branch from 2d96948 to 6a1dd54 Compare September 26, 2023 09:31

gndrmnn mentioned this pull request Sep 26, 2023

Flake8 and Yamllint zuul jobs do not actually test all files osism/issues#681

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Validate checksum after successful import #627

Validate checksum after successful import #627

gndrmnn commented Aug 10, 2023 •

edited

Loading

frittentheke commented Aug 11, 2023 •

edited

Loading

gndrmnn commented Aug 14, 2023

gndrmnn commented Aug 30, 2023 •

edited

Loading

mbuechse commented Sep 7, 2023

gndrmnn commented Sep 7, 2023

mbuechse commented Sep 7, 2023

gndrmnn commented Sep 7, 2023

gndrmnn commented Sep 22, 2023

mbuechse commented Sep 25, 2023

mbuechse left a comment

mbuechse Sep 26, 2023

gndrmnn Sep 26, 2023

mbuechse Sep 26, 2023

mbuechse Sep 26, 2023

mbuechse Sep 26, 2023

gndrmnn commented Sep 26, 2023

mbuechse commented Sep 26, 2023

gndrmnn commented Sep 26, 2023

fkr commented Sep 26, 2023

gndrmnn commented Oct 5, 2023

Validate checksum after successful import #627

Are you sure you want to change the base?

Validate checksum after successful import #627

Conversation

gndrmnn commented Aug 10, 2023 • edited Loading

frittentheke commented Aug 11, 2023 • edited Loading

gndrmnn commented Aug 14, 2023

gndrmnn commented Aug 30, 2023 • edited Loading

mbuechse commented Sep 7, 2023

gndrmnn commented Sep 7, 2023

mbuechse commented Sep 7, 2023

gndrmnn commented Sep 7, 2023

gndrmnn commented Sep 22, 2023

mbuechse commented Sep 25, 2023

mbuechse left a comment

Choose a reason for hiding this comment

mbuechse Sep 26, 2023

Choose a reason for hiding this comment

gndrmnn Sep 26, 2023

Choose a reason for hiding this comment

mbuechse Sep 26, 2023

Choose a reason for hiding this comment

mbuechse Sep 26, 2023

Choose a reason for hiding this comment

mbuechse Sep 26, 2023

Choose a reason for hiding this comment

gndrmnn commented Sep 26, 2023

mbuechse commented Sep 26, 2023

gndrmnn commented Sep 26, 2023

fkr commented Sep 26, 2023

gndrmnn commented Oct 5, 2023

gndrmnn commented Aug 10, 2023 •

edited

Loading

frittentheke commented Aug 11, 2023 •

edited

Loading

gndrmnn commented Aug 30, 2023 •

edited

Loading