Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve error when checksum dict has no entry for a file #4150

Merged
merged 7 commits into from
Sep 9, 2023
Merged
9 changes: 3 additions & 6 deletions easybuild/tools/filetools.py
Original file line number Diff line number Diff line change
Expand Up @@ -1265,14 +1265,11 @@ def verify_checksum(path, checksums):

for checksum in checksums:
if isinstance(checksum, dict):
if filename in checksum:
try:
# Set this to a string-type checksum
checksum = checksum[filename]
elif build_option('enforce_checksums'):
raise EasyBuildError("Missing checksum for %s", filename)
else:
# Set to None and allow to fail elsewhere
checksum = None
except KeyError:
raise EasyBuildError("Missing checksum for %s in %s", filename, checksum)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this:

  • an error is raised only if a checksum is missing and --enforce-checksums is enabled;
  • if a checksum is missing and --enforce-checksums is not enabled, the skipping of the checksum is simply skipped here, by using continue:
try:
    checksum = checksum[filename]
except KeyError:
    if build_option('enforce_checksums'):
        raise EasyBuildError("Missing checksum for %s in dictionary: %s", filename, checksum)
    else:
        # no checksum available, skipping verification since checksums are not enforced
        _log.info("meaningful log message goes here")
        continue

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to keep the current behavior that this situation is an error and just provide a meaningful error message.

I drafted easybuilders/easybuild-docs#104 where there are some open question on how checksums should be handled which is likely a breaking change (or a handful of breaking changes)

IMO a dict with a missing filename/key is an error: Likely the dict was meant for arch-specific files&checksums but a new arch is used which wasn't tested yet. Or a spelling error happened making it always fail to verify checksums even if it should.
That's what I suggested in the draft mentioned above: A dict, if present, needs to list each possible filename even without enforce-checksums

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW: With a recent issue I noticed that enforce-checksums might be to greedy: We need a None value if the archive (e.g. Git checkout) isn't checksummable. But we want to check that all other checksums are present.

Now we don't (usually) have lists anymore but lists of dicts of filename-to-checksum. Now if enforce-checksums triggers for every explicit None but a missing filename in a dict doesn't trigger an error if it isn't set we have no way to check e.g. the (old) PyTorch ECs reliably.

Hence I think the approach I suggested makes sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying, I agree that this is at least a good step forward to avoid totally meaningless errors.

If you noticed any other problems, please make sure there's an issue open for it (or if not, open one)


if isinstance(checksum, string_type):
# if no checksum type is specified, it is assumed to be MD5 (32 characters) or SHA256 (64 characters)
Expand Down
10 changes: 10 additions & 0 deletions test/framework/filetools.py
Original file line number Diff line number Diff line change
Expand Up @@ -348,6 +348,14 @@ def test_checksums(self):
alt_checksums = ('7167b64b1ca062b9674ffef46f9325db7167b64b1ca062b9674ffef46f9325db', broken_checksums['sha256'])
self.assertFalse(ft.verify_checksum(fp, alt_checksums))

# Check dictionary
alt_checksums = (known_checksums['sha256'],)
self.assertTrue(ft.verify_checksum(fp, {os.path.basename(fp): known_checksums['sha256']}))
faulty_dict = {'wrong-name': known_checksums['sha256']}
self.assertErrorRegex(EasyBuildError,
"Missing checksum for " + os.path.basename(fp) + " in .*wrong-name.*",
ft.verify_checksum, fp, faulty_dict)

# check whether missing checksums are enforced
build_options = {
'enforce_checksums': True,
Expand All @@ -362,6 +370,8 @@ def test_checksums(self):
for checksum in [known_checksums[x] for x in ('md5', 'sha256')]:
dict_checksum = {os.path.basename(fp): checksum, 'foo': 'baa'}
self.assertTrue(ft.verify_checksum(fp, dict_checksum))
del dict_checksum[os.path.basename(fp)]
self.assertErrorRegex(EasyBuildError, "Missing checksum for", ft.verify_checksum, fp, dict_checksum)

def test_common_path_prefix(self):
"""Test get common path prefix for a list of paths."""
Expand Down