-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enhance documentation of checksums
easyconfig parameter
#104
Conversation
@smoors Thanks for the review. Besides the stylistic issues I wanted to put the general handling up for discussion, see the 2nd part of the description (starting at "So things to decide:") So those questions should be answered first, maybe in the next confcall (which I can't attend unfortunately) |
@boegel Thanks for the review, however I'd like a decision first on "things to decide:" in the OP. Also are you sure about e.g. #104 (comment) ? |
checksums
easyconfig parameter
I updated this with the review comments applied especially as currently not being able to use alternative checksums in dict entries came up, so this should be resolved timely. Decisions I basically assumed and documented which I'd like to outline:
Missing from documentation and possibly implementation: |
Add more examples of which formats are possible. Also contains not-yet implemented or broken features.
Co-authored-by: Sam Moors <[email protected]>
Sorry for being late to the party here, and i don't want to complicate/stir things up, but i was just curious about
I'm just wondering what the keys would b in this top-level dict? I would have assumes it would be the filenames, so that length check would still hold? If not, what would the keys be? Are we expecting duplicates because i can't see how wouldn't be horribly broken. If i had to pick a setup a way to set it up from scratch i would had just done checksums = {
'source1.tar.gz': '1234...abcd', # 99.9% of cases
'source_multiple_checksums.tar.gz': ('xxxxx', 'yyyyy'), # 0.05% of cases
'flaky_source.tar.gz': None, # 0.05% of cases
'oh.my.god.please.no': {'crc32', '0x1553842328'}, # or something like that
} I'm also very confused by
clearly indicating that it's a single file, but then
two different file names? I don't get it. |
The dict-thing is for stuff like I would too prefer a top-level dict. The length check ( It also might be hard to get the code working for the old-style (using a list) and the new style (dict) without making it too complicated, although I guess it is doable.
Maybe not "a single file" but "a single source/patch entry" as the dict you quoted is a single "file" just with different names. We could make that example more clear I guess by using the (resolved) template names for e.g. java |
ah yes i forgot about those. Are we doing a whole lot of assuming that the checksums equal the sources + patches throughout the code, or is it just EasyBlock.check_checksums_for method? Because that already does support the If one flattened the checksums to just deal with files, e.g. checksums = {
'openjdk-17.1.2.3-x86.sh': 'xxxxx',
'openjdk-17.1.2.3-power.sh': 'yyyyy',
'openjdk-17.1.2.3-arm.sh': 'zzzzzz',
} one could just treat this the same as the external json, and maybe we could add a little something extra to warn on unused checksum entries (where we simply whitelist the rare cases like Java in the CI) Checking the code just now, i spot: # Single source should be re-wrapped as a list, and checksums with it
if isinstance(sources, dict):
sources = [sources] so kind of already supported (as long as you only have one source entry)
Fully agree |
Seems like the rest is able to handle the dict even though documentation/dcostrings says otherwise, e.g.
Can there be any cases where 2 different sources might have the same name? Maybe with crates or git checkouts? But I guess the download_filename feature could fix that rare case.
I'm not a fan of more hard-coded exceptions...
The 2nd is unused due to a typo.
Maybe we can change But in both cases (whitelist, templates) we may need to update that every now and then to handle some contributions
That is for sources, the checksum is checked for So in summary I think it does make sense to have/allow a top-level dict for checksums to get rid of the extra
Sounds easy so far. Improving And: Would we update the existing ECs? |
That would cause problems even if we didn't check checksums. I have encountered this scenario with some extensions that were just
Yes i agree, though I don't believe we are checking for unused Java checksums either when using the dict-thing, so the hard case you mention isn't covered today either. Not to excuse it, but I suspect it's pretty narly to check given how core the architecture thing is (and we need to account for that not all architectures are supported in all commercial tools). That would be a pure CI thing, so nothing we need to block EB 5.0 over.
Yes that all sounds correct to me.
I would say we just start making --inject-checksums use this as the default. |
Currently we check
Then we should just exclude any software that uses
Should do, yes. So how shall we proceed? Merge this and the related PRs first and then implement the full dict support? I'd say both are orthogonal as the open PRs just bring documentation and implementation in line and fix corner cases and inconsistencies. Support for top-level dicts can still be added as it is mostly a duty of |
Yes it's just so many PRs here i need some help keeping track of what to merge and when. We are 100% focusing on the 5.0 beta release now. I think this documentation goes in, and easybuilders/easybuild-framework#4578 to start? If you have duplicate PRs for develop, i think you can just close those to help me out, since we will just deal with the merge with develop later, no more PRs there unless there is a very strong reason. |
I'd suggest the order:
The 2nd would need a rebase after the first is merged to avoid the duplicate commit that is required to test the changes depending on that other PR. I closed the PRs to develop with a reference to the new ones |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, but i would expect we probably want to restructure a bit if we manage to make the type(checksums) == dict a reality.
someone else needs to merge this, i don't have permission to |
Add more examples of which formats are possible.
Also contains not-yet implemented or broken features!
This is related to easybuilders/easybuild-framework#4142, easybuilders/easybuild-framework#4177, easybuilders/easybuild-framework#4150, easybuilders/easybuild-framework#4159, easybuilders/easybuild-framework#4164
We have https://github.com/easybuilders/easybuild-framework/blob/e3681ae53628400096f3e29d58e787bf1173f27b/test/framework/type_checking.py#L184-L225
and https://github.com/easybuilders/easybuild-framework/blob/e3681ae53628400096f3e29d58e787bf1173f27b/test/framework/type_checking.py#L709-L719
which tests various formats for checksums.
We have code in
get_checksum_for
which supportschecksums
being a dict instead of a list: https://github.com/easybuilders/easybuild-framework/blob/e3681ae53628400096f3e29d58e787bf1173f27b/easybuild/framework/easyblock.py#L369However this is not supported by the type-checking code and hence
to_checksums
will be called which would iterate over the dicts keys mistaking them for checksums!!! https://github.com/easybuilders/easybuild-framework/blob/e3681ae53628400096f3e29d58e787bf1173f27b/easybuild/framework/easyconfig/types.py#L475Furthermore this is made more difficult by the YEB files which use a YAML format where tuples are not supported. E.g. a test file has this:
This is a checksum entry for a single file. Obviously the inner-most 2-element lists should be converted to tuples and the outer-most list should be a list. But it isn't clear what to do with the 2nd list: Are those alternative checksums where only one needs to match or additional checksums that all need to match? Both cases should be supported.
I would argue that an additional level can be used:
checksums: [[['mainchecksum', 'altchecksum']]]
The type conversion code can deduce that after the 2nd level of lists only tuples may be specified as a list (i.e. an AND) inside a list (already an AND) doesn't make sense, so the 2nd level list is an AND consisting only of a single element, which is redundant but ok.And finally specifying
None
in a dict currently yields the same error as not specifying it, see easybuilders/easybuild-framework#4142So things to decide:
checksums
to be a dict? This would make the base check (len(checksums) = len(srcs+patches)
) impossible. So I'd keep it a list.to_checksums
withNone
values in dicts and recursion easybuild-framework#4159 as I see no reason why you'd want that. Or is there any?None
? In a tuple it doesn't make sense, so I'd disallow it there. For a dict should we differ between having a key-None entry or not having any entry especially related to--enforce-checksums
?I see 2 ways here:
As for the missing-key case:
I'd handle it the same as not specifying it: Error whenI'd treat it as an error such that typos can be detected as otherwise e.g.enforce_checksums
is active else treat as matched.'src_X86_64.tgz'
might be used instead of'src_x86_64.tgz'
looking validPort of easybuilders/easybuild#853 to the new markdown format