cksum: gets confused by base64 that happens to consist entirely of hexadecimal digits #6572

Open
BenWiederhake opened this issue Jul 15, 2024 · 2 comments · May be fixed by #6654

@BenWiederhake
Collaborator

The Base64 alphabet has, as the name suggests, 64 letters. 22 of these letters (0-9, a-f, A-F) are also hexadecimal digits. That means that a random string of 8 Base64 letters (which encodes 6 bytes = 48 bits) has a chance of (22/64)^8 ~= 2^-12.3 to be a valid hexadecimal string. So any hash with an output length of 24 bits or a multiple thereof (e.g. SHA384 or Blake2b-48), whose Base64 form needs no padding, might produce a Base64 digest that also reads as a valid hexadecimal digest (of a different length). This can cause all kinds of shenanigans with cksum, which has to detect/guess the encoding from the sums-file.
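
The counting argument above can be checked with a short Python sketch. It assumes the standard RFC 4648 Base64 alphabet and uniformly random letters; the helper name `log2_prob_all_hex` is made up for illustration and is not part of cksum or uutils.

```python
import math
import string

# Standard Base64 alphabet (RFC 4648), without the '=' padding character.
BASE64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

# Base64 letters that are also valid hexadecimal digits: 0-9, a-f, A-F.
hex_like = [c for c in BASE64_ALPHABET if c in string.hexdigits]


def log2_prob_all_hex(num_chars: int) -> float:
    """log2 of the chance that num_chars uniformly random Base64 letters are all hex digits."""
    return num_chars * math.log2(len(hex_like) / len(BASE64_ALPHABET))


print(len(hex_like))                   # 22
print(round(log2_prob_all_hex(8), 1))  # -12.3
```

In other words, roughly one in 5000 random 48-bit digests has a Base64 form that is also a valid hexadecimal string.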

In particular, here is a case where it goes wrong:

$ echo -n esq > foo.dat # The bytestring b"esq" is very special
$ cksum --algo=blake2b --length=48 --base64 foo.dat | tee foo.sums # Because the base64 *looks* like it's hexadecimal!
BLAKE2b-48 (foo.dat) = fc1f97C4
$ cksum --check foo.sums # GNU cksum takes no issue with this.
foo.dat: OK
$ cargo run -q cksum --check foo.sums # But uutils gets confused by this.
foo.dat: FAILED
cksum: WARNING: 1 computed checksum did NOT match
[$? = 1]
$ cargo run -q hashsum --b2sum --bits 48 --check foo.sums # hashsum *also* gets confused by this.
foo.dat: FAILED
hashsum: WARNING: 1 computed checksum did NOT match
[$? = 1]
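
To make the ambiguity concrete, here is a minimal Python sketch that decodes the checksum text from the session above both ways. The helper `plausible_encodings` is hypothetical: it shows one way the expected digest length could disambiguate the encoding, and is not a description of how GNU cksum or uutils actually decide. The final comparison assumes that `hashlib.blake2b` with `digest_size=6` matches what `cksum --algo=blake2b --length=48` computes.

```python
import base64
import binascii
import hashlib

digest_text = "fc1f97C4"  # checksum text from foo.sums above

as_hex = bytes.fromhex(digest_text)                    # hex reading: 4 bytes
as_b64 = base64.b64decode(digest_text, validate=True)  # Base64 reading: 6 bytes
print(len(as_hex), len(as_b64))  # 4 6


def plausible_encodings(text: str, digest_bytes: int) -> list:
    """Hypothetical helper: list the encodings whose decoded length equals digest_bytes."""
    found = []
    try:
        if len(bytes.fromhex(text)) == digest_bytes:
            found.append("hex")
    except ValueError:
        pass  # not valid hexadecimal
    try:
        if len(base64.b64decode(text, validate=True)) == digest_bytes:
            found.append("base64")
    except (binascii.Error, ValueError):
        pass  # not valid Base64
    return found


# BLAKE2b-48 means a 6-byte digest, so only the Base64 reading fits.
print(plausible_encodings(digest_text, 6))

# Assumption: this is the same hash that cksum computes for foo.dat.
actual = hashlib.blake2b(b"esq", digest_size=6).digest()
print("matches hex reading:   ", actual == as_hex)
print("matches Base64 reading:", actual == as_b64)
```

Because the tag `BLAKE2b-48` already states the digest length, a length check like this would be enough to pick the right reading in this particular case.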

There are probably more bugs like this.

Note that this is not specific to blake2b: with SHA384, whose Base64 digest is 64 letters long, it would probably require around 2^99 attempts to find a file that hashes to a digest that triggers this bug. For reference, the Bitcoin mining community computes about 2^60 hashes per second according to some sketchy website, which is good enough for this thought experiment. So it would require about 17734 years to find that file. Okay, never mind, this bug doesn't realistically affect SHA384. (But theoretically it does.)
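
The SHA384 estimate can be reproduced roughly with the sketch below. It assumes uniformly random digests and takes the 2^60 hashes per second figure from the comment at face value. The exact number of years depends on rounding and on the hash-rate estimate, so this only confirms the order of magnitude, not the precise figure quoted above.

```python
import math

HEX_LIKE = 22   # Base64 letters that are also hex digits
ALPHABET = 64   # size of the Base64 alphabet

sha384_b64_chars = 384 // 6  # 64 Base64 letters, no padding needed

# Expected number of attempts is 1/p, where p = (22/64)^64.
log2_attempts = -sha384_b64_chars * math.log2(HEX_LIKE / ALPHABET)

# Assumed hash rate of 2^60 hashes per second (figure from the comment).
seconds = 2 ** (log2_attempts - 60)
years = seconds / (365.25 * 24 * 3600)

print(f"about 2^{log2_attempts:.1f} attempts")
print(f"roughly {years:,.0f} years at 2^60 hashes per second")
```

Either way, the search takes on the order of ten thousand years even with the whole Bitcoin network's hash rate.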

Found while reading #6500 (probably unrelated though).

CC @sylvestre, because you seem to be interested in this kind of bug.

@tertsdiepraam
Member

tertsdiepraam commented Jul 16, 2024

Great find! Is this a GNU issue or just for our implementation?

@BenWiederhake
Collaborator Author

I'm not entirely sure what you mean? The GNU behavior seems self-consistent; uutils differs from the GNU behavior and isn't self-consistent. So I'd say that this is a bug in uutils.
