-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cksum: Accept non UTF-8 inputs #6603
base: main
Are you sure you want to change the base?
Conversation
8111dc4
to
9eecf25
Compare
GNU testsuite comparison:
|
b4d46d2
to
f2abbae
Compare
@BenWiederhake Could you take a look ? :) |
if !last.is_ascii_whitespace() { | ||
break; | ||
} | ||
line_trim = &line_trim[..line_trim.len() - 1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you please document what this line is doing, it isn't obvious
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a simple implementation of the trim_ascii()
function which is not available for MSRV 1.70.
The behavior is extracted in its own documented function in #6654
@@ -1277,3 +1277,27 @@ fn test_non_utf8_filename() { | |||
.stdout_is_bytes(b"SHA256 (funky\xffname) = e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855\n") | |||
.no_stderr(); | |||
} | |||
|
|||
#[cfg(target_os = "linux")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why linux only ?
could you please to add a test to inject invalid utf8 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why linux only ?
This is somewhat related to this.
TLDR: With the current MSRV being 1.70, on windows. there is no way to translate between OsStr
and &[u8]
without going through str
(this holds for owned types as well).
could you please to add a test to inject invalid utf8 ?
This is what the test should be doing. Using non UTF-8 characters that would make the old implementation fail because it would try to create String
s from these characters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is no way to translate between OsStr and &[u8]
Yes there is: os_str_as_bytes
, just like you also use in the crate itself.
This is what the test should be doing.
No, the test does not check for correct handling of files with non-UTF-8 names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to be sure I get it correctly. In the added test, I should check that the hashes
variable get written in a file with a non-UTF-8 name, is this correct ?
If yes, I might need to refactor slightly the test framework in order to be able to write to files with non-UTF-8 names.
@sylvestre all your comments are addressed in #6654. If that's still an issue, please tell me, I will refactor the code accordingly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused.
- This PR has older code, doesn't address the points sylvestre already mentioned, and you also say that checksum: rework for improving checkum checking GNU behavior match #6654 should be used instead.
- The other PR (checksum: rework for improving checkum checking GNU behavior match #6654) doesn't address sylvestre's points either (e.g. the linux-only issue), and seems to be a collection of many different PRs, which makes it unnecessarily difficult to review, and it's also in draft mode (which indicates to me that it's more of a "sneak peek", and not ready).
So which PR would you like us to review?
@@ -1277,3 +1277,27 @@ fn test_non_utf8_filename() { | |||
.stdout_is_bytes(b"SHA256 (funky\xffname) = e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855\n") | |||
.no_stderr(); | |||
} | |||
|
|||
#[cfg(target_os = "linux")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is no way to translate between OsStr and &[u8]
Yes there is: os_str_as_bytes
, just like you also use in the crate itself.
This is what the test should be doing.
No, the test does not check for correct handling of files with non-UTF-8 names.
f2abbae
to
c2f43fa
Compare
The test is linux-only because Improving the handling of windows filesystem exceptions this might be a further enhancement, but I don't think this should be the main point of this PR.
True, #6654 is a really big refactor, which solves many problems at once. I think about re-organizing all the changes in smaller PRs, while trying to keep the reasoning clear. |
c2f43fa
to
5d68b3b
Compare
8d2ea86
to
3533b6e
Compare
3533b6e
to
0415d72
Compare
As requested, I changed the code to handle non-UTF8 filenames, and I added a test for it. Previously, I used The compromise I went with is to just remove the |
GNU testsuite comparison:
|
- add a test for non UTF-8 chars in comments - add test for non-UTF-8 chars in filenames
0415d72
to
07da5d9
Compare
GNU testsuite comparison:
|
Fixes #6573