Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit scope of libc usage in note-c. #126

Merged
merged 3 commits into from
Dec 12, 2023

Conversation

haydenroche5
Copy link
Collaborator

See commit messages for more details.

n_cjson.c Outdated
// Greater than z -> not a valid letter.
// Between a and Z -> not a valid letter.
// Between a and z (inclusive) -> already lower.
if (c < 'A' || c > 'z' || (c < 'a' && c > 'Z') || (c >= 'a' && c <= 'z')) {
Copy link
Collaborator

@zfields zfields Dec 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems overly complicated.

It becomes more obvious if you swap in numbers.

Return the letter if:
x < 65 OR x > 122 OR (x < 97 AND x > 90) OR (x >= 97 AND x <= 122)

-128 >= x < 65,
x > 122,
90 < x < 97,
97 <= x <= 122

-128 <------ 65 ______ 91 --97--122--> 127

Suggested change
if (c < 'A' || c > 'z' || (c < 'a' && c > 'Z') || (c >= 'a' && c <= 'z')) {
if (c < 'A' || c > 'Z') {

Am I missing something?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah I think your change works.

"memset"
# These string functions are ok.
"strchr"
"strcmp"
Copy link
Collaborator

@zfields zfields Dec 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a good opportunity to not allow these unsafe functions, especially when the safer version is listed below (strncmp).

Obviously there may be a good reason, but if we can get away from it we probably should.

Copy link
Collaborator

@zfields zfields Dec 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I don't think this should prevent this commit from happening. If it's a chip shot then get it, but otherwise we should add it to the backlog - especially now that there is a mechanism for protecting against this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd have to re-immerse myself in the discourse on this age-old argument, but my rough understanding is that you're not that much better off using strncmp over strcmp: https://stackoverflow.com/questions/24353504/whats-wrong-with-strcmp

If your string isn't NULL-terminated, all bets are off.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, the two most compelling arguments from the post you linked are:

FOR:

Reading, writing, executing... it doesn't matter. Any memory reference to an unintended address is undefined behavior. In the most apparent scenario, you attempt to access a page that isn't mapped into your process's address space, causing a page fault, and subsequent SIGSEGV. In the worst case, you sometimes run into a \0 byte, but othertimes you run into some other buffer, causing inconstant program behavior. –
Jonathon Reinhart

AGAINST:

And if you're trying to call strcmp or strncmp with an array that you thought contained a null-terminated string but actually doesn't, then your code already has a bug. Using strncmp() might help you avoid the immediate symptom of that bug, but it won't fix it. - Keith Thompson

To my way of thinking, strncmp() is more intentional than strcmp(), because it requires the programmer to consider/know the size of the string they want to test. Yes, bugs can exist with both, but the scope of the damage can be lessened with strncmp(). As a counter-point, it's usually easier to find a smoking gun when the offense is greater, but that damage is inflicted upon the user (so that's not a great experience for them).

All in all, I would go ahead and merge the PR today, but in my opinion, I think we would be better served to reduce our libc footprint and move toward intentional coding.

n_cjson.c Show resolved Hide resolved
CMakeLists.txt Show resolved Hide resolved
n_cjson.c Outdated
// Greater than z -> not a valid letter.
// Between a and Z -> not a valid letter.
// Between a and z (inclusive) -> already lower.
if (c < 'A' || c > 'z' || (c < 'a' && c > 'Z') || (c >= 'a' && c <= 'z')) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah I think your change works.

"memset"
# These string functions are ok.
"strchr"
"strcmp"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd have to re-immerse myself in the discourse on this age-old argument, but my rough understanding is that you're not that much better off using strncmp over strcmp: https://stackoverflow.com/questions/24353504/whats-wrong-with-strcmp

If your string isn't NULL-terminated, all bets are off.

After some discussion with Ray, we settled on certain libc functions that we're
ok using in note-c. Everything else from libc should be excluded. This commit
adds a CMake option to build the note-c library without libc. It also tells the
linker to generate errors for undefined references. The result is that we can
run this build to see what libc functions we're using in note-c. A new Bash
script, check_libc_dependencies.sh, processes the output of this build. If any
non-permitted functions are found in the undefined reference errors, the script
fails. This protects us against the introduction of non-permitted libc
functions. A new job in ci.yml runs this script on every PR and push to master.
"memset"
# These string functions are ok.
"strchr"
"strcmp"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, the two most compelling arguments from the post you linked are:

FOR:

Reading, writing, executing... it doesn't matter. Any memory reference to an unintended address is undefined behavior. In the most apparent scenario, you attempt to access a page that isn't mapped into your process's address space, causing a page fault, and subsequent SIGSEGV. In the worst case, you sometimes run into a \0 byte, but othertimes you run into some other buffer, causing inconstant program behavior. –
Jonathon Reinhart

AGAINST:

And if you're trying to call strcmp or strncmp with an array that you thought contained a null-terminated string but actually doesn't, then your code already has a bug. Using strncmp() might help you avoid the immediate symptom of that bug, but it won't fix it. - Keith Thompson

To my way of thinking, strncmp() is more intentional than strcmp(), because it requires the programmer to consider/know the size of the string they want to test. Yes, bugs can exist with both, but the scope of the damage can be lessened with strncmp(). As a counter-point, it's usually easier to find a smoking gun when the offense is greater, but that damage is inflicted upon the user (so that's not a great experience for them).

All in all, I would go ahead and merge the PR today, but in my opinion, I think we would be better served to reduce our libc footprint and move toward intentional coding.

@haydenroche5 haydenroche5 merged commit 2d3f0b1 into blues:master Dec 12, 2023
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants