-
-
Notifications
You must be signed in to change notification settings - Fork 262
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove unnecessary unsafe functions #998
Conversation
WalkthroughThe recent updates to the Changes
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
One option for fixing #993 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for a fix! some of the changes, however, look like that they are semver-breaking due to return type changes; would it be possible to rewrite it in a backwards compatible way?
pest/src/span.rs
Outdated
/// `input[start..end]` must be a valid subslice; that is, said indexing should not panic. | ||
pub(crate) unsafe fn new_unchecked(input: &str, start: usize, end: usize) -> Span<'_> { | ||
/// Creates a new `Span`. | ||
pub fn new(input: &str, start: usize, end: usize) -> Span<'_> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the visibility remain as pub(crate)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new_unchecked
was pub(crate)
, new
(deleted below) is pub
.
Fundamentally, pest never does anything unsafe. All of the UTF-8 slicing uses indexing and is therefore checked. There's no need to provide the internal guarantee that all pest positions lie on UTF-8 boundaries when it provides no performance benefit.
8c350a5
to
4f11bb7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 1
Configuration used: CodeRabbit UI
Files selected for processing (8)
- pest/src/error.rs (1 hunks)
- pest/src/iterators/flat_pairs.rs (3 hunks)
- pest/src/iterators/pair.rs (2 hunks)
- pest/src/iterators/pairs.rs (3 hunks)
- pest/src/iterators/tokens.rs (3 hunks)
- pest/src/parser_state.rs (1 hunks)
- pest/src/position.rs (2 hunks)
- pest/src/span.rs (4 hunks)
Additional comments: 25
pest/src/iterators/tokens.rs (4)
- 31-31: The change to initialize
Tokens
with aqueue
field and remove safety comments aligns with the PR's objective to enhance safety by removing unnecessaryunsafe
blocks. This modification ensures that theTokens
struct is initialized in a safer manner, contributing to the overall safety and maintainability of the library.- 45-45: The error message "💥 INVALID
Tokens
CREATED 💥" provides clear feedback in case of an invalidTokens
creation. However, consider adding more context to the error message, such as the invalidinput_pos
, to aid in debugging.- 74-74: The use of
Position::new_internal
instead of an unsafe method for creatingToken::Start
instances is a positive change towards ensuring safety. This approach leverages Rust's type system and compiler checks to enforce safety constraints, aligning with the PR's goals.- 79-82: Similarly, replacing unsafe methods with
Position::new_internal
for creatingToken::End
instances enhances safety. This change effectively utilizes Rust's safety features and contributes to the maintainability and robustness of the code.pest/src/iterators/flat_pairs.rs (3)
- 32-32: The change to make the
new
function forFlatPairs
non-unsafe and the removal of safety comments are in line with the PR's objective to enhance safety by eliminating unnecessaryunsafe
blocks. This modification ensures thatFlatPairs
are created in a safer manner, contributing to the overall safety and maintainability of the library.- 114-119: The transition from using unsafe blocks to safe methods for creating
Pair
instances in theIterator
implementation is a commendable change. It aligns with the PR's goals of enhancing safety and maintainability by leveraging Rust's safety features.- 139-144: Similarly, the change in the
DoubleEndedIterator
implementation to createPair
instances without unsafe blocks enhances safety. This approach utilizes Rust's type system and compiler checks to enforce safety constraints, contributing to the robustness of the code.pest/src/iterators/pair.rs (2)
- 48-48: The adjustments to the
new
function forPair
, including the removal of safety comments and the transition to safer method calls, align with the PR's objective to enhance safety. This change ensures thatPair
instances are created in a safer manner, contributing to the overall safety and maintainability of the library.- 207-207: Replacing an unsafe call with
Span::new_internal
forSpan
creation within thePair
implementation is a positive change towards ensuring safety. This approach leverages Rust's safety features and contributes to the maintainability and robustness of the code.pest/src/span.rs (4)
- 31-31: Renaming
new_unchecked
tonew_internal
in theSpan
struct and removing safety comments related to character boundary indices are in line with the PR's objective to enhance safety by eliminating unnecessaryunsafe
blocks. This change ensures thatSpan
instances are created in a safer manner, contributing to the overall safety and maintainability of the library.- 137-137: The update to use
Position::new_internal
for creating theSpan
's startPosition
is a commendable change. It aligns with the PR's goals of enhancing safety and maintainability by leveraging Rust's safety features.- 155-155: Similarly, updating the method to use
Position::new_internal
for creating theSpan
's endPosition
enhances safety. This approach utilizes Rust's type system and compiler checks to enforce safety constraints, contributing to the robustness of the code.- 173-174: The changes in the
split
method to usePosition::new_internal
for both positions further emphasize the PR's commitment to safety. This modification ensures that positions are created safely, aligning with Rust's safety guarantees.pest/src/position.rs (2)
- 28-28: The method
new_internal
is introduced as a safe alternative to creating aPosition
instance. It usesdebug_assert!
to ensure the position is within bounds in debug mode. This is a positive change as it maintains safety guarantees while potentially catching errors during development.- 102-102: The change in the
span
method to usenew_internal
instead of an unsafe block is a significant improvement in safety. This adjustment ensures that the creation ofSpan
instances is done without bypassing Rust's safety checks, aligning with the PR's objective to remove unnecessaryunsafe
usage.pest/src/iterators/pairs.rs (3)
- 208-208: Removing
unsafe
blocks in theflatten
method is a positive change, enhancing safety by relying on safe alternatives for iterator manipulation. This change aligns with the PR's goal of eliminating unnecessaryunsafe
usage.- 350-355: The
peek
method now creates aPair
instance without usingunsafe
blocks, which is a commendable improvement. This change ensures that the method adheres to Rust's safety guarantees, contributing to the overall safety of the library.- 428-433: Similarly, the
next_back
method's update to create aPair
instance withoutunsafe
blocks is a significant safety enhancement. This modification further supports the PR's objective of reducing unnecessaryunsafe
usage across the library.pest/src/error.rs (1)
- 493-493: The change from
Position::new(input, error_position).unwrap()
toPosition::new_internal(input, error_position)
is a positive step towards enhancing safety by avoiding potential panics. This aligns with Rust's safety principles and the PR's objectives.However, it's crucial to ensure that
Position::new_internal
properly handles all cases thatPosition::new(...).unwrap()
did, especially regarding input validation and error handling. It would be beneficial to verify thatPosition::new_internal
does not introduce any regressions or unexpected behavior, particularly in edge cases.pest/src/parser_state.rs (6)
- 466-466: The change from
Position::new
toPosition::new_internal
within anErr
block is a positive step towards enhancing safety by avoiding unwrapping and directly handling potential errors internally. This aligns with the PR's objective of removing unnecessaryunsafe
blocks and improving error handling.- 463-469: > 📝 NOTE
This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [1-1]
The file header correctly includes copyright and licensing information, which is essential for open-source projects to clearly state their licensing terms.
- 466-466: While the change to
Position::new_internal
is approved, it's important to ensure that all other instances in the codebase wherePosition::new
might be used are also reviewed and updated if necessary. This is to maintain consistency and safety across the entire library.Verification successful
The findings from the executed script confirm the presence of multiple instances where
Position::new
is used across various files in the library. This supports the original review comment's suggestion to review and potentially update these instances to maintain consistency and safety across the entire library. It is recommended to carefully evaluate each usage ofPosition::new
to determine if it should be replaced withPosition::new_internal
or another method, depending on the specific requirements and context of each case.* 466-466: The use of `Position::new_internal` here is a good practice as it avoids potential panics from unwrapping and leverages internal error handling mechanisms. This change enhances the safety and robustness of the error handling process within the parser state. * 466-466: The stack manipulation methods (`stack_push`, `stack_pop`, etc.) are well-implemented, providing a clear and efficient way to manage parser state. It's important to ensure that these methods are covered by comprehensive tests, especially for edge cases such as empty stacks or invalid indices. * 466-466: The handling of lookahead and atomicity within the parser state is sophisticated and allows for flexible parsing strategies. However, it's crucial to document these features thoroughly in the code comments or external documentation to aid in understanding and maintenance, especially for new contributors or users of the library.Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for any remaining uses of Position::new that might need attention. rg "Position::new\(" --type rustLength of output: 5274
@@ -463,8 +463,7 @@ where | |||
|
|||
Err(Error::new_from_pos_with_parsing_attempts( | |||
variant, | |||
// TODO(performance): Guarantee state.attempt_pos is a valid position | |||
Position::new(input, state.attempt_pos).unwrap(), | |||
Position::new_internal(input, state.attempt_pos), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding unit tests specifically targeting the error handling behavior of Position::new_internal
to ensure its functionality aligns with expectations and to catch any potential edge cases.
Would you like me to help by drafting some unit tests for this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the change is fine, but some function comments may remain, because the other functions could panic if invalid indices were provided (I haven't checked that in detail though).
If the other functions could panic, perhaps they could either be modified to use checked access or have that directly in their comments?
/// | ||
/// # Safety: | ||
/// | ||
/// `input[pos..]` must be a valid codepoint boundary (should not panic when indexing thus). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think some of these comments may still be valid that the caller is responsible for providing that valid pos
or start..end
indices, because the access is sometimes done directly via self.input[pos]
instead of self.input.get(pos)
I feel like this PR is not getting to the point. Would you prefer:
The implications of 1:
The implications of 2:
I would also appreciate clarity on:
Right now, this PR implements option 2 with a permissive internal API (panics eagerly in debug, panics lazily in release) and a checked external API. Note: |
Thanks @djkoloski , that helps.
Yes, I think that option 2 is fine if those remain internal (from a quick look I wasn't sure if those pub methods are reachable from outside).
Maybe not at this moment, but good to consider for 3.X. Right now, we could separate them for internal API without breaking changes, but it may seem inconsistent with external API.
It doesn't, at least not consistently, but it should. Anyway, I think we can merge this PR and open an issue for documenting panics. |
Fundamentally, pest never does anything unsafe. All of the UTF-8 slicing uses indexing and is therefore checked. There's no need to provide the internal guarantee that all pest positions lie on UTF-8 boundaries when it provides no performance benefit.
Summary by CodeRabbit
unsafe
blocks and comments across various components.Position
andSpan
struct creations for increased code safety and readability.