Remove unnecessary unsafe functions #998

Merged
merged 1 commit into from
Mar 24, 2024

djkoloski
Contributor

@djkoloski djkoloski commented Mar 21, 2024

Fundamentally, pest never does anything unsafe. All of the UTF-8 slicing uses indexing and is therefore checked. There's no need to provide the internal guarantee that all pest positions lie on UTF-8 boundaries when it provides no performance benefit.
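
As a minimal illustration of the claim above, using only the standard library and not pest code: slicing a `&str` at an index that is out of bounds or not on a UTF-8 character boundary panics rather than causing undefined behavior, and `str::get` reports the same condition as `None`. The helper name `slice_from` is hypothetical.

```rust
use std::panic;

/// Hypothetical helper: checked slicing that reports failure as `None`.
fn slice_from(input: &str, pos: usize) -> Option<&str> {
    input.get(pos..)
}

fn main() {
    let input = "aé"; // 'a' is 1 byte, 'é' occupies bytes 1..3

    assert!(input.is_char_boundary(1));
    assert!(!input.is_char_boundary(2));

    assert_eq!(slice_from(input, 1), Some("é"));
    assert_eq!(slice_from(input, 2), None); // inside 'é'
    assert_eq!(slice_from(input, 4), None); // past the end

    // Indexing performs the same check but panics on failure.
    let result = panic::catch_unwind(|| input[2..].len());
    assert!(result.is_err());
}
```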

Summary by CodeRabbit

  • Refactor
    • Improved error handling mechanisms for better stability.
    • Enhanced safety by removing unnecessary unsafe blocks and comments across various components.
    • Streamlined Position and Span struct creations for increased code safety and readability.

@djkoloski djkoloski requested a review from a team as a code owner March 21, 2024 16:46
@djkoloski djkoloski requested review from NoahTheDuke and removed request for a team March 21, 2024 16:46
Contributor

coderabbitai bot commented Mar 21, 2024

Walkthrough

The recent updates to the pest library involve significant improvements in error handling and safety. The changes include eliminating unsafe code blocks and refining the creation of Position and Span objects for better reliability. These modifications enhance the overall safety and maintainability of the codebase, making it more robust and error-resistant.

Changes

Files Change Summary
error.rs, parser_state.rs Replaced direct Position::new with new_internal for improved error handling.
iterators/flat_pairs.rs Removed unsafe and safety comments in FlatPairs.
iterators/pair.rs Updated safety comments and calls in Pair for safer Span creation.
iterators/pairs.rs Eliminated unsafe blocks in flatten, peek, and next_back.
iterators/tokens.rs Updated struct initialization and error handling in Tokens.
position.rs, span.rs Refactored Position and Span creation to use new_internal, removing unsafe usage.

🐇✨
In the realm of code where bugs dare to tread,
A rabbit hopped in, making errors dread.
With a flick and a hop, unsafe tags were shed,
Positions and spans, now safely led.
"To safer pastures!" the rabbit said.
🌟🌿

@djkoloski
Contributor Author

One option for fixing #993

@tomtau tomtau added the pr label Mar 22, 2024
Contributor

@tomtau tomtau left a comment

Thanks for the fix! Some of the changes, however, look semver-breaking due to return type changes; would it be possible to rewrite them in a backwards-compatible way?

pest/src/position.rs (outdated, resolved)
pest/src/span.rs (outdated)
- /// `input[start..end]` must be a valid subslice; that is, said indexing should not panic.
- pub(crate) unsafe fn new_unchecked(input: &str, start: usize, end: usize) -> Span<'_> {
+ /// Creates a new `Span`.
+ pub fn new(input: &str, start: usize, end: usize) -> Span<'_> {
Contributor

Can the visibility remain as pub(crate)?

Member

new_unchecked was pub(crate), new (deleted below) is pub.

@tomtau
Contributor

tomtau commented Mar 22, 2024

Fundamentally, pest never does anything unsafe. All of the UTF-8 slicing
uses indexing and is therefore checked. There's no need to provide the
internal guarantee that all pest positions lie on UTF-8 boundaries when
it provides no performance benefit.
Contributor

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 1

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 9f9094e and 4f11bb7.
Files selected for processing (8)
  • pest/src/error.rs (1 hunks)
  • pest/src/iterators/flat_pairs.rs (3 hunks)
  • pest/src/iterators/pair.rs (2 hunks)
  • pest/src/iterators/pairs.rs (3 hunks)
  • pest/src/iterators/tokens.rs (3 hunks)
  • pest/src/parser_state.rs (1 hunks)
  • pest/src/position.rs (2 hunks)
  • pest/src/span.rs (4 hunks)
Additional comments: 25
pest/src/iterators/tokens.rs (4)
  • 31-31: The change to initialize Tokens with a queue field and remove safety comments aligns with the PR's objective to enhance safety by removing unnecessary unsafe blocks. This modification ensures that the Tokens struct is initialized in a safer manner, contributing to the overall safety and maintainability of the library.
  • 45-45: The error message "💥 INVALID Tokens CREATED 💥" provides clear feedback in case of an invalid Tokens creation. However, consider adding more context to the error message, such as the invalid input_pos, to aid in debugging.
  • 74-74: The use of Position::new_internal instead of an unsafe method for creating Token::Start instances is a positive change towards ensuring safety. This approach leverages Rust's type system and compiler checks to enforce safety constraints, aligning with the PR's goals.
  • 79-82: Similarly, replacing unsafe methods with Position::new_internal for creating Token::End instances enhances safety. This change effectively utilizes Rust's safety features and contributes to the maintainability and robustness of the code.
pest/src/iterators/flat_pairs.rs (3)
  • 32-32: The change to make the new function for FlatPairs non-unsafe and the removal of safety comments are in line with the PR's objective to enhance safety by eliminating unnecessary unsafe blocks. This modification ensures that FlatPairs are created in a safer manner, contributing to the overall safety and maintainability of the library.
  • 114-119: The transition from using unsafe blocks to safe methods for creating Pair instances in the Iterator implementation is a commendable change. It aligns with the PR's goals of enhancing safety and maintainability by leveraging Rust's safety features.
  • 139-144: Similarly, the change in the DoubleEndedIterator implementation to create Pair instances without unsafe blocks enhances safety. This approach utilizes Rust's type system and compiler checks to enforce safety constraints, contributing to the robustness of the code.
pest/src/iterators/pair.rs (2)
  • 48-48: The adjustments to the new function for Pair, including the removal of safety comments and the transition to safer method calls, align with the PR's objective to enhance safety. This change ensures that Pair instances are created in a safer manner, contributing to the overall safety and maintainability of the library.
  • 207-207: Replacing an unsafe call with Span::new_internal for Span creation within the Pair implementation is a positive change towards ensuring safety. This approach leverages Rust's safety features and contributes to the maintainability and robustness of the code.
pest/src/span.rs (4)
  • 31-31: Renaming new_unchecked to new_internal in the Span struct and removing safety comments related to character boundary indices are in line with the PR's objective to enhance safety by eliminating unnecessary unsafe blocks. This change ensures that Span instances are created in a safer manner, contributing to the overall safety and maintainability of the library.
  • 137-137: The update to use Position::new_internal for creating the Span's start Position is a commendable change. It aligns with the PR's goals of enhancing safety and maintainability by leveraging Rust's safety features.
  • 155-155: Similarly, updating the method to use Position::new_internal for creating the Span's end Position enhances safety. This approach utilizes Rust's type system and compiler checks to enforce safety constraints, contributing to the robustness of the code.
  • 173-174: The changes in the split method to use Position::new_internal for both positions further emphasize the PR's commitment to safety. This modification ensures that positions are created safely, aligning with Rust's safety guarantees.
pest/src/position.rs (2)
  • 28-28: The method new_internal is introduced as a safe alternative to creating a Position instance. It uses debug_assert! to ensure the position is within bounds in debug mode. This is a positive change as it maintains safety guarantees while potentially catching errors during development.
  • 102-102: The change in the span method to use new_internal instead of an unsafe block is a significant improvement in safety. This adjustment ensures that the creation of Span instances is done without bypassing Rust's safety checks, aligning with the PR's objective to remove unnecessary unsafe usage.
pest/src/iterators/pairs.rs (3)
  • 208-208: Removing unsafe blocks in the flatten method is a positive change, enhancing safety by relying on safe alternatives for iterator manipulation. This change aligns with the PR's goal of eliminating unnecessary unsafe usage.
  • 350-355: The peek method now creates a Pair instance without using unsafe blocks, which is a commendable improvement. This change ensures that the method adheres to Rust's safety guarantees, contributing to the overall safety of the library.
  • 428-433: Similarly, the next_back method's update to create a Pair instance without unsafe blocks is a significant safety enhancement. This modification further supports the PR's objective of reducing unnecessary unsafe usage across the library.
pest/src/error.rs (1)
  • 493-493: The change from Position::new(input, error_position).unwrap() to Position::new_internal(input, error_position) is a positive step towards enhancing safety by avoiding potential panics. This aligns with Rust's safety principles and the PR's objectives.

However, it's crucial to ensure that Position::new_internal properly handles all cases that Position::new(...).unwrap() did, especially regarding input validation and error handling. It would be beneficial to verify that Position::new_internal does not introduce any regressions or unexpected behavior, particularly in edge cases.

pest/src/parser_state.rs (6)
  • 466-466: The change from Position::new to Position::new_internal within an Err block is a positive step towards enhancing safety by avoiding unwrapping and directly handling potential errors internally. This aligns with the PR's objective of removing unnecessary unsafe blocks and improving error handling.
  • 463-469: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [1-1]

The file header correctly includes copyright and licensing information, which is essential for open-source projects to clearly state their licensing terms.

  • 466-466: While the change to Position::new_internal is approved, it's important to ensure that all other instances in the codebase where Position::new might be used are also reviewed and updated if necessary. This is to maintain consistency and safety across the entire library.
Verification successful

The findings from the executed script confirm the presence of multiple instances where Position::new is used across various files in the library. This supports the original review comment's suggestion to review and potentially update these instances to maintain consistency and safety across the entire library. It is recommended to carefully evaluate each usage of Position::new to determine if it should be replaced with Position::new_internal or another method, depending on the specific requirements and context of each case.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Search for any remaining uses of Position::new that might need attention.
rg "Position::new\(" --type rust

Length of output: 5274

  • 466-466: The use of Position::new_internal here is a good practice as it avoids potential panics from unwrapping and leverages internal error handling mechanisms. This change enhances the safety and robustness of the error handling process within the parser state.
  • 466-466: The stack manipulation methods (stack_push, stack_pop, etc.) are well-implemented, providing a clear and efficient way to manage parser state. It's important to ensure that these methods are covered by comprehensive tests, especially for edge cases such as empty stacks or invalid indices.
  • 466-466: The handling of lookahead and atomicity within the parser state is sophisticated and allows for flexible parsing strategies. However, it's crucial to document these features thoroughly in the code comments or external documentation to aid in understanding and maintenance, especially for new contributors or users of the library.

@@ -463,8 +463,7 @@ where

  Err(Error::new_from_pos_with_parsing_attempts(
      variant,
-     // TODO(performance): Guarantee state.attempt_pos is a valid position
-     Position::new(input, state.attempt_pos).unwrap(),
+     Position::new_internal(input, state.attempt_pos),
Contributor

Consider adding unit tests specifically targeting the error handling behavior of Position::new_internal to ensure its functionality aligns with expectations and to catch any potential edge cases.

Would you like me to help by drafting some unit tests for this change?

Contributor

@tomtau tomtau left a comment

The change is fine, but some of the function comments may need to remain, because the other functions could panic if invalid indices were provided (I haven't checked that in detail, though).

If the other functions could panic, perhaps they could either be modified to use checked access or state that directly in their comments?

///
/// # Safety:
///
/// `input[pos..]` must be a valid codepoint boundary (should not panic when indexing thus).
Contributor

I think some of these comments may still be valid, in that the caller is responsible for providing a valid pos or valid start..end indices, because the access is sometimes done directly via self.input[pos] instead of self.input.get(pos).

@djkoloski
Contributor Author

I feel like this PR is not getting to the point. Would you prefer:

  1. Pest keeps the type invariant that Position always lands on a UTF-8 codepoint boundary, or
  2. Pest stops caring about whether Position lands on a codepoint boundary because all slicing and indexing operations are checked.

The implications of 1:

  • All Positions must refer to a valid UTF-8 codepoint boundary. Similar invariants propagate into Span, Pair, FlatPairs, etc.
  • Instead of removing unsafe from the new_unchecked functions, all uses of the unsafe functions are verified.
  • Indexing and slicing operations using Position switch to unchecked versions, skipping bounds and codepoint boundary checking.

The implications of 2:

  • All of the unsafe functions are turned safe.
  • No more safety docs required. They should be removed because safety docs are for unsafe code.
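
The two options can be contrasted in a small sketch. The types `InvariantPos` and `PlainPos` are hypothetical and are not pest's actual `Position`; they only illustrate where the `unsafe` burden sits in each design.

```rust
/// Option 1: the type guarantees `pos` is a char boundary within `input`.
pub struct InvariantPos<'i> {
    input: &'i str,
    pos: usize,
}

impl<'i> InvariantPos<'i> {
    /// # Safety
    ///
    /// `pos` must be a char boundary of `input`, so that `input[pos..]`
    /// would not panic.
    pub unsafe fn new_unchecked(input: &'i str, pos: usize) -> Self {
        InvariantPos { input, pos }
    }

    /// Relies on the invariant to skip the bounds and boundary checks.
    pub fn rest(&self) -> &'i str {
        // SAFETY: upheld by every caller of `new_unchecked`.
        unsafe { self.input.get_unchecked(self.pos..) }
    }
}

/// Option 2: no invariant; every access is checked.
pub struct PlainPos<'i> {
    input: &'i str,
    pos: usize,
}

impl<'i> PlainPos<'i> {
    pub fn new(input: &'i str, pos: usize) -> Self {
        PlainPos { input, pos }
    }

    /// Panics if `pos` is out of bounds or not on a char boundary.
    pub fn rest(&self) -> &'i str {
        &self.input[self.pos..]
    }
}

fn main() {
    let input = "héllo";
    // SAFETY: index 3 is the boundary right after the two-byte 'é'.
    let a = unsafe { InvariantPos::new_unchecked(input, 3) };
    let b = PlainPos::new(input, 3);
    assert_eq!(a.rest(), "llo");
    assert_eq!(b.rest(), "llo");
}
```

In the first design a wrong argument is undefined behavior, so every call site has to be audited; in the second it is at worst a panic, so no safety documentation is needed.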

I would also appreciate clarity on:

  • Whether Pest wants separate strict/checked APIs. Compare: checked_pow vs strict_pow. Strict APIs panic on invalid input, checked APIs return None on invalid input.
  • Whether Pest documents panics in a # Panics section following the standard library pattern. Note that unlike safety docs, panic docs are not required for soundness.
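
For reference, the checked flavor from the standard library looks like the sketch below. `u32::checked_pow` is a stable standard-library method; `strict_pow_sketch` is a hypothetical stand-in for the strict flavor, written as a wrapper so the example does not depend on which toolchain provides `strict_pow`.

```rust
/// Hypothetical strict variant: panics on overflow in every build profile.
fn strict_pow_sketch(base: u32, exp: u32) -> u32 {
    base.checked_pow(exp)
        .expect("attempt to exponentiate with overflow")
}

fn main() {
    // Checked: invalid input is reported as `None`.
    assert_eq!(2u32.checked_pow(10), Some(1024));
    assert_eq!(2u32.checked_pow(32), None);

    // Strict: valid input returns the value directly.
    assert_eq!(strict_pow_sketch(2, 10), 1024);

    // Strict: invalid input panics instead of returning `None`.
    let overflowed = std::panic::catch_unwind(|| strict_pow_sketch(2, 32));
    assert!(overflowed.is_err());
}
```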

Right now, this PR implements option 2 with a permissive internal API (panics eagerly in debug, panics lazily in release) and a checked external API. Note: flat_pairs::new, pair::new, pairs::new, and tokens::new are all internal APIs (checked by enabling the unreachable_pub lint).
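
A minimal sketch of that split, assuming a simplified `Position`. The field layout, the `rest` method, and the method bodies here are illustrative assumptions, not copied from pest.

```rust
#[derive(Debug, PartialEq)]
pub struct Position<'i> {
    input: &'i str,
    pos: usize,
}

impl<'i> Position<'i> {
    /// Checked external API: returns `None` for an invalid `pos`.
    pub fn new(input: &'i str, pos: usize) -> Option<Position<'i>> {
        input.get(pos..).map(|_| Position { input, pos })
    }

    /// Permissive internal API: validated eagerly only in debug builds.
    /// In release builds an invalid `pos` surfaces later, when a checked
    /// slicing operation such as `rest` panics.
    pub(crate) fn new_internal(input: &'i str, pos: usize) -> Position<'i> {
        debug_assert!(input.get(pos..).is_some());
        Position { input, pos }
    }

    /// Checked slicing; this is where a bad `pos` would panic in release.
    pub fn rest(&self) -> &'i str {
        &self.input[self.pos..]
    }
}

fn main() {
    let input = "aé";
    assert!(Position::new(input, 1).is_some());
    assert!(Position::new(input, 2).is_none()); // inside the two-byte 'é'
    assert_eq!(Position::new_internal(input, 1).rest(), "é");
}
```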

@tomtau
Contributor

tomtau commented Mar 24, 2024

Thanks @djkoloski, that helps.

> Right now, this PR implements option 2 with a permissive internal API (panics eagerly in debug, panics lazily in release) and a checked external API.

Yes, I think that option 2 is fine if those remain internal (from a quick look I wasn't sure if those pub methods are reachable from outside).

> Whether Pest wants separate strict/checked APIs. Compare: checked_pow vs strict_pow. Strict APIs panic on invalid input, checked APIs return None on invalid input.

Maybe not at this moment, but good to consider for 3.X. Right now, we could separate them for internal API without breaking changes, but it may seem inconsistent with external API.

> Whether Pest documents panics in a # Panics section following the standard library pattern. Note that unlike safety docs, panic docs are not required for soundness.

It doesn't, at least not consistently, but it should.

Anyway, I think we can merge this PR and open an issue for documenting panics.
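
For the follow-up issue, a `# Panics` section in the standard-library style might look like the following sketch. `rest_of` is a hypothetical free function used only to show the doc layout; it is not part of pest.

```rust
/// Returns the part of `input` starting at byte offset `pos`.
///
/// # Panics
///
/// Panics if `pos` is greater than `input.len()` or does not lie on a
/// UTF-8 character boundary.
pub fn rest_of(input: &str, pos: usize) -> &str {
    &input[pos..]
}

fn main() {
    assert_eq!(rest_of("pest", 2), "st");
    assert_eq!(rest_of("pest", 4), "");
}
```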

@tomtau tomtau merged commit 9d25248 into pest-parser:master Mar 24, 2024
9 checks passed
@tomtau tomtau mentioned this pull request Mar 24, 2024
Development

Successfully merging this pull request may close these issues.

Pairs can be made with mismatched input str and Vec<QueueableToken> using pest::state
3 participants