Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add [GreenTreeBuilder::revert] to support backtracking parsers #67

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
50 changes: 50 additions & 0 deletions cstree/src/green/builder.rs
Original file line number Diff line number Diff line change
Expand Up @@ -472,6 +472,56 @@ where
Checkpoint(self.children.len())
}

/// Delete all tokens parsed since the [`Checkpoint`] was created.
///
/// This is useful for backtracking parsers.
///
/// NOTE: this does *not* delete any unfinished nodes; you are responsible for only
/// pairing checkpoint/start_node_at. Using `start_node` combined with `revert` has unspecified behavior.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't quite get what you mean to say here about "pairing checkpoint/start_node_at". What exactly is this about?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

say you have the following call pattern:

let before = builder.checkpoint();
builder.start_node();
builder.token(EXAMPLE);
builder.revert(before);

one might expect, in an ideal world, that this is a noop. but in fact it leaves you with an unpaired start_node() that has no tokens.

i don't remember why i mentioned start_node_at specifically, i can remove that line.

Copy link
Owner

@domenicquirl domenicquirl Oct 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, yeah I get that originally this would only revert the token (I've changed in on the other branch, because the parent-tracking version does now indeed delete the node, which I think is better / more correct and should be covered by the unfinished_node test). I just don't understand what should be "paired" here, perhaps this read like "you should only follow-up a call to checkpoint with a call to start_node_at if you want to use it, without start_node in between"? Either way I think we don't need it anymore if the new behaviour is to delete both node and token :)

///
/// NOTE: checkpoints can only be nested "forwards" not backwards. Attempting to go backwards then forwards is
/// unspecified (it will usually panic).
///
/// Example:
/// ```rust
/// # use cstree::testing::*;
/// # use cstree::build::GreenNodeBuilder;
/// # struct Parser;
/// # impl Parser {
/// # fn peek(&self) -> Option<TestSyntaxKind> { None }
/// # fn parse_expr(&mut self) {}
/// # }
/// # let mut builder: GreenNodeBuilder<MySyntax> = GreenNodeBuilder::new();
/// # let mut parser = Parser;
/// let checkpoint = builder.checkpoint();
/// parser.parse_expr();
/// if let Some(Plus) = parser.peek() {
/// // 1 + 2 = Add(1, 2)
/// builder.start_node_at(checkpoint, Operation);
/// parser.parse_expr();
/// builder.finish_node();
/// } else {
/// builder.revert(checkpoint);
/// }
/// ```
pub fn revert(&mut self, checkpoint: Checkpoint) {
let Checkpoint(checkpoint) = checkpoint;
// This doesn't catch scenarios where we've read more tokens since the previous revert,
// but it's close enough.
assert!(
checkpoint <= self.children.len(),
"cannot rollback to a checkpoint in the future"
);
if let Some(&(_, first_child)) = self.parents.last() {
assert!(
checkpoint >= first_child,
"checkpoint no longer valid, was an unmatched start_node_at called?"
);
}

self.children.truncate(checkpoint);
}

/// Wrap the previous branch marked by [`checkpoint`](GreenNodeBuilder::checkpoint) in a new
/// branch and make it current.
#[inline]
Expand Down
35 changes: 34 additions & 1 deletion cstree/tests/it/main.rs
Original file line number Diff line number Diff line change
@@ -1,13 +1,15 @@
mod basic;
mod regressions;
mod rollback;
mod sendsync;
#[cfg(feature = "serialize")]
mod serde;

use cstree::{
build::{GreenNodeBuilder, NodeCache},
green::GreenNode,
interning::Interner,
interning::{Interner, Resolver},
util::NodeOrToken,
RawSyntaxKind, Syntax,
};

Expand Down Expand Up @@ -78,3 +80,34 @@ where
}
from
}

#[track_caller]
pub fn assert_tree_eq(
(left, left_res): (&SyntaxNode, &impl Resolver),
(right, right_res): (&SyntaxNode, &impl Resolver),
) {
if left.green() == right.green() {
return;
}

if left.kind() != right.kind() || left.children_with_tokens().len() != right.children_with_tokens().len() {
panic!("{} !=\n{}", left.debug(left_res, true), right.debug(right_res, true))
}

for elem in left.children_with_tokens().zip(right.children_with_tokens()) {
match elem {
(NodeOrToken::Node(ln), NodeOrToken::Node(rn)) => assert_tree_eq((ln, left_res), (rn, right_res)),
(NodeOrToken::Node(n), NodeOrToken::Token(t)) => {
panic!("{} != {}", n.debug(left_res, true), t.debug(right_res))
}
(NodeOrToken::Token(t), NodeOrToken::Node(n)) => {
panic!("{} != {}", t.debug(left_res), n.debug(right_res, true))
}
(NodeOrToken::Token(lt), NodeOrToken::Token(rt)) => {
if lt.syntax_kind() != rt.syntax_kind() || lt.resolve_text(left_res) != rt.resolve_text(right_res) {
panic!("{} != {}", lt.debug(left_res), rt.debug(right_res))
}
}
}
}
}
117 changes: 117 additions & 0 deletions cstree/tests/it/rollback.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,117 @@
use super::*;
use cstree::interning::Resolver;

type GreenNodeBuilder<'cache, 'interner> = cstree::build::GreenNodeBuilder<'cache, 'interner, SyntaxKind>;

fn with_builder(f: impl FnOnce(&mut GreenNodeBuilder)) -> (SyntaxNode, impl Resolver) {
let mut builder = GreenNodeBuilder::new();
f(&mut builder);
let (node, cache) = builder.finish();
(SyntaxNode::new_root(node), cache.unwrap().into_interner().unwrap())
}

#[test]
#[should_panic = "`left == right` failed"]
fn comparison_works() {
let (first, res1) = with_builder(|_| {});
let (second, res2) = with_builder(|builder| {
builder.start_node(SyntaxKind(0));
builder.token(SyntaxKind(1), "hi");
builder.finish_node();
});
assert_tree_eq((&first, &res1), (&second, &res2));
}

#[test]
fn simple() {
let (first, res1) = with_builder(|builder| {
builder.start_node(SyntaxKind(0));
builder.finish_node();
});
let (second, res2) = with_builder(|builder| {
builder.start_node(SyntaxKind(0));

// Add a token, then remove it.
let initial = builder.checkpoint();
builder.token(SyntaxKind(1), "hi");
builder.revert(initial);

builder.finish_node();
});
assert_tree_eq((&first, &res1), (&second, &res2));
}

#[test]
fn nested() {
let (first, res1) = with_builder(|builder| {
builder.start_node(SyntaxKind(0));
builder.finish_node();
});

let (second, res2) = with_builder(|builder| {
builder.start_node(SyntaxKind(0));
// Add two tokens, then remove both.
let initial = builder.checkpoint();
builder.token(SyntaxKind(1), "hi");
builder.token(SyntaxKind(2), "hello");
builder.revert(initial);

builder.finish_node();
});

let (third, res3) = with_builder(|builder| {
builder.start_node(SyntaxKind(0));
builder.finish_node();
Comment on lines +63 to +64
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, I assume this was intended to contain the actual nested case? Since the first tree is the control and the second only has one checkpoint. I added some code in the other branch, but you might want to double-check if that does what you wanted to test here given that from the name it looks like this should test the case of creating multiple checkpoints back to back.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fascinating, apparently i forgot an entire test here lol. good catch ty, i meant to write this probably

        builder.start_node(SyntaxKind(0));
        // Add two tokens, then remove both.
        let initial = builder.checkpoint();
        builder.token(SyntaxKind(1), "hi");
        builder.start_node(SyntaxKind(3));
        builder.token(SyntaxKind(2), "hello");
        builder.finish_node();
        builder.revert(initial);

        builder.finish_node();

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, so not nesting checkpoints but nesting nodes? Then this would have caught the checkpoint not working across node boundaries, but I think testing using two checkpoints (in the correct order, not like the misuse test) is also something we should do.

});

assert_tree_eq((&first, &res1), (&second, &res2));
assert_tree_eq((&first, &res1), (&third, &res3));
}

#[test]
#[should_panic = "checkpoint in the future"]
fn misuse() {
with_builder(|builder| {
builder.start_node(SyntaxKind(0));

// Add two tokens, but remove them in the wrong order.
let initial = builder.checkpoint();
builder.token(SyntaxKind(1), "hi");
let new = builder.checkpoint();
builder.token(SyntaxKind(2), "hello");
builder.revert(initial);
builder.revert(new);

builder.finish_node();
});
}

#[test]
fn misuse2() {
let (first, res1) = with_builder(|builder| {
builder.start_node(SyntaxKind(0));
builder.token(SyntaxKind(3), "no");
builder.finish_node();
});

let (second, res2) = with_builder(|builder| {
builder.start_node(SyntaxKind(0));

// Add two tokens, revert to the initial state, add three tokens, and try to revert to an earlier checkpoint.
let initial = builder.checkpoint();
builder.token(SyntaxKind(1), "hi");
let new = builder.checkpoint();
builder.token(SyntaxKind(2), "hello");
builder.revert(initial);

// This is wrong, but there's not a whole lot the library can do about it.
builder.token(SyntaxKind(3), "no");
builder.token(SyntaxKind(4), "bad");
builder.token(SyntaxKind(4), "wrong");
builder.revert(new);

builder.finish_node();
});

assert_tree_eq((&first, &res1), (&second, &res2));
}
Loading