Add [`GreenTreeBuilder::revert`] to support backtracking parsers #67

jyn514 · 2024-10-18T16:22:51Z

Rowan, and hence CSTree, is designed around hand-written parsers. In particular, the APIs for building trees require that each token is recorded only once.

Some parsers, and especially parser combinators, use backtracking instead, where the same token may be seen multiple times. To support this, add a new revert function which discards all tokens seen since the last checkpoint.
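
To make the idea concrete, here is a minimal sketch of checkpoint-and-revert on a toy flat token buffer. `ToyBuilder`, `ToyCheckpoint`, and `demo_backtrack` are hypothetical names used only for illustration. This is not the cstree or Rowan API, and the real builder stores green tree elements rather than strings.

```rust
/// Toy stand-in for a tree builder: it only records a flat list of tokens.
/// (Hypothetical type for illustration; not the cstree builder.)
#[derive(Debug, Default)]
pub struct ToyBuilder {
    tokens: Vec<String>,
}

/// A checkpoint remembers how many tokens had been recorded when it was taken.
#[derive(Debug, Clone, Copy)]
pub struct ToyCheckpoint(usize);

impl ToyBuilder {
    /// Record one token.
    pub fn token(&mut self, text: &str) {
        self.tokens.push(text.to_string());
    }

    /// Remember the current position in the token list.
    pub fn checkpoint(&self) -> ToyCheckpoint {
        ToyCheckpoint(self.tokens.len())
    }

    /// Discard every token recorded since `checkpoint` was taken.
    pub fn revert(&mut self, checkpoint: ToyCheckpoint) {
        self.tokens.truncate(checkpoint.0);
    }

    pub fn tokens(&self) -> &[String] {
        &self.tokens
    }
}

/// A backtracking parser tries one alternative, fails, rewinds, and tries another.
pub fn demo_backtrack() -> Vec<String> {
    let mut builder = ToyBuilder::default();
    builder.token("let");
    let before = builder.checkpoint();
    // First alternative: record two tokens, then discover it does not match.
    builder.token("x");
    builder.token("+");
    builder.revert(before);
    // Second alternative sees the same input token again, without duplicating it.
    builder.token("x");
    builder.tokens().to_vec()
}
```

In this sketch `demo_backtrack()` ends up with only `let` and `x` recorded, even though `x` was seen twice by the parser.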

see https://rust-lang.zulipchat.com/#narrow/channel/185405-t-compiler.2Frust-analyzer/topic/rewinding.20and.20rowan.3F for more context; i made this PR to CSTree instead of rowan because it actually has a test suite.

domenicquirl · 2024-10-20T18:33:00Z

Hey, thanks for taking the time to PR this. I definitely think that this is a useful thing to add!

While going through the new code, and in particular your (much appreciated) comments, I noticed that checkpoints don't actually work across node boundaries. The doc example (which is originally from rowan) suggests you can do this, but apparently no one actually needed it and tried it, or at least they didn't bother to report it.

I've pushed a slightly different version at https://github.com/domenicquirl/cstree/tree/checkpoint-across-nodes which extends the Checkpoint to track both the parent and the child index. This should allow the checkpoint to work across and wrap nodes (if they are finished before making use of the checkpoint), and also to revert them, for which I added some tests. Your tests all still pass, except that the misuse one now only panics if the time-travelling checkpoint contains at least one node: if there's only tokens, reverting "into the future" will do nothing because there's no tokens to drop (I've split up the test so both cases are covered).
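
As a rough illustration of tracking both the parent and the child index, here is a toy sketch under stated assumptions. `ToyBuilder`, `ToyCheckpoint`, `Element`, and the `demo_*` functions are hypothetical and are not the code on the branch. The sketch assumes a Rowan-style layout with a stack of unfinished parents plus a flat list of pending children, and it only catches the simplest kind of misuse.

```rust
/// Finished tree element in this toy model (hypothetical, not cstree's types).
#[derive(Debug, Clone, PartialEq)]
pub enum Element {
    Token(String),
    Node { kind: u32, children: Vec<Element> },
}

/// Records both how many nodes were unfinished and how many children were
/// pending at the time the checkpoint was taken.
#[derive(Debug, Clone, Copy)]
pub struct ToyCheckpoint {
    parent_idx: usize,
    child_idx: usize,
}

#[derive(Debug, Default)]
pub struct ToyBuilder {
    /// Unfinished nodes: (kind, index of the node's first child in `children`).
    parents: Vec<(u32, usize)>,
    /// Flat list of finished elements not yet attached to a parent.
    children: Vec<Element>,
}

impl ToyBuilder {
    pub fn start_node(&mut self, kind: u32) {
        self.parents.push((kind, self.children.len()));
    }

    pub fn token(&mut self, text: &str) {
        self.children.push(Element::Token(text.to_string()));
    }

    pub fn finish_node(&mut self) {
        let (kind, first_child) = self.parents.pop().expect("finish_node without start_node");
        let children = self.children.split_off(first_child);
        self.children.push(Element::Node { kind, children });
    }

    pub fn checkpoint(&self) -> ToyCheckpoint {
        ToyCheckpoint { parent_idx: self.parents.len(), child_idx: self.children.len() }
    }

    /// Drop every token and every node (finished or not) added since `cp`.
    pub fn revert_to(&mut self, cp: ToyCheckpoint) {
        // Simplified misuse check: the node the checkpoint was taken in must still be open.
        assert!(cp.parent_idx <= self.parents.len(), "checkpoint's parent was already finished");
        self.parents.truncate(cp.parent_idx);
        self.children.truncate(cp.child_idx);
    }

    pub fn finish(mut self) -> Element {
        assert!(self.parents.is_empty(), "unfinished nodes remain");
        self.children.pop().expect("no root node was built")
    }
}

/// Revert across a finished nested node: both the token and the node disappear.
pub fn demo_nested_revert() -> Element {
    let mut builder = ToyBuilder::default();
    builder.start_node(0);
    let initial = builder.checkpoint();
    builder.token("hi");
    builder.start_node(3);
    builder.token("hello");
    builder.finish_node();
    builder.revert_to(initial);
    builder.finish_node();
    builder.finish()
}

/// Revert while a nested node is still unfinished: the open node is dropped too.
pub fn demo_unfinished_revert() -> Element {
    let mut builder = ToyBuilder::default();
    builder.start_node(0);
    builder.token("keep");
    let cp = builder.checkpoint();
    builder.start_node(1);
    builder.token("drop");
    builder.revert_to(cp);
    builder.finish_node();
    builder.finish()
}
```

In this toy model, a checkpoint that stored only the child index could not remove an unfinished node, because nothing would tell it to shrink the `parents` stack. Storing the parent count alongside is what lets `revert_to` discard the open node in `demo_unfinished_revert`.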

Would you mind taking a look at the most recent commit on my branch and see how it compares? If the "extended" version still addresses your use case (I think so), I'd prefer to land this with the additional checkpoint functionality included. I've changed some nits as well (such as changing the name `revert` to `revert_to` to match `start_node_at`) and I think I have one or two comments to leave here, but feel free to run with those and my changes for your PR - the rest of the code is still yours, and I must say that that `assert_tree_eq` function is really handy ☺️

domenicquirl · 2024-10-20T18:34:19Z

cstree/src/green/builder.rs

+    /// This is useful for backtracking parsers.
+    ///
+    /// NOTE: this does *not* delete any unfinished nodes; you are responsible for only
+    /// pairing checkpoint/start_node_at. Using `start_node` combined with `revert` has unspecified behavior.


I didn't quite get what you mean to say here about "pairing checkpoint/start_node_at". What exactly is this about?

say you have the following call pattern:

    let before = builder.checkpoint();
    builder.start_node();
    builder.token(EXAMPLE);
    builder.revert(before);

one might expect, in an ideal world, that this is a noop. but in fact it leaves you with an unpaired start_node() that has no tokens.

i don't remember why i mentioned start_node_at specifically, i can remove that line.

Okay, yeah, I get that originally this would only revert the token (I've changed it on the other branch, because the parent-tracking version does now indeed delete the node, which I think is better / more correct and should be covered by the `unfinished_node` test). I just don't understand what should be "paired" here. Perhaps this reads like "you should only follow up a call to `checkpoint` with a call to `start_node_at` if you want to use it, without `start_node` in between"? Either way, I think we don't need it anymore if the new behaviour is to delete both node and token :)

domenicquirl · 2024-10-20T18:36:48Z

cstree/tests/it/rollback.rs

+        builder.start_node(SyntaxKind(0));
+        builder.finish_node();


Here, I assume this was intended to contain the actual nested case? Since the first tree is the control and the second only has one checkpoint. I added some code in the other branch, but you might want to double-check if that does what you wanted to test here given that from the name it looks like this should test the case of creating multiple checkpoints back to back.

fascinating, apparently i forgot an entire test here lol. good catch ty, i meant to write this probably

    builder.start_node(SyntaxKind(0));
    // Add two tokens, then remove both.
    let initial = builder.checkpoint();
    builder.token(SyntaxKind(1), "hi");
    builder.start_node(SyntaxKind(3));
    builder.token(SyntaxKind(2), "hello");
    builder.finish_node();
    builder.revert(initial);
    builder.finish_node();

Oh, so not nesting checkpoints but nesting nodes? Then this would have caught the checkpoint not working across node boundaries, but I think testing using two checkpoints (in the correct order, not like the misuse test) is also something we should do.

domenicquirl · 2024-10-20T18:50:35Z

Not sure what's up with the sanitizers in CI btw. I ran them all locally (with rustc 1.83.0-nightly (8d6b88b16 2024-09-11) instead of rustc 1.84.0-nightly (da935398d 2024-10-19)) and everything passes (against my branch).

jyn514 · 2024-10-21T03:13:59Z

> Would you mind taking a look at the most recent commit on my branch and see how it compares? If the "extended" version still addresses your use case (I think so), I'd prefer to land this with the additional checkpoint functionality included.

i think it still meets my use case :) at least, my parser using revert() still passes all the test cases. i left a comment; my brain is kinda fried and i probably won't have another chance to look at this for at least a week, feel free to merge even if i don't get back to you.

domenicquirl · 2024-11-01T13:02:52Z

@jyn514 had some personal stuff come up as well, but spent some more time today going over the assert!s and expanding the test coverage for the different valid and invalid combinations of revert_to and start_node_at, as well as the documentation on those methods. I've landed this in #68, including your original commit, so I'll be closing this PR as done.

The new functionality should be available in v0.12.2, which is already up on crates.io. Thanks again for working on this!

jyn514 · 2024-11-16T20:10:09Z

this is awesome, thanks for reviewing it and fixing the bugs i didn't think about ^^

jyn514 mentioned this pull request Oct 19, 2024

parse some more syntax spreadsheet-lang/spreadsheet#8

Merged

domenicquirl requested changes Oct 20, 2024

domenicquirl mentioned this pull request Nov 1, 2024

Support backtracking and Checkpoints across nodes #68

Merged

domenicquirl closed this Nov 1, 2024

jyn514 deleted the rollback branch November 16, 2024 20:09
