Add [GreenTreeBuilder::revert] to support backtracking parsers
#67
Rowan, and hence CSTree, is designed around hand-written parsers. In particular, the APIs for *building* trees require that each token is recorded only once. Some parsers, and especially parser combinators, use backtracking instead, where the same token may be seen multiple times. To support this, add a new `revert` function which discards all tokens seen since the last checkpoint.
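As a rough illustration of the idea described above (not cstree's actual implementation), the sketch below uses a hypothetical `ToyBuilder` that records tokens in a flat `Vec`. `checkpoint` remembers the current length and `revert` truncates back to it, so a backtracking parser can abandon one alternative and try another without the shared token `a` ending up recorded twice. Every name in it (`ToyBuilder`, `Checkpoint`, `alternative`, `parse`) is an assumption made for illustration.

```rust
/// Hypothetical flat token recorder; NOT the cstree builder.
#[derive(Debug, Default)]
struct ToyBuilder {
    tokens: Vec<(u16, String)>,
}

/// A checkpoint is simply the number of tokens recorded when it was taken.
#[derive(Debug, Clone, Copy)]
struct Checkpoint(usize);

impl ToyBuilder {
    fn token(&mut self, kind: u16, text: &str) {
        self.tokens.push((kind, text.to_string()));
    }

    fn checkpoint(&self) -> Checkpoint {
        Checkpoint(self.tokens.len())
    }

    /// Discard every token recorded since `cp` was taken.
    fn revert(&mut self, cp: Checkpoint) {
        assert!(cp.0 <= self.tokens.len(), "checkpoint is no longer valid");
        self.tokens.truncate(cp.0);
    }
}

/// Records "a" followed by `second`. Returns false on a mismatch,
/// possibly leaving speculative tokens behind for the caller to revert.
fn alternative(builder: &mut ToyBuilder, input: &[&str], second: &str, kind: u16) -> bool {
    if input.first() != Some(&"a") {
        return false;
    }
    builder.token(1, "a");
    if input.get(1) != Some(&second) {
        return false;
    }
    builder.token(kind, second);
    true
}

/// Parses `a b`, falling back to `a c` by backtracking to the checkpoint.
fn parse(input: &[&str]) -> Vec<(u16, String)> {
    let mut builder = ToyBuilder::default();
    let start = builder.checkpoint();
    if alternative(&mut builder, input, "b", 2) {
        return builder.tokens;
    }
    // The first alternative may already have recorded "a"; drop it before retrying.
    builder.revert(start);
    if !alternative(&mut builder, input, "c", 3) {
        builder.revert(start);
    }
    builder.tokens
}
```

Without the `revert` call, the input `a c` would leave two `a` tokens in the builder: one from the failed first alternative and one from the second.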
Hey, thanks for taking the time to PR this. I definitely think that this is a useful thing to add!
While going through the new code, and in particular your (much appreciated) comments, it made me notice that checkpoints don't actually work across node boundaries. This is something that the doc example (which is originally from … I've pushed a slightly different version at https://github.com/domenicquirl/cstree/tree/checkpoint-across-nodes which extends the …
Would you mind taking a look at the most recent commit on my branch and see how it compares? If the "extended" version still addresses your use case (I think so), I'd prefer to land this with the additional checkpoint functionality included. I've changed some nits as well (such as changing the name …
/// This is useful for backtracking parsers.
///
/// NOTE: this does *not* delete any unfinished nodes; you are responsible for only
/// pairing checkpoint/start_node_at. Using `start_node` combined with `revert` has unspecified behavior.
I didn't quite get what you mean to say here about "pairing checkpoint/start_node_at". What exactly is this about?
say you have the following call pattern:
let before = builder.checkpoint();
builder.start_node();
builder.token(EXAMPLE);
builder.revert(before);
one might expect, in an ideal world, that this is a noop. but in fact it leaves you with an unpaired `start_node()` that has no tokens.
i don't remember why i mentioned `start_node_at` specifically, i can remove that line.
Okay, yeah I get that originally this would only revert the token (I've changed it on the other branch, because the parent-tracking version does now indeed delete the node, which I think is better / more correct and should be covered by the `unfinished_node` test). I just don't understand what should be "paired" here, perhaps this reads like "you should only follow-up a call to `checkpoint` with a call to `start_node_at` if you want to use it, without `start_node` in between"? Either way I think we don't need it anymore if the new behaviour is to delete both node and token :)
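To make the "delete both node and token" behaviour concrete, here is a minimal sketch of a parent-tracking checkpoint. It assumes a rowan-style layout with a stack of open nodes and a list of pending children. The `ToyTreeBuilder`, `Element`, and `Checkpoint` types are hypothetical and only loosely modelled on that design; this is not the code from either branch.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Element {
    Token(u16, String),
    Node(u16, Vec<Element>),
}

/// Hypothetical builder, loosely modelled on a rowan-style design.
#[derive(Debug, Default)]
struct ToyTreeBuilder {
    /// Open nodes: (kind, index of the node's first child in `children`).
    parents: Vec<(u16, usize)>,
    /// Finished elements not yet attached to a finished parent.
    children: Vec<Element>,
}

/// Remembers how many open nodes AND how many children existed.
#[derive(Debug, Clone, Copy)]
struct Checkpoint {
    parents: usize,
    children: usize,
}

impl ToyTreeBuilder {
    fn start_node(&mut self, kind: u16) {
        self.parents.push((kind, self.children.len()));
    }

    fn token(&mut self, kind: u16, text: &str) {
        self.children.push(Element::Token(kind, text.to_string()));
    }

    fn finish_node(&mut self) {
        let (kind, first_child) = self.parents.pop().expect("finish_node without start_node");
        let kids: Vec<Element> = self.children.drain(first_child..).collect();
        self.children.push(Element::Node(kind, kids));
    }

    fn checkpoint(&self) -> Checkpoint {
        Checkpoint { parents: self.parents.len(), children: self.children.len() }
    }

    /// Drops tokens, finished nodes, and still-open nodes created since `cp`.
    fn revert(&mut self, cp: Checkpoint) {
        assert!(
            cp.parents <= self.parents.len() && cp.children <= self.children.len(),
            "checkpoint is no longer valid"
        );
        self.parents.truncate(cp.parents);
        self.children.truncate(cp.children);
    }

    fn finish(mut self) -> Element {
        assert!(self.parents.is_empty(), "unfinished node");
        assert_eq!(self.children.len(), 1, "expected exactly one root");
        self.children.pop().unwrap()
    }
}

/// Builds an empty root node, optionally with a speculative child that is reverted.
fn build(speculate: bool) -> Element {
    let mut builder = ToyTreeBuilder::default();
    builder.start_node(0);
    if speculate {
        let before = builder.checkpoint();
        builder.start_node(3); // never finished
        builder.token(1, "example");
        builder.revert(before); // removes the token and the open node
    }
    builder.finish_node();
    builder.finish()
}
```

In this sketch `build(true)` and `build(false)` produce the same tree, so the call pattern from the earlier comment is a no-op. A checkpoint that stored only `children` would leave node `3` open, and `finish_node` would then close it instead of the root.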
builder.start_node(SyntaxKind(0));
builder.finish_node();
Here, I assume this was intended to contain the actual nested case? Since the first tree is the control and the second only has one checkpoint. I added some code in the other branch, but you might want to double-check if that does what you wanted to test here given that from the name it looks like this should test the case of creating multiple checkpoints back to back.
fascinating, apparently i forgot an entire test here lol. good catch ty, i meant to write this probably
builder.start_node(SyntaxKind(0));
// Add two tokens, then remove both.
let initial = builder.checkpoint();
builder.token(SyntaxKind(1), "hi");
builder.start_node(SyntaxKind(3));
builder.token(SyntaxKind(2), "hello");
builder.finish_node();
builder.revert(initial);
builder.finish_node();
Oh, so not nesting checkpoints but nesting nodes? Then this would have caught the checkpoint not working across node boundaries, but I think testing using two checkpoints (in the correct order, not like the misuse test) is also something we should do.
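A test with two checkpoints might look roughly like the sketch below. It uses a hypothetical flat `ToyBuilder` whose `revert` returns a `Result` so that both orderings can be compared. How the real builder reacts to out-of-order reverts (panic, error, or unspecified behaviour) is not stated in this thread, so the `StaleCheckpoint` error is purely an assumption.

```rust
/// Hypothetical flat token recorder; NOT the cstree builder.
#[derive(Debug, Default)]
struct ToyBuilder {
    tokens: Vec<String>,
}

#[derive(Debug, Clone, Copy)]
struct Checkpoint(usize);

/// Assumed error type for a checkpoint that points past the current state.
#[derive(Debug, PartialEq)]
struct StaleCheckpoint;

impl ToyBuilder {
    fn token(&mut self, text: &str) {
        self.tokens.push(text.to_string());
    }

    fn checkpoint(&self) -> Checkpoint {
        Checkpoint(self.tokens.len())
    }

    fn revert(&mut self, cp: Checkpoint) -> Result<(), StaleCheckpoint> {
        if cp.0 > self.tokens.len() {
            return Err(StaleCheckpoint);
        }
        self.tokens.truncate(cp.0);
        Ok(())
    }
}

/// Correct order: revert the inner (newer) checkpoint first, then the outer one.
fn revert_in_order() -> Vec<String> {
    let mut builder = ToyBuilder::default();
    builder.token("keep");
    let outer = builder.checkpoint();
    builder.token("hi");
    let inner = builder.checkpoint();
    builder.token("hello");
    builder.revert(inner).expect("inner checkpoint is still valid");
    builder.revert(outer).expect("outer checkpoint is still valid");
    builder.tokens
}

/// Misuse: reverting the outer checkpoint first invalidates the inner one.
fn revert_out_of_order() -> Result<(), StaleCheckpoint> {
    let mut builder = ToyBuilder::default();
    let outer = builder.checkpoint();
    builder.token("hi");
    let inner = builder.checkpoint();
    builder.token("hello");
    builder.revert(outer)?;
    builder.revert(inner)
}
```

A length check like this one only catches some misuse: if new tokens are recorded after reverting `outer`, the stale `inner` checkpoint can look valid again, so a real test would probably want to cover that case separately.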
Not sure what's up with the sanitizers in CI btw. I ran them all locally (with …
i think it still meets my use case :) at least, my parser using …
@jyn514 had some personal stuff come up as well, but spent some more time today going over the … The new functionality should be available in …
this is awesome, thanks for reviewing it and fixing the bugs i didn't think about ^^
see https://rust-lang.zulipchat.com/#narrow/channel/185405-t-compiler.2Frust-analyzer/topic/rewinding.20and.20rowan.3F for more context; i made this PR to CSTree instead of rowan because it actually has a test suite.