NOTE This text was orginally written by Evan Czaplicki and has since been adopted to typescript.
The goal of this document is to explain how different parsers fit together. When will it backtrack? When will it not?
Say we have keyword "import"
:
String | Result |
---|---|
"import" |
OK{false} |
"imp" |
ERR{true} |
"export" |
ERR{true} |
In our OK{false}
notation, we are indicating:
- Did the parser succeed?
OK
if yes.ERR
if not. - Is it possible to backtrack? So when
keyword
succeeds, backtracking is not allowed anymore. You must continue along that path.
Say we have map func parser
:
parser |
Result |
---|---|
OK{b} |
OK{b} |
ERR{b} |
ERR{b} |
So result of map func parser
is always the same as the result of the parser
itself.
Say we have map2(func)(parserA)(parserB)
:
parserA |
parserB |
Result |
---|---|---|
OK{b} |
OK{b'} |
OK{b && b'} |
OK{b} |
ERR{b'} |
ERR{b && b'} |
ERR{b} |
ERR{b} |
If parserA
succeeds, we try parserB
. If they are both backtrackable, the combined result is backtrackable.
If parserA
fails, that is our result.
This is used to define our pipeline operators like this:
const skip = <A>(p1: Parser<A>) => (p2: Parser<unknown>): Parser<A> {
return map2 ((keep: A, ignore) => keep)(p1)(p2)
}
const take = <A, B>(p1: Parser<(a: A) => B>) => (p2: Parser<A>): Parser<A> {
return map2((keep: A, ignore) => keep)(p1)(p2)
}
Say we have either(parserA)(parserB)
:
parserA |
parserB |
Result |
---|---|---|
OK{b} |
OK{b} |
|
ERR{true} |
OK{b} |
OK{b} |
ERR{true} |
ERR{b} |
ERR{b} |
ERR{false} |
ERR{false} |
The 4th case is very important! If parserA
is not backtrackable, you do not even try parserB
.
The either
function does not appear in the public API, but I used it here because it makes the rules a bit easier to read. In the public API, we have oneOf
instead. You can think of oneOf
as trying either
the head of the list, or oneOf
the parsers in the tail of the list.
Say we have andThen(callback)(parserA)
where callback(a)
produces parserB
:
parserA |
parserB |
Result |
---|---|---|
ERR{b} |
ERR{b} |
|
OK{b} |
OK{b'} |
OK{b && b'} |
OK{b} |
ERR{b'} |
ERR{b && b'} |
If both parts are backtrackable, the overall result is backtrackable.
Say we have backtrackable parser
:
parser |
Result |
---|---|
OK{b} |
OK{true} |
ERR{b} |
ERR{true} |
No matter how parser
was defined, it is backtrackable now. This becomes very interesting when paired with oneOf
. You can have one of the options start with a backtrackable
segment, so even if you do start down that path, you can still try the next parser if something fails. This has important yet subtle implications on performance, so definitely read on!
This parser is intended to give you very precise control over backtracking behavior, and I think that is best explained through examples.
Say we have map2(func)(backtrackable(spaces))(symbol(","))
which can eat a bunch of spaces followed by a comma. Here is how it would work on different strings:
String | Result |
---|---|
" ," |
OK{false} |
" :" |
ERR{true} |
"abc" |
ERR{true} |
Remember how map2
is backtrackable only if both parsers are backtrackable. So in the first case, the overall result is not backtrackable because symbol ","
succeeded.
This becomes useful when paired with either
!
Say we have the following parser
definition:
const backtrackExample = P.oneOf(
P.succeed((x: number) => x)
.skip(P.backtrackable(P.spaces()))
.skip(P.symbol(","))
.skip(P.spaces())
.apply(P.int()),
P.succeed(undefined).skip(P.spaces()).skip(P.symbol("]"))
);
Here is how it would work on different strings:
String | Result |
---|---|
" , 4" |
OK{false} |
" ," |
ERR{false} |
" , a" |
ERR{false} |
" ]" |
OK{false} |
" a" |
ERR{false} |
"abc" |
ERR{true} |
Some of these cases are tricky, so let's look at them in more depth:
" , a"
—backtrackable(spaces)
,symbol(",")
, andspaces
all succeed. At that point we haveOK{false}
. Theint
parser then fails ona
, so we finish withERR{false}
. That meansoneOf
will NOT try the second possibility." ]"
—backtrackable(spaces)
succeeds, butsymbol(",")
fails. At that point we haveERR{true}
, sooneOf
tries the second possibility. After backtracking,spaces
andsymbol("]")
succeed withOK{false}
." a"
—backtrackable(spaces)
succeeds, butsymbol(",")
fails. At that point we haveERR{true}
, sooneOf
tries the second possibility. After backtracking,spaces
succeeds withOK{false}
andsymbol "]"
fails resulting inERR{false}
.
Notice that in the previous example, we parsed spaces
twice in some cases. This is inefficient, especially in large files with lots of whitespace. Backtracking is very inefficient in general though, so if you are interested in performance, it is worthwhile to try to eliminate as many uses of backtrackable
as possible.
So we can rewrite that last example to never backtrack:
const parser: Parser<number | undefind> = succeed(identity)
.skip(spaces)
.apply(
oneOf(
succeed((x: number) => x)
.skip(symbol(","))
.skip(spaces)
.apply(int),
succeed(undefined).skip(symbol("]"))
)
);
Now we are guaranteed to consume the spaces only one time. After that, we decide if we are looking at a ,
or ]
, so we never backtrack and reparse things.
If you are strategic in shuffling parsers around, you can write parsers that do not need backtrackable
at all. The resulting parsers are quite fast. They are essentially the same as LR(k) parsers, but more pleasant to write. I did this in Elm compiler for parsing Elm code, and it was very significantly faster.