Reason V4 [Stacked Diff 2/n #2599] [String Template Literals]
Summary: This diff implements string template literals.

Test Plan:

Reviewers:

CC:
jordwalke committed Aug 6, 2020
1 parent dab3565 commit 5ee062e
Showing 17 changed files with 978 additions and 49 deletions.
146 changes: 146 additions & 0 deletions docs/TEMPLATE_LITERALS.md
@@ -0,0 +1,146 @@

Contributors: Lexing and Parsing String Templates:
===================================================
Supporting string templates requires coordination between the lexer, the parser,
and the printer. As usual, the lexer produces a token stream, but when it
encounters an opening backtick it switches to a special lexing mode that
collects (mostly) raw text until it reaches either the closing backtick or a
`${`. The `${` starts an "interpolation region": the lexer temporarily resumes
regular lexing instead of collecting raw text, until it reaches the balancing
`}`. It then re-enters raw-text mode until the next `${` or the closing
backtick.

- Lexing of raw text regions and regular tokenizing: handled by
  `reason_declarative_lexer.ml`.
- Token balancing: handled by `reason_lexer.ml`.
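The mode switching described above can be sketched in OCaml, the language the lexer files are written in. This is a hypothetical `classify` function, not code from `reason_lexer.ml`: it marks each character of its input as raw template text (`t`) or anything else (`c`: ordinary code, the delimiters, and the skipped first white space). A stack of modes lets a template nest inside an interpolation region, and a brace counter per region finds the balancing `}`. Escapes, comments, and braces inside ordinary string literals are deliberately ignored.

```ocaml
(* Hypothetical sketch, not the real Reason lexer. *)

(* White space that may follow an opening backtick or precede a closing one. *)
let is_ws = function ' ' | '\t' | '\n' -> true | _ -> false

(* [Code d]: regular lexing, with [d] unbalanced '{' seen in this region.
   [Template]: collecting raw template text. *)
type mode = Code of int | Template

(* Returns a mask the same length as [src]: 't' for raw template text,
   'c' for everything else (code, delimiters, skipped first white space). *)
let classify (src : string) : string =
  let n = String.length src in
  let out = Bytes.make n 'c' in
  let rec go i stack =
    if i < n then
      match stack with
      | [] -> ()
      | Template :: rest ->
        if src.[i] = '`' && is_ws src.[i - 1] then
          go (i + 1) rest                      (* closing backtick: pop *)
        else if src.[i] = '$' && i + 1 < n && src.[i + 1] = '{' then
          go (i + 2) (Code 0 :: stack)         (* "${": resume regular lexing *)
        else begin
          Bytes.set out i 't';                 (* raw template text *)
          go (i + 1) stack
        end
      | Code d :: rest ->
        (match src.[i] with
         | '`' when i + 1 < n && is_ws src.[i + 1] ->
           (* opening backtick: skip it and its first white space *)
           go (i + 2) (Template :: stack)
         | '{' -> go (i + 1) (Code (d + 1) :: rest)
         | '}' when d = 0 && rest <> [] ->
           go (i + 1) rest                      (* balanced '}': back to raw text *)
         | '}' -> go (i + 1) (Code (d - 1) :: rest)
         | _ -> go (i + 1) stack)
  in
  go 0 [ Code 0 ];
  Bytes.to_string out
```

For example, in ``f(` a${g({})} `)`` only the `a` and the space before the closing backtick are raw text; the inner `{}` pair is balanced, so the second `}` is the one that ends the interpolation region.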

The lexer's tokens are streamed into the parser (`reason_parser.mly`), which
turns them into AST expressions.

## Lexing:

String templates are opened by:
- A backtick,
- followed by any whitespace character (newline, space, or tab).

And closed by:
- Any whitespace character (newline, space, or tab),
- followed by a backtick.

```reason
let x = ` hi this is my string template `;
let x = `
  The newline counts as a whitespace character both for opening and closing.
`;
```
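A minimal sketch of these delimiter rules as two hypothetical OCaml predicates over a source string (they are not functions from the Reason codebase). Escaped backticks, which the formatter tests exercise, are not handled here.

```ocaml
(* Newline, space, or tab, per the rules above. *)
let is_ws = function ' ' | '\t' | '\n' -> true | _ -> false

(* Hypothetical: does a string template open at index [i] of [src]?
   Requires a backtick immediately followed by white space. *)
let opens_template (src : string) (i : int) : bool =
  i >= 0 && i + 1 < String.length src && src.[i] = '`' && is_ws src.[i + 1]

(* Hypothetical: is the character at index [i] a closing backtick?
   Requires white space immediately before a backtick. *)
let closes_template (src : string) (i : int) : bool =
  i >= 1 && i < String.length src && src.[i] = '`' && is_ws src.[i - 1]
```

With these rules, `` `hi` `` (no surrounding white space) is not a string template at all.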

Within the string template literal, there may be regions of non-string
"interpolation" where expressions are lexed/parsed.

```reason
let x = ` hi this is my ${expressionHere() ++ "!"} template `;
```

Template strings are lexed into tokens, some of which carry a string "payload"
holding a portion of the string content. The opening backtick, the closing
backtick, and the `${` characters do not become tokens fed to the parser, and
are not included in the text payload of any token. The right brace `}` that
closes an interpolation region _does_ become a token fed to the parser. Lexing
a string template produces three template-specific tokens (the expressions
inside interpolation regions are lexed into ordinary tokens):

- `STRING_TEMPLATE_TERMINATED(string)`: A string region terminated by the
  closing backtick. It is either the entire template contents (when there are
  no interpolation regions `${}`), or the final string segment after the last
  interpolation region.
- `STRING_TEMPLATE_SEGMENT_LBRACE(string)`: A string region occurring _before_
  an interpolation region `${`. The `string` payload of this token is the
  contents up to (but not including) the next `${`.
- `RBRACE`: A `}` character that terminates an interpolation region that
  started with `${`.
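To make the token shapes concrete, here is a hypothetical OCaml sketch that splits a template body into these tokens. It assumes the body has already lost its backticks and its first white space, and it represents each interpolated expression as one opaque `INTERP_SOURCE` token, an invented stand-in for the ordinary tokens the real lexer emits there. Escapes and braces inside nested string literals are not handled.

```ocaml
type token =
  | STRING_TEMPLATE_SEGMENT_LBRACE of string
  | STRING_TEMPLATE_TERMINATED of string
  | RBRACE
  | INTERP_SOURCE of string (* hypothetical placeholder for regular tokens *)

(* [body]: template text after the opening backtick and its first white
   space, up to and including the white space before the closing backtick. *)
let tokenize_body (body : string) : token list =
  let n = String.length body in
  (* Index of the '}' balancing an interpolation whose contents start at [i]. *)
  let rec find_close i depth =
    if i >= n then failwith "unterminated interpolation region"
    else
      match body.[i] with
      | '{' -> find_close (i + 1) (depth + 1)
      | '}' -> if depth = 0 then i else find_close (i + 1) (depth - 1)
      | _ -> find_close (i + 1) depth
  in
  (* [seg_start]: start of the current raw-text segment; [i]: scan position. *)
  let rec scan seg_start i acc =
    if i >= n then (
      let last = String.sub body seg_start (n - seg_start) in
      List.rev (STRING_TEMPLATE_TERMINATED last :: acc))
    else if body.[i] = '$' && i + 1 < n && body.[i + 1] = '{' then (
      let seg = String.sub body seg_start (i - seg_start) in
      let close = find_close (i + 2) 0 in
      let src = String.sub body (i + 2) (close - i - 2) in
      scan (close + 1) (close + 1)
        (RBRACE :: INTERP_SOURCE src :: STRING_TEMPLATE_SEGMENT_LBRACE seg :: acc))
    else scan seg_start (i + 1) acc
  in
  scan 0 0 []
```

For the body `hi ${name}! ` this yields a segment `"hi "`, the interpolated source, `RBRACE`, and a terminated segment `"! "`. In this sketch, back-to-back interpolations produce an empty segment between them.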

Simple example:

```
  STRING_TEMPLATE_TERMINATED
  |                          |
` lorem ipsum lorem ipsum bla `
^                             ^
|                             |
|                             The closing backtick also doesn't show up in
|                             the token stream, but the last white space is
|                             part of the lexed STRING_TEMPLATE_TERMINATED
|                             token (it is used to compute indentation, but
|                             is stripped from the string constant, or
|                             re-inserted when refmting if it is not present).
|
The opening backtick doesn't show up anywhere in the token stream. The first
single white space after the backtick is also not part of the lexed tokens.
```
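The rule in this example can be written as a small hypothetical OCaml helper that computes the `STRING_TEMPLATE_TERMINATED` payload of a complete template literal with no interpolation regions. It is an illustration of the rule, not the lexer's code.

```ocaml
let is_ws = function ' ' | '\t' | '\n' -> true | _ -> false

(* Hypothetical: payload of a complete template literal [lit] with no "${".
   Drops the opening backtick and the white space after it, keeps the white
   space before the closing backtick, and drops the closing backtick. *)
let terminated_payload (lit : string) : string option =
  let n = String.length lit in
  if n >= 4 && lit.[0] = '`' && is_ws lit.[1]
     && lit.[n - 1] = '`' && is_ws lit.[n - 2]
  then Some (String.sub lit 2 (n - 3))
  else None
```

For a multi-line literal, the dropped first white space is the newline after the opening backtick, so the payload starts with the first content line's indentation.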

Multiline example:

```
This newline (the first white space after the opening backtick)
is not part of any token.
     |
     v
    `
      lorem ipsum lorem     <-- All of this leading line whitespace remains
      ipsum bla                 part of the tokens' payloads, but it is
    `                           normalized and stripped when the parser
^^^^                            converts the tokens into string expressions.
|
All of this white space on the final line is part of the token as well.
```
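The normalization mentioned above can be sketched as a hypothetical OCaml function over a single payload; it is not the parser's actual code. It assumes the rule from the Parsing section below, read as follows: the payload's final line is white space only, its width is the indentation to strip, two more spaces are stripped on top of that from every other line, and the final line itself is dropped. A single-line payload only loses its final white space character. The real parser also has to apply this across all segments of an interpolated template.

```ocaml
(* Newline, space, or tab. *)
let is_ws = function ' ' | '\t' | '\n' -> true | _ -> false

(* Drop at most [k] leading spaces/tabs from [line]. *)
let drop_indent (k : int) (line : string) : string =
  let n = String.length line in
  let rec count i =
    if i < k && i < n && (line.[i] = ' ' || line.[i] = '\t') then count (i + 1)
    else i
  in
  let i = count 0 in
  String.sub line i (n - i)

(* Hypothetical normalization of one payload (assumptions above). *)
let normalize (payload : string) : string =
  match List.rev (String.split_on_char '\n' payload) with
  | [] -> payload (* not reachable: split_on_char never returns [] *)
  | [ single ] ->
    (* single line: only the final white space character is removed *)
    let n = String.length single in
    if n > 0 && is_ws single.[n - 1] then String.sub single 0 (n - 1)
    else single
  | last :: rest_rev ->
    (* final line assumed to be white space only; its width is the
       closing backtick's indentation *)
    let indent = String.length last in
    List.rev rest_rev
    |> List.map (drop_indent (indent + 2))
    |> String.concat "\n"
```

Under these assumptions, a closing backtick at column 2 with content at column 4 strips four characters from each content line, which is why the template inside `StringIndentationWorksInModuleIndentation` in the tests normalizes the same way as the top-level one.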


For interpolation, the `STRING_TEMPLATE_SEGMENT_LBRACE` token carries the string
contents up to the `${` (minus the first white space character after the opening
backtick). The same rules apply as for non-interpolated string templates: the
opening and closing backticks do not show up in the token stream; the first
white space character after the opening backtick is not included in the lexed
string contents; and the final white space character before the closing
backtick *is* part of the lexed string token (it is used to compute
indentation). That final white space character, along with leading line
whitespace, is stripped from the string expression when the parsing stage
converts lexed tokens into AST string expressions.

```
` lorem ipsum lorem ipsum bla${expression}lorem ipsum lorem ip lorem `
  |                         |            ||                         |
  STRING_TEMPLATE_SEGMENT_LBRACE         |STRING_TEMPLATE_TERMINATED
                                         RBRACE
```

## Parsing:

The string template tokens are turned into ordinary AST expressions. The lexed
`STRING_TEMPLATE_SEGMENT_LBRACE` and `STRING_TEMPLATE_TERMINATED` tokens contain
all of the string contents plus the leading whitespace of each line, including
the final whitespace before the closing backtick. The parser normalizes these
by stripping that leading whitespace, including two additional spaces of
indentation, and then turns them into string constants carrying a special
attribute, string concatenations carrying a special attribute on the concat AST
node, or a combination of the two.

```reason
// This:
let x = `
  Hello there
`;
// Becomes:
let x = [@reason.template] "Hello there";
// This:
let x = `
  ${expr} Hello there
`;
// Becomes:
let x = [@reason.template] (expr ++ [@reason.template] "Hello there");
```
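The desugaring shown above can be sketched with a deliberately simplified, hypothetical AST rather than the real Parsetree: a `template` flag stands in for the `[@reason.template]` attribute, and `Text` parts are assumed to be already normalized. Empty text segments (such as the one before a leading `${`) are dropped. Nesting the concatenations to the right is a choice made in this sketch; since string concatenation is associative, either nesting yields the same string.

```ocaml
(* Hypothetical, simplified AST; [template = true] stands in for the
   [@reason.template] attribute on the real AST node. *)
type expr =
  | Str of { template : bool; text : string }
  | Ident of string
  | Concat of { template : bool; left : expr; right : expr }

(* A template as a sequence of normalized text and interpolated expressions. *)
type part = Text of string | Interp of expr

let desugar (parts : part list) : expr =
  let exprs =
    List.filter_map
      (function
        | Text "" -> None (* drop empty segments, e.g. before a leading ${ *)
        | Text s -> Some (Str { template = true; text = s })
        | Interp e -> Some e)
      parts
  in
  let rec build = function
    | [] -> Str { template = true; text = "" }
    | [ e ] -> e
    | e :: rest -> Concat { template = true; left = e; right = build rest }
  in
  build exprs
```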

User Documentation:
===================
> This section is the user documentation for string template literals, which
> will be published to the [official Reason Syntax
> documentation](https://reasonml.github.io/) when
TODO
190 changes: 190 additions & 0 deletions formatTest/typeCheckedTests/expected_output/templateStrings.re
@@ -0,0 +1,190 @@
[@reason.version 3.7];
/**
 * Comments:
 */

let addTwo = (a, b) => string_of_int(a + b);
let singleLineConstant = `
  Single line template
`;
let singleLineInterpolate = `
  Single line ${addTwo(1, 2)}!
`;

let multiLineConstant = `
  Multi line template
  Multi %a{x, y}line template
  Multi line template
  Multi line template
`;

let printTwo = (a, b) => {
  print_string(a);
  print_string(b);
};

let templteWithAttribute =
  [@attrHere]
  `
    Passing line template
    Passing line template
    Passing line template
    Passing line template
  `;

let result =
  print_string(
    `
      Passing line template
      Passing line template
      Passing line template
      Passing line template
    `,
  );

let resultPrintTwo =
  printTwo(
    "short one",
    `
      Passing line template
      Passing line template
      Passing line template
      Passing line template
    `,
  );

let hasBackSlashes = `
  One not escaped: \
  Three not escaped: \ \ \
  Two not escaped: \\
  Two not escaped: \\\
  One not escaped slash, and one escaped tick: \\`
  Two not escaped slashes, and one escaped tick: \\\`
  Two not escaped slashes, and one escaped dollar-brace: \\\${
  One not escaped slash, then a close tick: \
`;

let singleLineInterpolateWithEscapeTick = `
  Single \`line ${addTwo(1, 2)}!
`;

let singleLineConstantWithEscapeDollar = `
  Single \${line template
`;

// The backslash here is a backslash literal.
let singleLineInterpolateWithBackslashThenDollar = `
  Single \$line ${addTwo(2, 3)}!
`;

let beforeExpressionCommentInNonLetty = `
  Before expression comment in non-letty interpolation:
  ${/* Comment */ string_of_int(1 + 2)}
`;

let beforeExpressionCommentInNonLetty2 = `
  Same thing but with comment on own line:
  ${
    /* Comment */
    string_of_int(10 + 8)
  }
`;
module StringIndentationWorksInModuleIndentation = {
  let beforeExpressionCommentInNonLetty2 = `
    Same thing but with comment on own line:
    ${
      /* Comment */
      string_of_int(10 + 8)
    }
  `;
};

let beforeExpressionCommentInNonLetty3 = `
  Same thing but with text after final brace on same line:
  ${
    /* Comment */
    string_of_int(20 + 1000)
  }TextAfterBrace
`;

let beforeExpressionCommentInNonLetty3 = `
  Same thing but with text after final brace on next line:
  ${
    /* Comment */
    string_of_int(100)
  }
  TextAfterBrace
`;

let x = 0;
let commentInLetSequence = `
  Comment in letty interpolation:
  ${
    /* Comment */
    let x = 200 + 49;
    string_of_int(x);
  }
`;

let commentInLetSequence2 = `
  Same but with text after final brace on same line:
  ${
    /* Comment */
    let x = 200 + 49;
    string_of_int(x);
  }TextAfterBrace
`;

let commentInLetSequence3 = `
  Same but with text after final brace on next line:
  ${
    /* Comment */
    let x = 200 + 49;
    string_of_int(x);
  }
  TextAfterBrace
`;

let reallyCompicatedNested = `
  Comment in non-letty interpolation:

  ${
    /* Comment on first line of interpolation region */

    let y = (a, b) => a + b;
    let x = 0 + y(0, 2);
    // Nested string templates
    let s = `
      asdf${addTwo(0, 0)}
      alskdjflakdsjf
    `;
    s ++ s;
  }same line as brace with one space
  and some more text at the footer no newline
`;

let reallyLongIdent = "!";
let backToBackInterpolations = `
  Two interpolations side by side:
  ${addTwo(0, 0)}${addTwo(0, 0)}
  Two interpolations side by side with leading and trailing:
  Before${addTwo(0, 0)}${addTwo(0, 0)}After

  Two interpolations side by side second one should break:
  Before${addTwo(0, 0)}${
    reallyLongIdent
    ++ reallyLongIdent
    ++ reallyLongIdent
    ++ reallyLongIdent
  }After

  Three interpolations side by side:
  Before${addTwo(0, 0)}${
    reallyLongIdent
    ++ reallyLongIdent
    ++ reallyLongIdent
    ++ reallyLongIdent
  }${
    ""
  }After
`;
