FAQs for API Consumers

What are these rescanFooToken functions in the scanner for?

The ECMAScript grammar defines several lexical goal symbols, each of which selects an alternate scanning rule in place of the default. These rules are triggered when syntactically aware consumers require them (i.e. ECMAScript parsers, which know where a given construct can occur). For details, see the current ES6 draft.

One example of this is the single / (the forward-slash token). As long as the / doesn't begin a comment, the default goal (InputElementDiv) is to scan it as a plain / or /= (for division operations); however, in contexts where a bare / or /= would not make sense (such as when parsing a PrimaryExpression), the goal is switched to InputElementRegExp so that a regular expression literal is scanned out instead.

The rescan functions roughly correspond to triggering these alternate rules, though rather than passing an extra parameter on every scan, the consumer explicitly demands a rescan from the scanner.
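
For instance, here is a minimal sketch (assuming a recent version of the typescript package) of how a syntactically aware consumer would demand a rescan when it knows a regular expression may occur:

```ts
import * as ts from "typescript";

// Scanning "/abc/g" under the default goal yields a plain SlashToken.
const scanner = ts.createScanner(
    ts.ScriptTarget.Latest,
    /*skipTrivia*/ true,
    ts.LanguageVariant.Standard,
    "/abc/g"
);

let token = scanner.scan();
console.log(ts.SyntaxKind[token]); // "SlashToken"

// A parser that knows a PrimaryExpression can occur here demands a rescan,
// which re-reads the same text under the InputElementRegExp goal.
token = scanner.reScanSlashToken();
console.log(ts.SyntaxKind[token]); // "RegularExpressionLiteral"
```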

Though lexical goals are not addressed in the TypeScript spec, there is effectively one new rule added for > (the greater-than character) to make generics easy to parse: any place that might encounter a >>, >>>, etc. requires a rescan.
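
As a concrete sketch of that rule (under the same assumptions as above), the scanner reads each > on its own until a consumer that expects a shift operator demands a rescan:

```ts
import * as ts from "typescript";

// In "a >> b", each ">" scans as its own token by default, so the
// closing ">>" of nested type arguments needs no special handling.
const scanner = ts.createScanner(
    ts.ScriptTarget.Latest,
    /*skipTrivia*/ true,
    ts.LanguageVariant.Standard,
    "a >> b"
);

scanner.scan(); // Identifier "a"
let token = scanner.scan();
console.log(ts.SyntaxKind[token]); // "GreaterThanToken"

// A parser expecting a binary operator here demands a rescan, which
// greedily merges the greater-than characters into a shift token.
token = scanner.reScanGreaterToken();
console.log(ts.SyntaxKind[token]); // "GreaterThanGreaterThanToken"
```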

Why does the lexical classifier sometimes give inaccurate results for nested template strings?

For the record, template strings had no lexical classification support at all prior to 1.5, for several technical reasons. However, due to demand, support was added for what we believed would cover the majority of practical uses.

The lexical classifier is a rudimentary classifier intended to be fast and to work on a single line at a time. It works by augmenting TypeScript's scanner, which itself operates on a regular grammar except when a lexical goal is in play. Keep in mind that lexical goals can only be triggered accurately by a syntactically aware entity, while the lexical classifier only has the context of a single line, plus the lexical state at the end of the previous line, to work with.
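
For context, here is a hedged sketch (again assuming a recent typescript package) of how an editor-like consumer would drive the lexical classifier, threading each line's final lexical state into the next:

```ts
import * as ts from "typescript";

const classifier = ts.createClassifier();

// A template string split across two lines, classified one line at a time.
const line1 = "const s = `head ${";
const line2 = "} tail`;";

const result1 = classifier.getClassificationsForLine(
    line1,
    ts.EndOfLineState.None,
    /*syntacticClassifierAbsent*/ true
);

// The final lex state records that line 1 ended inside a substitution;
// that single value is the only context line 2 gets to see.
const result2 = classifier.getClassificationsForLine(
    line2,
    result1.finalLexState,
    /*syntacticClassifierAbsent*/ true
);

for (const entry of result2.entries) {
    console.log(entry.length, ts.TokenClass[entry.classification]);
}
```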

To throw a wrench in the gears, substitution templates are not regular (they are context-free, and if someone wants to write a formal proof of this, a la Rust's string literals not being context-free, feel free to send a pumping lemma pull request for the wiki). The issue is that a } (close curly brace) and the template continuation tokens (}...${ (TemplateMiddle) and }...` (TemplateTail)) need to be distinguished by lexical goals, which are triggered by syntactically aware consumers.
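
To illustrate (again as a sketch against a recent typescript package), the scanner reads the } as an ordinary close brace until a consumer that knows it is inside a template demands a rescan:

```ts
import * as ts from "typescript";

const scanner = ts.createScanner(
    ts.ScriptTarget.Latest,
    /*skipTrivia*/ true,
    ts.LanguageVariant.Standard,
    "`head ${ x } tail`"
);

scanner.scan();             // TemplateHead: `head ${
scanner.scan();             // Identifier:   x
let token = scanner.scan(); // the "}" scans as a plain close brace
console.log(ts.SyntaxKind[token]); // "CloseBraceToken"

// A parser that knows it is in substitution position demands a rescan,
// which consumes the rest of the template under the template goal.
token = scanner.reScanTemplateToken(/*isTaggedTemplate*/ false);
console.log(token === ts.SyntaxKind.TemplateTail); // true
```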

The basic solution is to just maintain a stack. However, in the interest of giving accurate results without complicating the classifier too much, we only keep track of whether a single template expression was left unterminated. This means that if you have `...${ `...${ (two TemplateHeads) on a single line, or a multiline object literal in substitution position, your results may not be accurate. In practice this does not happen much, though.
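
As a hedged illustration of the limitation (assuming the same API as above), the end-of-line state has no room to encode nesting depth:

```ts
import * as ts from "typescript";

const classifier = ts.createClassifier();

// Two TemplateHeads on one line: the final lex state can record that a
// template expression was left unterminated, but not how many were.
const line = "const s = `a ${ `b ${";
const result = classifier.getClassificationsForLine(
    line,
    ts.EndOfLineState.None,
    /*syntacticClassifierAbsent*/ true
);

// Whatever single state comes back here cannot encode the nesting depth,
// so the classification of subsequent lines may be inaccurate.
console.log(ts.EndOfLineState[result.finalLexState]);
```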
