<!doctype html>
<meta charset="utf-8">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.4/styles/github.min.css">
<link rel="spec" href="es2015" />
<pre class="metadata">
title: Regular Expression Pattern Modifiers for ECMAScript
stage: 3
contributors: Ron Buckton, Ecma International
</pre>
<emu-biblio href="node_modules/@tc39/ecma262-biblio/biblio.json"></emu-biblio>
<emu-intro id="sec-intro">
<h1>Introduction</h1>
<p>See <a href="https://github.com/tc39/proposal-regexp-modifiers#readme">the proposal repository</a> for background material and discussion.</p>
</emu-intro>
<emu-clause id="sec-text-processing">
<h1>Text Processing</h1>
<emu-clause id="sec-regexp-regular-expression-objects">
<h1>RegExp (Regular Expression) Objects</h1>
<p>A RegExp object contains a regular expression and the associated flags.</p>
<emu-note>
<p>The form and functionality of regular expressions is modelled after the regular expression facility in the Perl 5 programming language.</p>
</emu-note>
<emu-clause id="sec-patterns">
<h1>Patterns</h1>
<p>The RegExp constructor applies the following grammar to the input pattern String. An error occurs if the grammar cannot interpret the String as an expansion of |Pattern|.</p>
<h2>Syntax</h2>
<emu-grammar type="definition">
Pattern[UnicodeMode, N] ::
Disjunction[?UnicodeMode, ?N]
Disjunction[UnicodeMode, N] ::
Alternative[?UnicodeMode, ?N]
Alternative[?UnicodeMode, ?N] `|` Disjunction[?UnicodeMode, ?N]
Alternative[UnicodeMode, N] ::
[empty]
Alternative[?UnicodeMode, ?N] Term[?UnicodeMode, ?N]
Term[UnicodeMode, N] ::
Assertion[?UnicodeMode, ?N]
Atom[?UnicodeMode, ?N]
Atom[?UnicodeMode, ?N] Quantifier
Assertion[UnicodeMode, N] ::
`^`
`$`
`\` `b`
`\` `B`
`(` `?` `=` Disjunction[?UnicodeMode, ?N] `)`
`(` `?` `!` Disjunction[?UnicodeMode, ?N] `)`
`(` `?` `<=` Disjunction[?UnicodeMode, ?N] `)`
`(` `?` `<!` Disjunction[?UnicodeMode, ?N] `)`
Quantifier ::
QuantifierPrefix
QuantifierPrefix `?`
QuantifierPrefix ::
`*`
`+`
`?`
`{` DecimalDigits[~Sep] `}`
`{` DecimalDigits[~Sep] `,` `}`
`{` DecimalDigits[~Sep] `,` DecimalDigits[~Sep] `}`
Atom[UnicodeMode, N] ::
PatternCharacter
`.`
`\` AtomEscape[?UnicodeMode, ?N]
CharacterClass[?UnicodeMode]
`(` GroupSpecifier[?UnicodeMode] Disjunction[?UnicodeMode, ?N] `)`
<del>`(` `?` `:` Disjunction[?UnicodeMode, ?N] `)`</del>
<ins>`(` `?` RegularExpressionFlags `:` Disjunction[?UnicodeMode, ?N] `)`</ins>
<ins>`(` `?` RegularExpressionFlags `-` RegularExpressionFlags `:` Disjunction[?UnicodeMode, ?N] `)`</ins>
SyntaxCharacter :: one of
`^` `$` `\` `.` `*` `+` `?` `(` `)` `[` `]` `{` `}` `|`
PatternCharacter ::
SourceCharacter but not SyntaxCharacter
AtomEscape[UnicodeMode, N] ::
DecimalEscape
CharacterClassEscape[?UnicodeMode]
CharacterEscape[?UnicodeMode]
[+N] `k` GroupName[?UnicodeMode]
CharacterEscape[UnicodeMode] ::
ControlEscape
`c` ControlLetter
`0` [lookahead ∉ DecimalDigit]
HexEscapeSequence
RegExpUnicodeEscapeSequence[?UnicodeMode]
IdentityEscape[?UnicodeMode]
ControlEscape :: one of
`f` `n` `r` `t` `v`
ControlLetter :: one of
`a` `b` `c` `d` `e` `f` `g` `h` `i` `j` `k` `l` `m` `n` `o` `p` `q` `r` `s` `t` `u` `v` `w` `x` `y` `z`
`A` `B` `C` `D` `E` `F` `G` `H` `I` `J` `K` `L` `M` `N` `O` `P` `Q` `R` `S` `T` `U` `V` `W` `X` `Y` `Z`
GroupSpecifier[UnicodeMode] ::
[empty]
`?` GroupName[?UnicodeMode]
GroupName[UnicodeMode] ::
`<` RegExpIdentifierName[?UnicodeMode] `>`
RegExpIdentifierName[UnicodeMode] ::
RegExpIdentifierStart[?UnicodeMode]
RegExpIdentifierName[?UnicodeMode] RegExpIdentifierPart[?UnicodeMode]
RegExpIdentifierStart[UnicodeMode] ::
IdentifierStartChar
`\` RegExpUnicodeEscapeSequence[+UnicodeMode]
[~UnicodeMode] UnicodeLeadSurrogate UnicodeTrailSurrogate
RegExpIdentifierPart[UnicodeMode] ::
IdentifierPartChar
`\` RegExpUnicodeEscapeSequence[+UnicodeMode]
[~UnicodeMode] UnicodeLeadSurrogate UnicodeTrailSurrogate
RegExpUnicodeEscapeSequence[UnicodeMode] ::
[+UnicodeMode] `u` HexLeadSurrogate `\u` HexTrailSurrogate
[+UnicodeMode] `u` HexLeadSurrogate
[+UnicodeMode] `u` HexTrailSurrogate
[+UnicodeMode] `u` HexNonSurrogate
[~UnicodeMode] `u` Hex4Digits
[+UnicodeMode] `u{` CodePoint `}`
UnicodeLeadSurrogate ::
> any Unicode code point in the inclusive range 0xD800 to 0xDBFF
UnicodeTrailSurrogate ::
> any Unicode code point in the inclusive range 0xDC00 to 0xDFFF
</emu-grammar>
<p>Each `\\u` |HexTrailSurrogate| for which the choice of associated `u` |HexLeadSurrogate| is ambiguous shall be associated with the nearest possible `u` |HexLeadSurrogate| that would otherwise have no corresponding `\\u` |HexTrailSurrogate|.</p>
<emu-grammar type="definition">
HexLeadSurrogate ::
Hex4Digits [> but only if the MV of |Hex4Digits| is in the inclusive range 0xD800 to 0xDBFF]
HexTrailSurrogate ::
Hex4Digits [> but only if the MV of |Hex4Digits| is in the inclusive range 0xDC00 to 0xDFFF]
HexNonSurrogate ::
Hex4Digits [> but only if the MV of |Hex4Digits| is not in the inclusive range 0xD800 to 0xDFFF]
IdentityEscape[UnicodeMode] ::
[+UnicodeMode] SyntaxCharacter
[+UnicodeMode] `/`
[~UnicodeMode] SourceCharacter but not UnicodeIDContinue
DecimalEscape ::
NonZeroDigit DecimalDigits[~Sep]? [lookahead ∉ DecimalDigit]
CharacterClassEscape[UnicodeMode] ::
`d`
`D`
`s`
`S`
`w`
`W`
[+UnicodeMode] `p{` UnicodePropertyValueExpression `}`
[+UnicodeMode] `P{` UnicodePropertyValueExpression `}`
UnicodePropertyValueExpression ::
UnicodePropertyName `=` UnicodePropertyValue
LoneUnicodePropertyNameOrValue
UnicodePropertyName ::
UnicodePropertyNameCharacters
UnicodePropertyNameCharacters ::
UnicodePropertyNameCharacter UnicodePropertyNameCharacters?
UnicodePropertyValue ::
UnicodePropertyValueCharacters
LoneUnicodePropertyNameOrValue ::
UnicodePropertyValueCharacters
UnicodePropertyValueCharacters ::
UnicodePropertyValueCharacter UnicodePropertyValueCharacters?
UnicodePropertyValueCharacter ::
UnicodePropertyNameCharacter
DecimalDigit
UnicodePropertyNameCharacter ::
ControlLetter
`_`
CharacterClass[UnicodeMode] ::
`[` [lookahead != `^`] ClassRanges[?UnicodeMode] `]`
`[` `^` ClassRanges[?UnicodeMode] `]`
ClassRanges[UnicodeMode] ::
[empty]
NonemptyClassRanges[?UnicodeMode]
NonemptyClassRanges[UnicodeMode] ::
ClassAtom[?UnicodeMode]
ClassAtom[?UnicodeMode] NonemptyClassRangesNoDash[?UnicodeMode]
ClassAtom[?UnicodeMode] `-` ClassAtom[?UnicodeMode] ClassRanges[?UnicodeMode]
NonemptyClassRangesNoDash[UnicodeMode] ::
ClassAtom[?UnicodeMode]
ClassAtomNoDash[?UnicodeMode] NonemptyClassRangesNoDash[?UnicodeMode]
ClassAtomNoDash[?UnicodeMode] `-` ClassAtom[?UnicodeMode] ClassRanges[?UnicodeMode]
ClassAtom[UnicodeMode] ::
`-`
ClassAtomNoDash[?UnicodeMode]
ClassAtomNoDash[UnicodeMode] ::
SourceCharacter but not one of `\` or `]` or `-`
`\` ClassEscape[?UnicodeMode]
ClassEscape[UnicodeMode] ::
`b`
[+UnicodeMode] `-`
CharacterClassEscape[?UnicodeMode]
CharacterEscape[?UnicodeMode]
</emu-grammar>
<emu-note>
<p>A number of productions in this section are given alternative definitions in section <emu-xref href="#sec-regular-expressions-patterns"></emu-xref>.</p>
</emu-note>
</emu-clause>
<emu-clause id="sec-pattern-semantics">
<h1>Pattern Semantics</h1>
<emu-clause id="sec-notation">
<h1>Notation</h1>
<p>The descriptions below use the following aliases:</p>
<ul>
<li>
_Input_ is a List whose elements are the characters of the String being matched by the regular expression pattern. Each character is either a code unit or a code point, depending upon the kind of pattern involved. The notation _Input_[_n_] means the _n_<sup>th</sup> character of _Input_, where _n_ can range between 0 (inclusive) and _InputLength_ (exclusive).
</li>
<li>
_InputLength_ is the number of characters in _Input_.
</li>
<li>
_NcapturingParens_ is the total number of left-capturing parentheses (i.e. the total number of <emu-grammar>Atom :: `(` GroupSpecifier Disjunction `)`</emu-grammar> Parse Nodes) in the pattern. A left-capturing parenthesis is any `(` pattern character that is matched by the `(` terminal of the <emu-grammar>Atom :: `(` GroupSpecifier Disjunction `)`</emu-grammar> production.
</li>
<li>
_DotAll_ is *true* if the RegExp object's [[OriginalFlags]] internal slot contains *"s"* and otherwise is *false*.
</li>
<li>
_IgnoreCase_ is *true* if the RegExp object's [[OriginalFlags]] internal slot contains *"i"* and otherwise is *false*.
</li>
<li>
_Multiline_ is *true* if the RegExp object's [[OriginalFlags]] internal slot contains *"m"* and otherwise is *false*.
</li>
<li>
_Unicode_ is *true* if the RegExp object's [[OriginalFlags]] internal slot contains *"u"* and otherwise is *false*.
</li>
<li oldids="sec-runtime-semantics-wordcharacters-abstract-operation">
<del>_WordCharacters_ is the mathematical set that is the union of all sixty-three characters in *"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_"* (letters, numbers, and U+005F (LOW LINE) in the Unicode Basic Latin block) and all characters _c_ for which _c_ is not in that set but Canonicalize(_c_) is. _WordCharacters_ cannot contain more than sixty-three characters unless _Unicode_ and _IgnoreCase_ are both *true*.</del>
</li>
</ul>
<p>Furthermore, the descriptions below use the following internal data structures:</p>
<ul>
<li>
A <em>CharSet</em> is a mathematical set of characters. When the _Unicode_ flag is *true*, “all characters” means the CharSet containing all code point values; otherwise “all characters” means the CharSet containing all code unit values.
</li>
<li>
A <em>State</em> is an ordered pair (_endIndex_, _captures_) where _endIndex_ is an integer and _captures_ is a List of _NcapturingParens_ values. States are used to represent partial match states in the regular expression matching algorithms. The _endIndex_ is one plus the index of the last input character matched so far by the pattern, while _captures_ holds the results of capturing parentheses. The _n_<sup>th</sup> element of _captures_ is either a List of characters that represents the value obtained by the _n_<sup>th</sup> set of capturing parentheses or *undefined* if the _n_<sup>th</sup> set of capturing parentheses hasn't been reached yet. Due to backtracking, many States may be in use at any time during the matching process.
</li>
<li>
A <em>MatchResult</em> is either a State or the special token ~failure~ that indicates that the match failed.
</li>
<li>
A <em>Continuation</em> is an Abstract Closure that takes one State argument and returns a MatchResult result. The Continuation attempts to match the remaining portion (specified by the closure's captured values) of the pattern against _Input_, starting at the intermediate state given by its State argument. If the match succeeds, the Continuation returns the final State that it reached; if the match fails, the Continuation returns ~failure~.
</li>
<li>
A <em>Matcher</em> is an Abstract Closure that takes two arguments—a State and a Continuation—and returns a MatchResult result. A Matcher attempts to match a middle subpattern (specified by the closure's captured values) of the pattern against _Input_, starting at the intermediate state given by its State argument. The Continuation argument should be a closure that matches the rest of the pattern. After matching the subpattern of a pattern to obtain a new State, the Matcher then calls Continuation on that new State to test if the rest of the pattern can match as well. If it can, the Matcher returns the State returned by Continuation; if not, the Matcher may try different choices at its choice points, repeatedly calling Continuation until it either succeeds or all possibilities have been exhausted.
</li>
</ul>
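<emu-note>
<p>As a non-normative illustration, the State, Continuation, and Matcher structures above can be modelled with ordinary closures. The sketch below is a simplification under stated assumptions: the names `FAILURE`, `compile`, `char`, `either`, and `seq` are hypothetical, captures are carried along but never written, and only forward matching of literal characters is covered.</p>
<pre><code class="javascript">
// Hypothetical model: none of these names appear in the specification.
const FAILURE = Symbol("failure");

function compile(input) {
  // Matcher for one literal character (forward direction only).
  const char = (ch) => (x, c) =>
    input[x.endIndex] === ch
      ? c({ endIndex: x.endIndex + 1, captures: x.captures })
      : FAILURE;

  // Matcher with a choice point: try m1, and fall back to m2 on failure.
  const either = (m1, m2) => (x, c) => {
    const r = m1(x, c);
    return r !== FAILURE ? r : m2(x, c);
  };

  // Matcher that runs m1, then m2 from the State that m1 produced.
  const seq = (m1, m2) => (x, c) => m1(x, (y) => m2(y, c));

  return { char, either, seq };
}

const { char, either, seq } = compile(Array.from("abc"));
const start = { endIndex: 0, captures: [] };
const accept = (y) => y; // final Continuation: return the State unchanged
const aOrAb = either(char("a"), seq(char("a"), char("b")));

// Models /(?:a|ab)c/: the first choice "a" fails at "c", so "ab" is tried.
const viaBacktrack = seq(aOrAb, char("c"))(start, accept);
// Models /(?:a|ab)bc/: the first choice "a" succeeds with the sequel "bc".
const viaFirstChoice = seq(aOrAb, seq(char("b"), char("c")))(start, accept);
// Models /ac/: no choice leads to a match.
const noMatch = seq(char("a"), char("c"))(start, accept);
</code></pre>
<p>In this sketch both `viaBacktrack` and `viaFirstChoice` end with an `endIndex` of 3, while `noMatch` is `FAILURE`. Passing the rest of the pattern as a Continuation is what lets a Matcher revisit its own choice points when the sequel fails.</p>
</emu-note>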
</emu-clause>
<ins class="block">
<emu-clause id="sec-patterns-static-semantics-early-errors">
<h1>Static Semantics: Early Errors</h1>
<emu-grammar>Atom :: `(` `?` RegularExpressionFlags `:` Disjunction `)`</emu-grammar>
<ul>
<li>It is a Syntax Error if the source text matched by |RegularExpressionFlags| contains any code point other than `i`, `m`, or `s`, or if it contains the same code point more than once.</li>
</ul>
<emu-grammar>Atom :: `(` `?` RegularExpressionFlags `-` RegularExpressionFlags `:` Disjunction `)`</emu-grammar>
<ul>
<li>It is a Syntax Error if the source text matched by the first |RegularExpressionFlags| and the source text matched by the second |RegularExpressionFlags| are both empty.</li>
<li>It is a Syntax Error if the source text matched by the first |RegularExpressionFlags| contains any code point other than `i`, `m`, or `s`, or contains the same code point more than once.</li>
<li>It is a Syntax Error if the source text matched by the second |RegularExpressionFlags| contains any code point other than `i`, `m`, or `s`, or contains the same code point more than once.</li>
<li>It is a Syntax Error if any code point in the source text matched by the first |RegularExpressionFlags| is also contained in the source text matched by the second |RegularExpressionFlags|.</li>
</ul>
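<emu-note>
<p>The rules above can be restated, non-normatively, as a small predicate. This is only a sketch: the function names and the convention of passing `undefined` for an absent `-` part are assumptions made for illustration.</p>
<pre><code class="javascript">
// Hypothetical helper: true if flags uses only i, m, s, each at most once.
function isValidFlagList(flags) {
  const seen = new Set();
  for (const ch of flags) {
    if (!"ims".includes(ch)) return false;
    if (seen.has(ch)) return false;
    seen.add(ch);
  }
  return true;
}

// add: the flags before `-`. remove: the flags after `-`, or undefined
// when the group has no `-` at all.
function modifiersAreValid(add, remove) {
  if (remove === undefined) return isValidFlagList(add);
  if (add === "") {
    if (remove === "") return false; // "(?-:" enables and disables nothing
  }
  if (!isValidFlagList(add)) return false;
  if (!isValidFlagList(remove)) return false;
  for (const ch of add) {
    if (remove.includes(ch)) return false; // same flag on both sides
  }
  return true;
}
</code></pre>
<p>Under these assumptions `modifiersAreValid("i", "m")` and `modifiersAreValid("", "s")` are accepted, while `modifiersAreValid("ii")`, `modifiersAreValid("u")`, `modifiersAreValid("", "")`, and `modifiersAreValid("i", "i")` are rejected.</p>
</emu-note>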
</emu-clause>
<emu-clause id="sec-modifiers-records">
<h1>Modifiers Records</h1>
<p>A <dfn variants="Modifiers Records">Modifiers Record</dfn> is a Record value used to encapsulate information about the regular expression flags that apply to a subpattern.</p>
<p>Modifiers Records have the fields listed in <emu-xref href="#table-modifiers-record"></emu-xref>.</p>
<emu-table id="table-modifiers-record" caption="Modifiers Record Fields">
<table>
<tr>
<th>Field Name</th>
<th>Value</th>
<th>Meaning</th>
</tr>
<tr>
<td>[[DotAll]]</td>
<td>a Boolean</td>
<td>Indicates whether the *"s"* flag is currently enabled.</td>
</tr>
<tr>
<td>[[IgnoreCase]]</td>
<td>a Boolean</td>
<td>Indicates whether the *"i"* flag is currently enabled.</td>
</tr>
<tr>
<td>[[Multiline]]</td>
<td>a Boolean</td>
<td>Indicates whether the *"m"* flag is currently enabled.</td>
</tr>
</table>
</emu-table>
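<emu-note>
<p>For illustration only, a Modifiers Record can be pictured as a plain object with three Boolean fields. In the sketch below the function names are hypothetical, and the update rule is an assumption drawn from the syntax (flags written before `-` are enabled, flags written after it are disabled); it is not a restatement of any algorithm in this excerpt.</p>
<pre><code class="javascript">
// Illustrative only: field names mirror the table, function names are made up.
function modifiersFromFlags(flags) {
  return {
    dotAll: flags.includes("s"),
    ignoreCase: flags.includes("i"),
    multiline: flags.includes("m"),
  };
}

// Assumption: `add` lists flags to enable and `remove` lists flags to disable.
// A new record is returned so the enclosing pattern keeps its own settings.
function withModifiers(modifiers, add, remove) {
  const pick = (flag, current) =>
    add.includes(flag) ? true : remove.includes(flag) ? false : current;
  return {
    dotAll: pick("s", modifiers.dotAll),
    ignoreCase: pick("i", modifiers.ignoreCase),
    multiline: pick("m", modifiers.multiline),
  };
}

const outer = modifiersFromFlags("im");       // models the flags of /.../im
const inner = withModifiers(outer, "s", "i"); // models a (?s-i:...) group
</code></pre>
<p>Here `inner` has `dotAll` and `multiline` set and `ignoreCase` cleared, while `outer` is left untouched, which reflects that modifiers apply only to the subpattern they enclose.</p>
</emu-note>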
</emu-clause>
</ins>
<emu-clause id="sec-compilepattern" type="sdo" oldids="sec-pattern">
<h1>Runtime Semantics: CompilePattern</h1>
<dl class="header">
<dt>description</dt>
<dd>It returns an Abstract Closure that takes a String and a non-negative integer and returns a MatchResult.</dd>
</dl>
<emu-grammar>Pattern :: Disjunction</emu-grammar>
<emu-alg>
1. <ins>Let _modifiers_ be the Modifiers Record { [[DotAll]]: _DotAll_, [[IgnoreCase]]: _IgnoreCase_, [[Multiline]]: _Multiline_ }.</ins>
1. Let _m_ be CompileSubpattern of |Disjunction| with argument<ins>s</ins> ~forward~<ins> and _modifiers_</ins>.
1. Return a new Abstract Closure with parameters (_str_, _index_) that captures _m_ and performs the following steps when called:
1. Assert: Type(_str_) is String.
1. Assert: _index_ is a non-negative integer which is ≤ the length of _str_.
1. If _Unicode_ is *true*, let _Input_ be StringToCodePoints(_str_). Otherwise, let _Input_ be a List whose elements are the code units that are the elements of _str_. _Input_ will be used throughout the algorithms in <emu-xref href="#sec-pattern-semantics"></emu-xref>. Each element of _Input_ is considered to be a character.
1. Let _InputLength_ be the number of characters contained in _Input_. This alias will be used throughout the algorithms in <emu-xref href="#sec-pattern-semantics"></emu-xref>.
1. Let _listIndex_ be the index into _Input_ of the character that was obtained from element _index_ of _str_.
1. Let _c_ be a new Continuation with parameters (_y_) that captures nothing and performs the following steps when called:
1. Assert: _y_ is a State.
1. Return _y_.
1. Let _cap_ be a List of _NcapturingParens_ *undefined* values, indexed 1 through _NcapturingParens_.
1. Let _x_ be the State (_listIndex_, _cap_).
1. Return _m_(_x_, _c_).
</emu-alg>
<emu-note>
<p>A Pattern compiles to an Abstract Closure value. RegExpBuiltinExec can then apply this procedure to a String and an offset within the String to determine whether the pattern would match starting at exactly that offset within the String, and, if it does match, what the values of the capturing parentheses would be. The algorithms in <emu-xref href="#sec-pattern-semantics"></emu-xref> are designed so that compiling a pattern may throw a *SyntaxError* exception; on the other hand, once the pattern is successfully compiled, applying the resulting Abstract Closure to find a match in a String cannot throw an exception (except for any implementation-defined exceptions that can occur anywhere, such as out-of-memory).</p>
</emu-note>
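<emu-note>
<p>The two forms of _Input_ and the "exactly that offset" behaviour can be observed from script. The following is an informal sketch using existing RegExp features only; the variable names are arbitrary, and the `y` flag is used merely as a convenient way to test a single offset.</p>
<pre><code class="javascript">
const str = "a\u{1F600}b"; // U+1F600 is encoded as two code units

const codeUnits = str.split("");    // models Input when Unicode is false
const codePoints = Array.from(str); // models Input when Unicode is true

// A single `.` spans the whole code point only in Unicode mode.
const dotWithoutU = /^a.b$/.test(str);
const dotWithU = /^a.b$/u.test(str);

// With the `y` flag, exec tests for a match only at lastIndex.
const sticky = /b/y;
sticky.lastIndex = 1;
const atOne = sticky.exec("abc");
sticky.lastIndex = 0;
const atZero = sticky.exec("abc");
</code></pre>
<p>In this example `codeUnits` has 4 elements and `codePoints` has 3, `dotWithU` is *true* while `dotWithoutU` is *false*, and only the attempt at offset 1 produces a match.</p>
</emu-note>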
</emu-clause>
<emu-clause id="sec-compilesubpattern" type="sdo" oldids="sec-disjunction,sec-alternative,sec-term">
<h1>
Runtime Semantics: CompileSubpattern (
_direction_: ~forward~ or ~backward~,
<ins>_modifiers_: a Modifiers Record,</ins>
): a Matcher
</h1>
<dl class="header">
</dl>
<emu-note>
<p>This section is amended in B.1.2.4.</p>
</emu-note>
<!-- Disjunction -->
<emu-grammar>Disjunction :: Alternative `|` Disjunction</emu-grammar>
<emu-alg>
1. Let _m1_ be CompileSubpattern of |Alternative| with argument<ins>s</ins> _direction_<ins> and _modifiers_</ins>.
1. Let _m2_ be CompileSubpattern of |Disjunction| with argument<ins>s</ins> _direction_<ins> and _modifiers_</ins>.
1. Return a new Matcher with parameters (_x_, _c_) that captures _m1_ and _m2_ and performs the following steps when called:
1. Assert: _x_ is a State.
1. Assert: _c_ is a Continuation.
1. Let _r_ be _m1_(_x_, _c_).
1. If _r_ is not ~failure~, return _r_.
1. Return _m2_(_x_, _c_).
</emu-alg>
<emu-note>
<p>The `|` regular expression operator separates two alternatives. The pattern first tries to match the left |Alternative| (followed by the sequel of the regular expression); if it fails, it tries to match the right |Disjunction| (followed by the sequel of the regular expression). If the left |Alternative|, the right |Disjunction|, and the sequel all have choice points, all choices in the sequel are tried before moving on to the next choice in the left |Alternative|. If choices in the left |Alternative| are exhausted, the right |Disjunction| is tried instead of the left |Alternative|. Any capturing parentheses inside a portion of the pattern skipped by `|` produce *undefined* values instead of Strings. Thus, for example,</p>
<pre><code class="javascript">/a|ab/.exec("abc")</code></pre>
<p>returns the result *"a"* and not *"ab"*. Moreover,</p>
<pre><code class="javascript">/((a)|(ab))((c)|(bc))/.exec("abc")</code></pre>
<p>returns the array</p>
<pre><code class="javascript">["abc", "a", "a", undefined, "bc", undefined, "bc"]</code></pre>
<p>and not</p>
<pre><code class="javascript">["abc", "ab", undefined, "ab", "c", "c", undefined]</code></pre>
<p>The order in which the two alternatives are tried is independent of the value of _direction_.</p>
</emu-note>
<!-- Alternative -->
<emu-grammar>Alternative :: [empty]</emu-grammar>
<emu-alg>
1. Return a new Matcher with parameters (_x_, _c_) that captures nothing and performs the following steps when called:
1. Assert: _x_ is a State.
1. Assert: _c_ is a Continuation.
1. Return _c_(_x_).
</emu-alg>
<emu-grammar>Alternative :: Alternative Term</emu-grammar>
<emu-alg>
1. Let _m1_ be CompileSubpattern of |Alternative| with argument<ins>s</ins> _direction_<ins> and _modifiers_</ins>.
1. Let _m2_ be CompileSubpattern of |Term| with argument<ins>s</ins> _direction_<ins> and _modifiers_</ins>.
1. If _direction_ is ~forward~, then
1. Let _m_ be a new Matcher with parameters (_x_, _c_) that captures _m1_ and _m2_ and performs the following steps when called:
1. Assert: _x_ is a State.
1. Assert: _c_ is a Continuation.
1. Let _d_ be a new Continuation with parameters (_y_) that captures _c_ and _m2_ and performs the following steps when called:
1. Assert: _y_ is a State.
1. Return _m2_(_y_, _c_).
1. Return _m1_(_x_, _d_).
1. Else,
1. Assert: _direction_ is ~backward~.
1. Let _m_ be a new Matcher with parameters (_x_, _c_) that captures _m1_ and _m2_ and performs the following steps when called:
1. Assert: _x_ is a State.
1. Assert: _c_ is a Continuation.
1. Let _d_ be a new Continuation with parameters (_y_) that captures _c_ and _m1_ and performs the following steps when called:
1. Assert: _y_ is a State.
1. Return _m1_(_y_, _c_).
1. Return _m2_(_x_, _d_).
</emu-alg>
<emu-note>
<p>Consecutive |Term|s try to simultaneously match consecutive portions of _Input_. When _direction_ is ~forward~, if the left |Alternative|, the right |Term|, and the sequel of the regular expression all have choice points, all choices in the sequel are tried before moving on to the next choice in the right |Term|, and all choices in the right |Term| are tried before moving on to the next choice in the left |Alternative|. When _direction_ is ~backward~, the evaluation order of |Alternative| and |Term| is reversed.</p>
</emu-note>
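<emu-note>
<p>The effect of _direction_ on evaluation order is visible in how two greedy groups divide the same text. The example below is an informal illustration; it relies on a lookbehind to obtain ~backward~ matching.</p>
<pre><code class="javascript">
// Forward: the left group is evaluated first and greedily takes all it can.
const forward = /(\d+)(\d+)$/.exec("1053");

// Backward (inside a lookbehind): the right group is evaluated first.
const backward = /(?<=(\d+)(\d+))$/.exec("1053");
</code></pre>
<p>The first call yields the captures *"105"* and *"3"*, whereas the second yields *"1"* and *"053"*, because in the lookbehind the rightmost group consumes as much as possible before the group to its left is attempted.</p>
</emu-note>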
<!-- Term -->
<emu-grammar>Term :: Assertion</emu-grammar>
<emu-alg>
1. Return CompileAssertion of |Assertion|<ins> with argument _modifiers_</ins>.
</emu-alg>
<emu-note>
<p>The resulting Matcher is independent of _direction_.</p>
</emu-note>
<emu-grammar>Term :: Atom</emu-grammar>
<emu-alg>
1. Return CompileAtom of |Atom| with argument<ins>s</ins> _direction_<ins> and _modifiers_</ins>.
</emu-alg>
<emu-grammar>Term :: Atom Quantifier</emu-grammar>
<emu-alg>
1. Let _m_ be CompileAtom of |Atom| with argument<ins>s</ins> _direction_<ins> and _modifiers_</ins>.
1. Let _q_ be CompileQuantifier of |Quantifier|.
1. Assert: _q_.[[Min]] ≤ _q_.[[Max]].
1. Let _parenIndex_ be the number of left-capturing parentheses in the entire regular expression that occur to the left of this |Term|. This is the total number of <emu-grammar>Atom :: `(` GroupSpecifier Disjunction `)`</emu-grammar> Parse Nodes prior to or enclosing this |Term|.
1. Let _parenCount_ be the number of left-capturing parentheses in |Atom|. This is the total number of <emu-grammar>Atom :: `(` GroupSpecifier Disjunction `)`</emu-grammar> Parse Nodes enclosed by |Atom|.
1. Return a new Matcher with parameters (_x_, _c_) that captures _m_, _q_, _parenIndex_, and _parenCount_ and performs the following steps when called:
1. Assert: _x_ is a State.
1. Assert: _c_ is a Continuation.
1. Return RepeatMatcher(_m_, _q_.[[Min]], _q_.[[Max]], _q_.[[Greedy]], _x_, _c_, _parenIndex_, _parenCount_).
</emu-alg>
</emu-clause>
<emu-clause id="sec-compileassertion" type="sdo" oldids="sec-assertion">
<h1>
Runtime Semantics: CompileAssertion (
<ins>_modifiers_: a Modifiers Record,</ins>
): a Matcher
</h1>
<dl class="header">
</dl>
<emu-note>
<p>This section is amended in B.1.2.5.</p>
</emu-note>
<emu-grammar>Assertion :: `^`</emu-grammar>
<emu-alg>
1. Return a new Matcher with parameters (_x_, _c_) that captures nothing and performs the following steps when called:
1. Assert: _x_ is a State.
1. Assert: _c_ is a Continuation.
1. Let _e_ be _x_'s _endIndex_.
1. If _e_ = 0, or if <del>_Multiline_</del><ins>_modifiers_.[[Multiline]]</ins> is *true* and the character _Input_[_e_ - 1] is one of |LineTerminator|, then
1. Return _c_(_x_).
1. Return ~failure~.
</emu-alg>
<emu-note>
<p>Even when the `y` flag is used with a pattern, `^` always matches only at the beginning of _Input_, or (if <del>_Multiline_</del><ins>_modifiers_.[[Multiline]]</ins> is *true*) at the beginning of a line.</p>
</emu-note>
<emu-grammar>Assertion :: `$`</emu-grammar>
<emu-alg>
1. Return a new Matcher with parameters (_x_, _c_) that captures nothing and performs the following steps when called:
1. Assert: _x_ is a State.
1. Assert: _c_ is a Continuation.
1. Let _e_ be _x_'s _endIndex_.
1. If _e_ = _InputLength_, or if <del>_Multiline_</del><ins>_modifiers_.[[Multiline]]</ins> is *true* and the character _Input_[_e_] is one of |LineTerminator|, then
1. Return _c_(_x_).
1. Return ~failure~.
</emu-alg>
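<emu-note>
<p>The following informal example shows the `^` and `$` algorithms at work. The first part uses only the existing `m` and `y` flags. The last part uses a modifier group and is hedged: `tryRegExp` is a hypothetical helper that returns *null* when the host does not recognise the proposed syntax.</p>
<pre><code class="javascript">
const text = "one\ntwo";

const caretPlain = /^two/.test(text);   // index 4 is not 0
const caretMulti = /^two/m.test(text);  // Input[3] is a LineTerminator
const dollarPlain = /one$/.test(text);  // index 3 is not InputLength
const dollarMulti = /one$/m.test(text); // Input[3] is a LineTerminator

// The `y` flag does not change what `^` means.
const sticky = /^two/y;
sticky.lastIndex = 4;
const stickyPlain = sticky.test(text);
const stickyM = /^two/my;
stickyM.lastIndex = 4;
const stickyMulti = stickyM.test(text);

// Assumption: the engine may not support modifiers, so construct defensively.
function tryRegExp(source, flags) {
  try {
    return new RegExp(source, flags);
  } catch (e) {
    return null;
  }
}
const scoped = tryRegExp("(?m:^two)|^zero", "");
const scopedResult = scoped === null ? "unsupported" : scoped.test(text);
</code></pre>
<p>Only the variants with `m` succeed, including the sticky one. Where modifiers are supported, `scopedResult` is expected to be *true*, since `^` inside the group is compiled with [[Multiline]] enabled.</p>
</emu-note>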
<emu-grammar>Assertion :: `\` `b`</emu-grammar>
<emu-alg>
1. Return a new Matcher with parameters (_x_, _c_) that captures nothing and performs the following steps when called:
1. Assert: _x_ is a State.
1. Assert: _c_ is a Continuation.
1. Let _e_ be _x_'s _endIndex_.
1. Let _a_ be IsWordChar(_e_ - 1<ins>, _modifiers_</ins>).
1. Let _b_ be IsWordChar(_e_<ins>, _modifiers_</ins>).
1. If _a_ is *true* and _b_ is *false*, or if _a_ is *false* and _b_ is *true*, return _c_(_x_).
1. Return ~failure~.
</emu-alg>
<emu-grammar>Assertion :: `\` `B`</emu-grammar>
<emu-alg>
1. Return a new Matcher with parameters (_x_, _c_) that captures nothing and performs the following steps when called:
1. Assert: _x_ is a State.
1. Assert: _c_ is a Continuation.
1. Let _e_ be _x_'s _endIndex_.
1. Let _a_ be IsWordChar(_e_ - 1<ins>, _modifiers_</ins>).
1. Let _b_ be IsWordChar(_e_<ins>, _modifiers_</ins>).
1. If _a_ is *true* and _b_ is *true*, or if _a_ is *false* and _b_ is *false*, return _c_(_x_).
1. Return ~failure~.
</emu-alg>
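<emu-note>
<p>As an informal illustration of the two algorithms above, a global replacement with an empty-width assertion marks every position at which the assertion holds. The variable names are arbitrary.</p>
<pre><code class="javascript">
// \b holds where exactly one neighbour is a word character.
const boundaries = "hello world".replace(/\b/g, "|");

// \B holds where both neighbours are word characters...
const insideWords = "ab cd".replace(/\B/g, "-");

// ...or where neither is, which includes both ends of a non-word string.
const noWordChars = "--".replace(/\B/g, "x");
</code></pre>
<p>The results are *"|hello| |world|"*, *"a-b c-d"*, and *"x-x-x"* respectively. The last case follows from IsWordChar returning *false* for the positions just outside _Input_.</p>
</emu-note>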
<emu-grammar>Assertion :: `(` `?` `=` Disjunction `)`</emu-grammar>
<emu-alg>
1. Let _m_ be CompileSubpattern of |Disjunction| with argument<ins>s</ins> ~forward~<ins> and _modifiers_</ins>.
1. Return a new Matcher with parameters (_x_, _c_) that captures _m_ and performs the following steps when called:
1. Assert: _x_ is a State.
1. Assert: _c_ is a Continuation.
1. Let _d_ be a new Continuation with parameters (_y_) that captures nothing and performs the following steps when called:
1. Assert: _y_ is a State.
1. Return _y_.
1. Let _r_ be _m_(_x_, _d_).
1. If _r_ is ~failure~, return ~failure~.
1. Let _y_ be _r_'s State.
1. Let _cap_ be _y_'s _captures_ List.
1. Let _xe_ be _x_'s _endIndex_.
1. Let _z_ be the State (_xe_, _cap_).
1. Return _c_(_z_).
</emu-alg>
<emu-grammar>Assertion :: `(` `?` `!` Disjunction `)`</emu-grammar>
<emu-alg>
1. Let _m_ be CompileSubpattern of |Disjunction| with argument<ins>s</ins> ~forward~<ins> and _modifiers_</ins>.
1. Return a new Matcher with parameters (_x_, _c_) that captures _m_ and performs the following steps when called:
1. Assert: _x_ is a State.
1. Assert: _c_ is a Continuation.
1. Let _d_ be a new Continuation with parameters (_y_) that captures nothing and performs the following steps when called:
1. Assert: _y_ is a State.
1. Return _y_.
1. Let _r_ be _m_(_x_, _d_).
1. If _r_ is not ~failure~, return ~failure~.
1. Return _c_(_x_).
</emu-alg>
<emu-grammar>Assertion :: `(` `?` `<=` Disjunction `)`</emu-grammar>
<emu-alg>
1. Let _m_ be CompileSubpattern of |Disjunction| with argument<ins>s</ins> ~backward~<ins> and _modifiers_</ins>.
1. Return a new Matcher with parameters (_x_, _c_) that captures _m_ and performs the following steps when called:
1. Assert: _x_ is a State.
1. Assert: _c_ is a Continuation.
1. Let _d_ be a new Continuation with parameters (_y_) that captures nothing and performs the following steps when called:
1. Assert: _y_ is a State.
1. Return _y_.
1. Let _r_ be _m_(_x_, _d_).
1. If _r_ is ~failure~, return ~failure~.
1. Let _y_ be _r_'s State.
1. Let _cap_ be _y_'s _captures_ List.
1. Let _xe_ be _x_'s _endIndex_.
1. Let _z_ be the State (_xe_, _cap_).
1. Return _c_(_z_).
</emu-alg>
<emu-grammar>Assertion :: `(` `?` `<!` Disjunction `)`</emu-grammar>
<emu-alg>
1. Let _m_ be CompileSubpattern of |Disjunction| with argument<ins>s</ins> ~backward~<ins> and _modifiers_</ins>.
1. Return a new Matcher with parameters (_x_, _c_) that captures _m_ and performs the following steps when called:
1. Assert: _x_ is a State.
1. Assert: _c_ is a Continuation.
1. Let _d_ be a new Continuation with parameters (_y_) that captures nothing and performs the following steps when called:
1. Assert: _y_ is a State.
1. Return _y_.
1. Let _r_ be _m_(_x_, _d_).
1. If _r_ is not ~failure~, return ~failure~.
1. Return _c_(_x_).
</emu-alg>
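<emu-note>
<p>The lookahead algorithms above restore the original _endIndex_ but, in the positive case, keep the captures made inside the assertion. The two expressions below are informal illustrations of that behaviour using existing syntax.</p>
<pre><code class="javascript">
// Positive lookahead: group 1 is captured inside the assertion and then
// reused by \1, but the assertion itself consumes no input.
const positive = /(?=(a+))a*b\1/.exec("baaabac");

// Negative lookahead: captures made inside it are never visible outside,
// so the second \2 refers to an undefined capture and matches nothing.
const negative = /(.*?)a(?!(a+)b\2c)\2(.*)/.exec("baaabaac");
</code></pre>
<p>The first call returns a match of *"aba"* with the capture *"a"*; the lookahead is not re-entered to try shorter captures at the same position. The second call matches the whole input with captures *"ba"*, *undefined*, and *"abaac"*.</p>
</emu-note>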
<emu-clause id="sec-runtime-semantics-iswordchar-abstract-operation" type="abstract operation">
<h1>
IsWordChar (
_e_: an integer,
<ins>_modifiers_: a Modifiers Record,</ins>
)
</h1>
<dl class="header">
</dl>
<emu-alg>
1. If _e_ = -1 or _e_ is _InputLength_, return *false*.
1. Let _c_ be the character _Input_[_e_].
1. <ins>Let _wordCharacters_ be GetWordCharacters(_modifiers_).</ins>
1. If _c_ is in <del>_WordCharacters_</del><ins>_wordCharacters_</ins>, return *true*.
1. Return *false*.
</emu-alg>
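<emu-note>
<p>A non-normative model of IsWordChar is sketched below. It assumes the word-character CharSet is supplied as a `Set` and uses only the sixty-three basic characters; `isWordChar` and `basicWordCharacters` are illustrative names. The final two lines use the built-in `\w` to show that the set can be larger when both `i` and `u` are in effect.</p>
<pre><code class="javascript">
// Hypothetical model of IsWordChar; the Set stands in for the CharSet.
const basicWordCharacters = new Set(
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_"
);

function isWordChar(input, e, wordCharacters) {
  if (e === -1 || e === input.length) return false;
  return wordCharacters.has(input[e]);
}

const input = Array.from("hi there");
const boundaries = [];
for (let e = 0; e !== input.length + 1; e++) {
  const a = isWordChar(input, e - 1, basicWordCharacters);
  const b = isWordChar(input, e, basicWordCharacters);
  if (a !== b) boundaries.push(e); // the condition used by \b
}

// Built-in check of the larger set: U+017F case-folds to "s" under `iu`.
const longSPlain = /\w/i.test("\u017F");
const longSUnicode = /\w/iu.test("\u017F");
</code></pre>
<p>In this model `boundaries` is the list 0, 2, 3, 8. With the built-in matcher, U+017F (LATIN SMALL LETTER LONG S) is a word character only when both `i` and `u` apply.</p>
</emu-note>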
</emu-clause>
</emu-clause>
<emu-clause id="sec-compileatom" type="sdo" oldids="sec-atom,sec-atomescape,sec-characterescape,sec-decimalescape">
<h1>
Runtime Semantics: CompileAtom (
_direction_: ~forward~ or ~backward~,
<ins>_modifiers_: a Modifiers Record,</ins>
): a Matcher
</h1>
<dl class="header">
</dl>
<emu-note>
<p>This section is amended in B.1.2.6.</p>
</emu-note>
<!-- Atom -->
<emu-grammar>Atom :: PatternCharacter</emu-grammar>
<emu-alg>
1. Let _ch_ be the character matched by |PatternCharacter|.
1. Let _A_ be a one-element CharSet containing the character _ch_.
1. Return CharacterSetMatcher(_A_, *false*, _direction_<ins>, _modifiers_</ins>).
</emu-alg>
<emu-grammar>Atom :: `.`</emu-grammar>
<emu-alg>
1. Let _A_ be the CharSet of all characters.
1. If <del>_DotAll_</del><ins>_modifiers_.[[DotAll]]</ins> is not *true*, then
1. Remove from _A_ all characters corresponding to a code point on the right-hand side of the |LineTerminator| production.
1. Return CharacterSetMatcher(_A_, *false*, _direction_<ins>, _modifiers_</ins>).
</emu-alg>
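<emu-note>
<p>A brief sketch of the effect of [[DotAll]] on `.`, using the existing `s` flag and not the modifier syntax (the variable names are illustrative only):</p>
<pre><code class="javascript">const withoutDotAll = /a.b/.test("a\nb");      // false: LF is a LineTerminator
const withDotAll = /a.b/s.test("a\nb");        // true: no characters are removed
const lineSeparator = /a.b/.test("a\u2028b");  // false: U+2028 is a LineTerminator</code></pre>
</emu-note>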
<emu-grammar>Atom :: CharacterClass</emu-grammar>
<emu-alg>
1. Let _cc_ be CompileCharacterClass of |CharacterClass|.
1. Return CharacterSetMatcher(_cc_.[[CharSet]], _cc_.[[Invert]], _direction_<ins>, _modifiers_</ins>).
</emu-alg>
<emu-grammar>Atom :: `(` GroupSpecifier Disjunction `)`</emu-grammar>
<emu-alg>
1. Let _m_ be CompileSubpattern of |Disjunction| with argument<ins>s</ins> _direction_<ins> and _modifiers_</ins>.
1. Let _parenIndex_ be the number of left-capturing parentheses in the entire regular expression that occur to the left of this |Atom|. This is the total number of <emu-grammar>Atom :: `(` GroupSpecifier Disjunction `)`</emu-grammar> Parse Nodes prior to or enclosing this |Atom|.
1. Return a new Matcher with parameters (_x_, _c_) that captures _direction_, _m_, and _parenIndex_ and performs the following steps when called:
1. Assert: _x_ is a State.
1. Assert: _c_ is a Continuation.
1. Let _d_ be a new Continuation with parameters (_y_) that captures _x_, _c_, _direction_, and _parenIndex_ and performs the following steps when called:
1. Assert: _y_ is a State.
1. Let _cap_ be a copy of _y_'s _captures_ List.
1. Let _xe_ be _x_'s _endIndex_.
1. Let _ye_ be _y_'s _endIndex_.
1. If _direction_ is ~forward~, then
1. Assert: _xe_ ≤ _ye_.
1. Let _s_ be a List whose elements are the characters of _Input_ at indices _xe_ (inclusive) through _ye_ (exclusive).
1. Else,
1. Assert: _direction_ is ~backward~.
1. Assert: _ye_ ≤ _xe_.
1. Let _s_ be a List whose elements are the characters of _Input_ at indices _ye_ (inclusive) through _xe_ (exclusive).
1. Set _cap_[_parenIndex_ + 1] to _s_.
1. Let _z_ be the State (_ye_, _cap_).
1. Return _c_(_z_).
1. Return _m_(_x_, _d_).
</emu-alg>
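<emu-note>
<p>The following sketch suggests how _direction_ influences what each capturing group records. The input string and variable names are invented for illustration. Inside a lookbehind the groups are matched right to left, so the rightmost greedy group is filled first:</p>
<pre><code class="javascript">// Forward: the first group is greedy and leaves one digit for the second.
const forward = /^(\d+)(\d+)/.exec("1053");
// forward is ["1053", "105", "3"]

// Backward (inside a lookbehind): the second group is greedy instead.
const backward = /(?<=(\d+)(\d+))$/.exec("1053");
// backward is ["", "1", "053"]</code></pre>
</emu-note>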
<del class="block">
<emu-grammar>Atom :: `(` `?` `:` Disjunction `)`</emu-grammar>
<emu-alg>
1. Return CompileSubpattern of |Disjunction| with argument _direction_.
</emu-alg>
</del>
<ins class="block">
<emu-grammar>Atom :: `(` `?` RegularExpressionFlags `:` Disjunction `)`</emu-grammar>
<emu-alg>
1. Let _addModifiers_ be the source text matched by |RegularExpressionFlags|.
1. Let _removeModifiers_ be the empty String.
1. Let _newModifiers_ be UpdateModifiers(_modifiers_, CodePointsToString(_addModifiers_), _removeModifiers_).
1. Return CompileSubpattern of |Disjunction| with arguments _direction_ and _newModifiers_.
</emu-alg>
<emu-grammar>Atom :: `(` `?` RegularExpressionFlags `-` RegularExpressionFlags `:` Disjunction `)`</emu-grammar>
<emu-alg>
1. Let _addModifiers_ be the source text matched by the first |RegularExpressionFlags|.
1. Let _removeModifiers_ be the source text matched by the second |RegularExpressionFlags|.
1. Let _newModifiers_ be UpdateModifiers(_modifiers_, CodePointsToString(_addModifiers_), CodePointsToString(_removeModifiers_)).
1. Return CompileSubpattern of |Disjunction| with arguments _direction_ and _newModifiers_.
</emu-alg>
</ins>
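<emu-note>
<p>A hedged usage sketch of the two modifier forms above. Whether a given engine accepts this syntax is an assumption, so the hypothetical helper `compileOrNull` falls back to `null` when the pattern is rejected. All names here are illustrative.</p>
<pre><code class="javascript">// Returns a RegExp, or null when the engine rejects the modifier syntax.
function compileOrNull(source, flags) {
  try {
    return new RegExp(source, flags);
  } catch (err) {
    return null;
  }
}

// (?i:...) adds IgnoreCase for "hello" only; " World" stays case-sensitive.
const scoped = compileOrNull("^(?i:hello) World$", "");
const scopedResults = scoped === null
  ? null
  : [scoped.test("HELLO World"), scoped.test("hello WORLD")];
// Where the syntax is supported, scopedResults is [true, false].

// (?-i:...) removes IgnoreCase inside a pattern compiled with the i flag.
const unscoped = compileOrNull("^(?-i:a)b$", "i");
const unscopedResults = unscoped === null
  ? null
  : [unscoped.test("aB"), unscoped.test("Ab")];
// Where the syntax is supported, unscopedResults is [true, false].</code></pre>
</emu-note>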
<!-- AtomEscape -->
<emu-grammar>AtomEscape :: DecimalEscape</emu-grammar>
<emu-alg>
1. Let _n_ be the CapturingGroupNumber of |DecimalEscape|.
1. Assert: _n_ ≤ _NcapturingParens_.
1. Return BackreferenceMatcher(_n_, _direction_<ins>, _modifiers_</ins>).
</emu-alg>
<emu-note>
<p>An escape sequence of the form `\\` followed by a non-zero decimal number _n_ matches the result of the _n_<sup>th</sup> set of capturing parentheses (<emu-xref href="#sec-notation"></emu-xref>). It is an error if the regular expression has fewer than _n_ capturing parentheses. If the regular expression has _n_ or more capturing parentheses but the _n_<sup>th</sup> one is *undefined* because it has not captured anything, then the backreference always succeeds.</p>
</emu-note>
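<emu-note>
<p>To illustrate the two cases described in the preceding note (variable names are illustrative, and the `u` flag is used for the error case so that Annex B legacy escapes do not apply):</p>
<pre><code class="javascript">// The group does not participate, so its capture is undefined and \1 succeeds.
const skipped = /(a)?\1b/.exec("b");
// skipped[0] is "b" and skipped[1] is undefined

// Referring to a second group when only one exists is an early error.
let tooFewGroups = false;
try {
  new RegExp("(a)\\2", "u");
} catch (err) {
  tooFewGroups = err instanceof SyntaxError;
}
// tooFewGroups is true</code></pre>
</emu-note>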
<emu-grammar>AtomEscape :: CharacterEscape</emu-grammar>
<emu-alg>
1. Let _cv_ be the CharacterValue of |CharacterEscape|.
1. Let _ch_ be the character whose character value is _cv_.
1. Let _A_ be a one-element CharSet containing the character _ch_.
1. Return CharacterSetMatcher(_A_, *false*, _direction_<ins>, _modifiers_</ins>).
</emu-alg>
<emu-grammar>AtomEscape :: CharacterClassEscape</emu-grammar>
<emu-alg>
1. Let _A_ be CompileToCharSet of |CharacterClassEscape|.
1. Return CharacterSetMatcher(_A_, *false*, _direction_<ins>, _modifiers_</ins>).
</emu-alg>
<emu-grammar>AtomEscape :: `k` GroupName</emu-grammar>
<emu-alg>
1. Search the enclosing |Pattern| for an instance of a |GroupSpecifier| containing a |RegExpIdentifierName| which has a CapturingGroupName equal to the CapturingGroupName of the |RegExpIdentifierName| contained in |GroupName|.
1. Assert: A unique such |GroupSpecifier| is found.
1. Let _parenIndex_ be the number of left-capturing parentheses in the entire regular expression that occur to the left of the located |GroupSpecifier|. This is the total number of <emu-grammar>Atom :: `(` GroupSpecifier Disjunction `)`</emu-grammar> Parse Nodes prior to or enclosing the located |GroupSpecifier|, including its immediately enclosing |Atom|.
1. Return BackreferenceMatcher(_parenIndex_, _direction_<ins>, _modifiers_</ins>).
</emu-alg>
<emu-clause id="sec-runtime-semantics-charactersetmatcher-abstract-operation" type="abstract operation">
<h1>
CharacterSetMatcher (
_A_: a CharSet,
_invert_: a Boolean,
_direction_: ~forward~ or ~backward~,
<ins>_modifiers_: a Modifiers Record,</ins>
): a Matcher
</h1>
<dl class="header">
</dl>
<emu-alg>
1. Return a new Matcher with parameters (_x_, _c_) that captures _A_, _invert_, <del>and </del>_direction_<ins>, and _modifiers_</ins> and performs the following steps when called:
1. Assert: _x_ is a State.
1. Assert: _c_ is a Continuation.
1. Let _e_ be _x_'s _endIndex_.
1. If _direction_ is ~forward~, let _f_ be _e_ + 1.
1. Else, let _f_ be _e_ - 1.
1. If _f_ < 0 or _f_ > _InputLength_, return ~failure~.
1. Let _index_ be min(_e_, _f_).
1. Let _ch_ be the character _Input_[_index_].
1. Let _cc_ be Canonicalize(_ch_<ins>, _modifiers_</ins>).
1. If there exists a member _a_ of _A_ such that Canonicalize(_a_<ins>, _modifiers_</ins>) is _cc_, let _found_ be *true*. Otherwise, let _found_ be *false*.
1. If _invert_ is *false* and _found_ is *false*, return ~failure~.
1. If _invert_ is *true* and _found_ is *true*, return ~failure~.
1. Let _cap_ be _x_'s _captures_ List.
1. Let _y_ be the State (_f_, _cap_).
1. Return _c_(_y_).
</emu-alg>
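<emu-note>
<p>The steps above can be modelled roughly as follows. This is a simplified sketch under stated assumptions and is not normative: a State is a plain object, `null` stands in for ~failure~, the input is indexed by code unit, and the hypothetical `canonicalize` argument stands in for Canonicalize with a fixed Modifiers Record. The helper names `identity`, `foldCase`, and `keepCase` are invented for the example.</p>
<pre><code class="javascript">// State: { endIndex, captures }. Continuation: State to State-or-null.
function characterSetMatcher(input, charSet, invert, direction, canonicalize) {
  return function matcher(x, c) {
    const e = x.endIndex;
    const f = direction === "forward" ? e + 1 : e - 1;
    if (0 > f || f > input.length) return null;
    const index = Math.min(e, f);
    const cc = canonicalize(input[index]);
    let found = false;
    for (const a of charSet) {
      if (canonicalize(a) === cc) found = true;
    }
    // Fails when (not invert and not found) or (invert and found).
    if (invert === found) return null;
    return c({ endIndex: f, captures: x.captures });
  };
}

function identity(state) { return state; }
function foldCase(ch) { return ch.toUpperCase(); } // stand-in: [[IgnoreCase]] true
function keepCase(ch) { return ch; }               // stand-in: [[IgnoreCase]] false

const start = { endIndex: 0, captures: [] };
const end = { endIndex: 2, captures: [] };
const folded = characterSetMatcher("Ab", new Set(["a"]), false, "forward", foldCase)(start, identity);
const exact = characterSetMatcher("Ab", new Set(["a"]), false, "forward", keepCase)(start, identity);
const backward = characterSetMatcher("Ab", new Set(["b"]), false, "backward", keepCase)(end, identity);
// folded.endIndex is 1, exact is null, and backward.endIndex is 1.</code></pre>
</emu-note>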
</emu-clause>
<emu-clause id="sec-backreference-matcher" type="abstract operation">
<h1>
BackreferenceMatcher (
_n_: a positive integer,
_direction_: ~forward~ or ~backward~,
<ins>_modifiers_: a Modifiers Record,</ins>
): a Matcher
</h1>
<dl class="header">
</dl>
<emu-alg>
1. Assert: _n_ ≥ 1.
1. Return a new Matcher with parameters (_x_, _c_) that captures _n_<del> and</del><ins>,</ins> _direction_<ins>, and _modifiers_</ins> and performs the following steps when called:
1. Assert: _x_ is a State.
1. Assert: _c_ is a Continuation.
1. Let _cap_ be _x_'s _captures_ List.
1. Let _s_ be _cap_[_n_].
1. If _s_ is *undefined*, return _c_(_x_).
1. Let _e_ be _x_'s _endIndex_.
1. Let _len_ be the number of elements in _s_.
1. If _direction_ is ~forward~, let _f_ be _e_ + _len_.
1. Else, let _f_ be _e_ - _len_.
1. If _f_ < 0 or _f_ > _InputLength_, return ~failure~.
1. Let _g_ be min(_e_, _f_).
1. If there exists an integer _i_ between 0 (inclusive) and _len_ (exclusive) such that Canonicalize(_s_[_i_]<ins>, _modifiers_</ins>) is not the same character value as Canonicalize(_Input_[_g_ + _i_]<ins>, _modifiers_</ins>), return ~failure~.
1. Let _y_ be the State (_f_, _cap_).
1. Return _c_(_y_).
</emu-alg>
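<emu-note>
<p>A short sketch of two observable consequences of this algorithm, with illustrative variable names: the comparison goes through Canonicalize, so it follows the case sensitivity in effect, and in the ~backward~ direction the referenced text is compared to the left of the current position.</p>
<pre><code class="javascript">const caseSensitive = /(a)\1/.test("aA");     // false
const caseInsensitive = /(a)\1/i.test("aA");  // true

// Inside a lookbehind the group is matched first, then \1 to its left.
const backwardRef = /(?<=\1(a))b/.exec("aab");
// backwardRef[0] is "b", backwardRef[1] is "a", and backwardRef.index is 2.</code></pre>
</emu-note>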
</emu-clause>
<emu-clause id="sec-runtime-semantics-canonicalize-ch" type="abstract operation">
<h1>
Canonicalize (
_ch_: a character,
<ins>_modifiers_: a Modifiers Record,</ins>
): a character
</h1>
<dl class="header">
</dl>
<emu-alg>
1. If _Unicode_ is *true* and <del>_IgnoreCase_</del><ins>_modifiers_.[[IgnoreCase]]</ins> is *true*, then
1. If the file CaseFolding.txt of the Unicode Character Database provides a simple or common case folding mapping for _ch_, return the result of applying that mapping to _ch_.
1. Return _ch_.
1. If <del>_IgnoreCase_</del><ins>_modifiers_.[[IgnoreCase]]</ins> is *false*, return _ch_.
1. Assert: _ch_ is a UTF-16 code unit.
1. Let _cp_ be the code point whose numeric value is that of _ch_.
1. Let _u_ be the result of toUppercase(« _cp_ »), according to the Unicode Default Case Conversion algorithm.
1. Let _uStr_ be CodePointsToString(_u_).
1. If _uStr_ does not consist of a single code unit, return _ch_.
1. Let _cu_ be _uStr_'s single code unit element.
1. If the numeric value of _ch_ ≥ 128 and the numeric value of _cu_ < 128, return _ch_.
1. Return _cu_.
</emu-alg>
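<emu-note>
<p>A minimal sketch of the branch taken when _Unicode_ is *false*, assuming `ch` is a single UTF-16 code unit and that `String.prototype.toUpperCase` is an acceptable stand-in for the toUppercase step. The function name `canonicalizeLegacy` and its Boolean parameter are hypothetical, not part of this specification.</p>
<pre><code class="javascript">function canonicalizeLegacy(ch, ignoreCase) {
  if (!ignoreCase) return ch;
  const uStr = ch.toUpperCase();
  // Mappings to more than one code unit are ignored.
  if (uStr.length !== 1) return ch;
  // A non-ASCII code unit is never mapped to an ASCII one.
  if (ch.charCodeAt(0) >= 128) {
    if (128 > uStr.charCodeAt(0)) return ch;
  }
  return uStr;
}

const lowerA = canonicalizeLegacy("a", true);      // "A"
const sharpS = canonicalizeLegacy("\u00DF", true); // unchanged: uppercases to "SS"
const longS = canonicalizeLegacy("\u017F", true);  // unchanged: would map to ASCII "S"
const untouched = canonicalizeLegacy("a", false);  // "a"</code></pre>
</emu-note>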
<emu-note>
<p>Parentheses of the form `(` |Disjunction| `)` serve both to group the components of the |Disjunction| pattern together and to save the result of the match. The result can be used either in a backreference (`\\` followed by a non-zero decimal number), referenced in a replace String, or returned as part of an array from the regular expression matching Abstract Closure. To inhibit the capturing behaviour of parentheses, use the form `(?:` |Disjunction| `)` instead.</p>
</emu-note>
<emu-note>
<p>The form `(?=` |Disjunction| `)` specifies a zero-width positive lookahead. In order for it to succeed, the pattern inside |Disjunction| must match at the current position, but the current position is not advanced before matching the sequel. If |Disjunction| can match at the current position in several ways, only the first one is tried. Unlike other regular expression operators, there is no backtracking into a `(?=` form (this unusual behaviour is inherited from Perl). This only matters when the |Disjunction| contains capturing parentheses and the sequel of the pattern contains backreferences to those captures.</p>
<p>For example,</p>
<pre><code class="javascript">/(?=(a+))/.exec("baaabac")</code></pre>
<p>matches the empty String immediately after the first `b` and therefore returns the array:</p>
<pre><code class="javascript">["", "aaa"]</code></pre>
<p>To illustrate the lack of backtracking into the lookahead, consider:</p>
<pre><code class="javascript">/(?=(a+))a*b\1/.exec("baaabac")</code></pre>
<p>This expression returns</p>
<pre><code class="javascript">["aba", "a"]</code></pre>
<p>and not:</p>
<pre><code class="javascript">["aaaba", "a"]</code></pre>
</emu-note>
<emu-note>
<p>The form `(?!` |Disjunction| `)` specifies a zero-width negative lookahead. In order for it to succeed, the pattern inside |Disjunction| must fail to match at the current position. The current position is not advanced before matching the sequel. |Disjunction| can contain capturing parentheses, but backreferences to them only make sense from within |Disjunction| itself. Backreferences to these capturing parentheses from elsewhere in the pattern always return *undefined* because the negative lookahead must fail for the pattern to succeed. For example,</p>
<pre><code class="javascript">/(.*?)a(?!(a+)b\2c)\2(.*)/.exec("baaabaac")</code></pre>
<p>looks for an `a` not immediately followed by some positive number n of `a`'s, a `b`, another n `a`'s (specified by the first `\\2`) and a `c`. The second `\\2` is outside the negative lookahead, so it matches against *undefined* and therefore always succeeds. The whole expression returns the array:</p>
<pre><code class="javascript">["baaabaac", "ba", undefined, "abaac"]</code></pre>
</emu-note>
<emu-note>
<p>In case-insignificant matches when _Unicode_ is *true*, all characters are implicitly case-folded using the simple mapping provided by the Unicode standard immediately before they are compared. The simple mapping always maps to a single code point, so it does not map, for example, `ß` (U+00DF) to `SS`. It may however map a code point outside the Basic Latin range to a character within, for example, `ſ` (U+017F) to `s`. Such characters are not mapped if _Unicode_ is *false*. This prevents Unicode code points such as U+017F and U+212A from matching regular expressions such as `/[a-z]/i`, but they will match `/[a-z]/ui`.</p>
</emu-note>
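<emu-note>
<p>The difference described in the preceding note can be observed directly. The variable names are illustrative.</p>
<pre><code class="javascript">const longS = "\u017F";  // LATIN SMALL LETTER LONG S
const kelvin = "\u212A"; // KELVIN SIGN

const legacyLongS = /[a-z]/i.test(longS);     // false
const legacyKelvin = /[a-z]/i.test(kelvin);   // false
const unicodeLongS = /[a-z]/ui.test(longS);   // true
const unicodeKelvin = /[a-z]/ui.test(kelvin); // true</code></pre>
</emu-note>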
</emu-clause>
</emu-clause>
<emu-clause id="sec-compiletocharset" type="sdo" oldids="sec-classranges,sec-nonemptyclassranges,sec-nonemptyclassrangesnodash,sec-classatom,sec-classatomnodash,sec-classescape,sec-characterclassescape">
<h1>Runtime Semantics: CompileToCharSet ( ): a CharSet</h1>
<dl class="header">
</dl>
<emu-note>
<p>This section is amended in <emu-xref href="#sec-compiletocharset-annexb"></emu-xref>.</p>
</emu-note>
<!-- ClassRanges -->
<emu-grammar>ClassRanges :: [empty]</emu-grammar>
<emu-alg>
1. Return the empty CharSet.
</emu-alg>
<!-- NonemptyClassRanges -->
<emu-grammar>NonemptyClassRanges :: ClassAtom NonemptyClassRangesNoDash</emu-grammar>
<emu-alg>
1. Let _A_ be CompileToCharSet of |ClassAtom|.
1. Let _B_ be CompileToCharSet of |NonemptyClassRangesNoDash|.
1. Return the union of CharSets _A_ and _B_.
</emu-alg>
<emu-grammar>NonemptyClassRanges :: ClassAtom `-` ClassAtom ClassRanges</emu-grammar>
<emu-alg>
1. Let _A_ be CompileToCharSet of the first |ClassAtom|.
1. Let _B_ be CompileToCharSet of the second |ClassAtom|.
1. Let _C_ be CompileToCharSet of |ClassRanges|.
1. Let _D_ be CharacterRange(_A_, _B_).
1. Return the union of _D_ and _C_.
</emu-alg>
<!-- NonemptyClassRangesNoDash -->
<emu-grammar>NonemptyClassRangesNoDash :: ClassAtomNoDash NonemptyClassRangesNoDash</emu-grammar>
<emu-alg>
1. Let _A_ be CompileToCharSet of |ClassAtomNoDash|.
1. Let _B_ be CompileToCharSet of |NonemptyClassRangesNoDash|.
1. Return the union of CharSets _A_ and _B_.
</emu-alg>
<emu-grammar>NonemptyClassRangesNoDash :: ClassAtomNoDash `-` ClassAtom ClassRanges</emu-grammar>
<emu-alg>
1. Let _A_ be CompileToCharSet of |ClassAtomNoDash|.
1. Let _B_ be CompileToCharSet of |ClassAtom|.
1. Let _C_ be CompileToCharSet of |ClassRanges|.
1. Let _D_ be CharacterRange(_A_, _B_).
1. Return the union of _D_ and _C_.
</emu-alg>
<emu-note>
<p>|ClassRanges| can expand into a single |ClassAtom| and/or ranges of two |ClassAtom| separated by dashes. In the latter case the |ClassRanges| includes all characters between the first |ClassAtom| and the second |ClassAtom|, inclusive; an error occurs if either |ClassAtom| does not represent a single character (for example, if one is `\\w`) or if the first |ClassAtom|'s character value is greater than the second |ClassAtom|'s character value.</p>
</emu-note>
<emu-note>
<p>Even if the pattern ignores case, the case of the two ends of a range is significant in determining which characters belong to the range. Thus, for example, the pattern `/[E-F]/i` matches only the letters `E`, `F`, `e`, and `f`, while the pattern `/[E-f]/i` matches all upper and lower-case letters in the Unicode Basic Latin block as well as the symbols `[`, `\\`, `]`, `^`, `_`, and <code>`</code>.</p>
</emu-note>
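<emu-note>
<p>The two patterns from the preceding note behave as follows (the variable names are illustrative):</p>
<pre><code class="javascript">const narrowLetter = /[E-F]/i.test("e"); // true
const narrowCaret = /[E-F]/i.test("^");  // false
const wideLetter = /[E-f]/i.test("z");   // true: "z" canonicalizes to "Z"
const wideCaret = /[E-f]/i.test("^");    // true: "^" lies between "E" and "f"</code></pre>
</emu-note>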
<emu-note>
<p>A `-` character can be treated literally or it can denote a range. It is treated literally if it is the first or last character of |ClassRanges|, the beginning or end limit of a range specification, or immediately follows a range specification.</p>
</emu-note>
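<emu-note>
<p>For example, each of the following classes treats `-` literally in the position named in the comment (the variable names are illustrative):</p>
<pre><code class="javascript">const leading = /[-a]/.test("-");       // true: first character of the class
const trailing = /[a-]/.test("-");      // true: last character of the class
const afterRange = /[a-c-e]/.test("-"); // true: immediately follows the range a-c
const notInRange = /[a-c-e]/.test("d"); // false: c-e is not a range here</code></pre>
</emu-note>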
<!-- ClassAtom -->
<emu-grammar>ClassAtom :: `-`</emu-grammar>
<emu-alg>
1. Return the CharSet containing the single character `-` U+002D (HYPHEN-MINUS).
</emu-alg>
<!-- ClassAtomNoDash -->
<emu-grammar>ClassAtomNoDash :: SourceCharacter but not one of `\` or `]` or `-`</emu-grammar>
<emu-alg>
1. Return the CharSet containing the character matched by |SourceCharacter|.
</emu-alg>
<!-- ClassEscape -->
<emu-grammar>
ClassEscape :: `b`
ClassEscape :: `-`
ClassEscape :: CharacterEscape
</emu-grammar>
<emu-alg>
1. Let _cv_ be the CharacterValue of this |ClassEscape|.
1. Let _c_ be the character whose character value is _cv_.
1. Return the CharSet containing the single character _c_.
</emu-alg>
<emu-note>
<p>A |ClassAtom| can use any of the escape sequences that are allowed in the rest of the regular expression except for `\\b`, `\\B`, and backreferences. Inside a |CharacterClass|, `\\b` means the backspace character, while `\\B` and backreferences raise errors.</p>
</emu-note>
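<emu-note>
<p>A small sketch of the distinction, with illustrative variable names. The `u` flag is used for the error case on the assumption that Annex B extensions would otherwise accept the escape.</p>
<pre><code class="javascript">const inClass = /[\b]/.test("\u0008");  // true: backspace inside a class
const outsideClass = /\b/.test("\u0008"); // false: word-boundary assertion

let rejected = false;
try {
  new RegExp("[\\B]", "u");
} catch (err) {
  rejected = err instanceof SyntaxError;
}
// rejected is true</code></pre>
</emu-note>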
<!-- CharacterClassEscape -->
<emu-grammar>CharacterClassEscape :: `d`</emu-grammar>
<emu-alg>
1. Return the ten-element CharSet containing the characters `0` through `9` inclusive.
</emu-alg>
<emu-grammar>CharacterClassEscape :: `D`</emu-grammar>
<emu-alg>
1. Return the CharSet containing all characters not in the CharSet returned by <emu-grammar>CharacterClassEscape :: `d`</emu-grammar> .
</emu-alg>
<emu-grammar>CharacterClassEscape :: `s`</emu-grammar>
<emu-alg>
1. Return the CharSet containing all characters corresponding to a code point on the right-hand side of the |WhiteSpace| or |LineTerminator| productions.
</emu-alg>
<emu-grammar>CharacterClassEscape :: `S`</emu-grammar>
<emu-alg>
1. Return the CharSet containing all characters not in the CharSet returned by <emu-grammar>CharacterClassEscape :: `s`</emu-grammar> .
</emu-alg>
<emu-grammar>CharacterClassEscape :: `w`</emu-grammar>
<emu-alg>
1. Return <del>_WordCharacters_</del><ins>GetWordCharacters(_modifiers_)</ins>.
</emu-alg>
<emu-grammar>CharacterClassEscape :: `W`</emu-grammar>
<emu-alg>
1. Return the CharSet containing all characters not in the CharSet returned by <emu-grammar>CharacterClassEscape :: `w`</emu-grammar> .
</emu-alg>
<emu-grammar>CharacterClassEscape :: `p{` UnicodePropertyValueExpression `}`</emu-grammar>
<emu-alg>
1. Return the CharSet containing all Unicode code points included in CompileToCharSet of |UnicodePropertyValueExpression|.
</emu-alg>
<emu-grammar>CharacterClassEscape :: `P{` UnicodePropertyValueExpression `}`</emu-grammar>
<emu-alg>
1. Return the CharSet containing all Unicode code points not included in CompileToCharSet of |UnicodePropertyValueExpression|.
</emu-alg>
<emu-grammar>UnicodePropertyValueExpression :: UnicodePropertyName `=` UnicodePropertyValue</emu-grammar>
<emu-alg>
1. Let _ps_ be SourceText of |UnicodePropertyName|.
1. Let _p_ be UnicodeMatchProperty(_ps_).
1. Assert: _p_ is a Unicode property name or property alias listed in the “Property name and aliases” column of <emu-xref href="#table-nonbinary-unicode-properties"></emu-xref>.
1. Let _vs_ be SourceText of |UnicodePropertyValue|.
1. Let _v_ be UnicodeMatchPropertyValue(_p_, _vs_).
1. Return the CharSet containing all Unicode code points whose character database definition includes the property _p_ with value _v_.
</emu-alg>
<emu-grammar>UnicodePropertyValueExpression :: LoneUnicodePropertyNameOrValue</emu-grammar>
<emu-alg>
1. Let _s_ be SourceText of |LoneUnicodePropertyNameOrValue|.
1. If UnicodeMatchPropertyValue(`General_Category`, _s_) is identical to a List of Unicode code points that is the name of a Unicode general category or general category alias listed in the “Property value and aliases” column of <emu-xref href="#table-unicode-general-category-values"></emu-xref>, then
1. Return the CharSet containing all Unicode code points whose character database definition includes the property “General_Category” with value _s_.
1. Let _p_ be UnicodeMatchProperty(_s_).
1. Assert: _p_ is a binary Unicode property or binary property alias listed in the “Property name and aliases” column of <emu-xref href="#table-binary-unicode-properties"></emu-xref>.
1. Return the CharSet containing all Unicode code points whose character database definition includes the property _p_ with value “True”.
</emu-alg>
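<emu-note>
<p>The property escape forms handled above can be exercised as follows. The variable names are illustrative, and property escapes require the `u` flag.</p>
<pre><code class="javascript">const upper = /\p{Lu}/u.test("A");                // true: General_Category value
const notUpper = /\P{Lu}/u.test("a");             // true: complement of Lu
const greek = /\p{Script=Greek}/u.test("\u03B1"); // true: name=value form
const digit = /\p{Alphabetic}/u.test("1");        // false: binary property</code></pre>
</emu-note>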
</emu-clause>
<ins class="block">
<emu-clause id="sec-getwordcharacters" type="abstract operation">
<h1>
<ins>
GetWordCharacters (
_modifiers_: a Modifiers Record,
): a CharSet
</ins>
</h1>
<dl class="header">
</dl>
<emu-alg>
1. Let _wordCharacters_ be the mathematical set that is the union of all sixty-three characters in *"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_"* (letters, numbers, and U+005F (LOW LINE) in the Unicode Basic Latin block) and all characters _c_ for which _c_ is not in that set but Canonicalize(_c_, _modifiers_) is.
1. Return _wordCharacters_.
</emu-alg>
<emu-note>
<p>_wordCharacters_ cannot contain more than sixty-three characters unless _Unicode_ and _modifiers_.[[IgnoreCase]] are both *true*.</p>
</emu-note>
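<emu-note>
<p>As an illustration using whole-pattern flags and not the modifier syntax, U+017F is a word character only when both conditions hold. The variable names are illustrative.</p>
<pre><code class="javascript">const longS = "\u017F"; // LATIN SMALL LETTER LONG S

const plain = /\w/.test(longS);           // false
const ignoreCaseOnly = /\w/i.test(longS); // false
const unicodeOnly = /\w/u.test(longS);    // false
const both = /\w/ui.test(longS);          // true</code></pre>
</emu-note>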
</emu-clause>
<emu-clause id="sec-updatemodifiers" type="abstract operation">
<h1>
<ins>
UpdateModifiers (
_modifiers_: a Modifiers Record,
_add_: a String,
_remove_: a String,
): a Modifiers Record
</ins>
</h1>
<dl class="header">
</dl>
<emu-alg>
1. Let _dotAll_ be _modifiers_.[[DotAll]].
1. Let _ignoreCase_ be _modifiers_.[[IgnoreCase]].
1. Let _multiline_ be _modifiers_.[[Multiline]].
1. If _add_ contains *"s"*, set _dotAll_ to *true*.
1. If _add_ contains *"i"*, set _ignoreCase_ to *true*.
1. If _add_ contains *"m"*, set _multiline_ to *true*.
1. If _remove_ contains *"s"*, set _dotAll_ to *false*.
1. If _remove_ contains *"i"*, set _ignoreCase_ to *false*.
1. If _remove_ contains *"m"*, set _multiline_ to *false*.
1. Return the Modifiers Record { [[DotAll]]: _dotAll_, [[IgnoreCase]]: _ignoreCase_, [[Multiline]]: _multiline_ }.
</emu-alg>
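<emu-note>
<p>The steps above can be sketched as a pure function. Modelling a Modifiers Record as a plain object with `dotAll`, `ignoreCase`, and `multiline` properties is an assumption of this example, as are the function and variable names.</p>
<pre><code class="javascript">function updateModifiers(modifiers, add, remove) {
  let dotAll = modifiers.dotAll;
  let ignoreCase = modifiers.ignoreCase;
  let multiline = modifiers.multiline;
  if (add.includes("s")) dotAll = true;
  if (add.includes("i")) ignoreCase = true;
  if (add.includes("m")) multiline = true;
  // Removals are applied after additions.
  if (remove.includes("s")) dotAll = false;
  if (remove.includes("i")) ignoreCase = false;
  if (remove.includes("m")) multiline = false;
  return { dotAll, ignoreCase, multiline };
}

// Models entering a group that adds "s" and removes "i".
const outer = { dotAll: false, ignoreCase: true, multiline: false };
const inner = updateModifiers(outer, "s", "i");
// inner is { dotAll: true, ignoreCase: false, multiline: false }</code></pre>
<p>Because a new Record is returned, the enclosing Record is left unchanged, so the modifiers in effect outside the group are unaffected.</p>
</emu-note>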
</emu-clause>