From d88512fdb16caf13262a66da43786285d9744bb8 Mon Sep 17 00:00:00 2001
From: David Bokan
Date: Wed, 13 Dec 2023 16:14:28 -0500
Subject: [PATCH] [Spec] Overhaul directive parsing (#247)

* Specify parsing imperatively

This commit overhauls the parsing steps to avoid using the EBNF grammar for validity, instead specifying that imperatively. It also moves parsing to happen earlier in the process so that we pass around parsed Text Directive objects. Also makes the steps more precise, referring to infra types and correctly decoding the strings.

Fixes #221
Fixes #230

* Fix and make grammar non-normative

The grammar is now provided solely as a convenience so this makes the section non-normative. Also fixes it so that UnknownDirective doesn't subsume TextDirective.

Fixes #220
---
 index.bs | 271 +++++++++++++++++++--------------
 index.html | 432 +++++++++++++++++++++++++++++++---------------------
 2 files changed, 407 insertions(+), 296 deletions(-)

diff --git a/index.bs b/index.bs
index 45336c4..3e06d4e 100644
--- a/index.bs
+++ b/index.bs
@@ -606,12 +606,10 @@ state=] to apply the directives associated with a session history entry to a [=/ > Monkeypatching [[DOM#interface-document]]: > -> Each document has an associated uninvoked directives which is either -> null or an ASCII string holding data used by the UA to process the resource. It is initially -> null. +> Each document has an associated pending text directives which is either +> null or a list of [=text directives=]. It is initially null. -In the definition of -update document for history step application: +In the definition of update document for history step application: > Monkeypatching [[HTML#updating-the-document]]: > @@ -621,19 +619,34 @@ update document for history step application: >
          
  • Set |document|'s history object's length to scriptHistoryLength
  • > 5. If documentsEntryChanged is true, then: > 1. Let oldURL be |document|'s latest entry's URL. -> 2. If |document|'s latest entry's [=she/directive state=] is not |entry|'s -> [=she/directive state=] then set |document|'s [=Document/uninvoked directives=] to |entry|'s -> [=she/directive state=]'s [=directive state/value=]. +> 2.
    If |document|'s latest entry's [=she/directive state=] is not +> |entry|'s [=she/directive state=] then: +> 1. Let |fragment directive| be |entry|'s [=she/directive state=]'s +> [=directive state/value=]. +> 1. Set |document|'s [=Document/pending text directives=] to the result of [=parse the +> fragment directive|parsing=] |fragment directive|. +>
    > 3. Set |document|'s latest entry to |entry| > 4. ... > -### Parsing the fragment directive ### {#parsing-the-fragment-directive} - ### Fragment directive grammar ### {#fragment-directive-grammar} -An ASCII string is a valid fragment directive if -it matches the production: +Note: This section is non-normative. + +Note: This grammar is provided as a convenient reference; however, the rules and steps for parsing +are specified imperatively in the [[#text-directives]] section. Where this grammar differs in +behavior from the steps of that section, the steps there are to be taken as the authoritative source +of truth. + +The [=FragmentDirective=] can contain multiple directives split by the "&" character. Currently this +means we allow multiple text directives to enable multiple indicated strings in the page, but this +also allows for future directive types to be added and combined. For extensibility, we do not fail +to parse if an unknown directive is in the &-separated list of directives. + +A string is a valid fragment directive if it matches the EBNF (Extended +Backus-Naur Form) production: +
    `FragmentDirective` `::=` @@ -641,17 +654,23 @@ it matches the production:
    ([=TextDirective=] | [=UnknownDirective=]) ("&" [=FragmentDirective=])?
    +
    + `TextDirective` `::=` +
    +
    + "text="[=CharacterString=] +
    `UnknownDirective` `::=`
    - [=CharacterString=] + [=CharacterString=] - [=TextDirective=]
    `CharacterString` `::=`
    - ([=ExplicitChar=] | [=PercentEncodedByte=])+ + ([=ExplicitChar=] | [=PercentEncodedByte=])*
    `ExplicitChar` `::=` @@ -665,16 +684,10 @@ it matches the production:
    -
    - The [=FragmentDirective=] can contain multiple directives split by the "&" - character. Currently this means we allow multiple text directives to enable - multiple indicated strings in the page, but this also allows for future - directive types to be added and combined. For extensibility, we do not fail to - parse if an unknown directive is in the &-separated list of directives. -
    +A [=TextDirective=] is considered valid if it matches the following production:
    -
    `TextDirective` `::=`
    +
    `ValidTextDirective` `::=`
    "text=" [=TextDirectiveParameters=]
    `TextDirectiveParameters` `::=`
    @@ -696,10 +709,10 @@ it matches the production: ";" | "=" | "?" | "@" | "_" | "~"
    - A [=TextDirectiveExplicitChar=] is any [=URL code point=] that is not - explicitly used in the [=TextDirective=] syntax, that is "&", "-", and ",". - If a text fragment refers to a "&", "-", or "," character in the document, - it will be percent-encoded in the fragment. + A [=TextDirectiveExplicitChar=] is any [=URL code point=] that is not explicitly used in the + [=FragmentDirective=] or [=ValidTextDirective=] syntax, that is "&", "-", and ",". If a text + fragment refers to a "&", "-", or "," character in the document, it will be percent-encoded in + the fragment.
    `PercentEncodedByte` `::=`
    @@ -721,77 +734,120 @@ of these items. See [[#syntax]] for the what each of these components means and how they're used. +
    + To percent-decode a text directive term given an input string |term|: + +
      + 1. If |term| is null, return null. + 1. Assert: |term| is an ASCII string. + 1. Let |decoded bytes| be the result of percent-decoding |term|. + 1. Return the result of running UTF-8 decode without BOM on |decoded + bytes|. +
    +
    +
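          As a rough illustration of the decoding steps above, here is a minimal Python sketch. It is not part of the specification: the function name `percent_decode_term` is hypothetical, and it assumes that `urllib.parse.unquote_to_bytes` matches the URL Standard's percent-decode closely enough for ASCII input (valid `%XX` escapes become bytes, malformed ones such as `%zz` are left alone) and that `bytes.decode("utf-8", errors="replace")` approximates UTF-8 decode without BOM (a leading BOM is kept, invalid sequences become U+FFFD).

          ```python
          from typing import Optional
          from urllib.parse import unquote_to_bytes


          def percent_decode_term(term: Optional[str]) -> Optional[str]:
              """Hypothetical helper mirroring "percent-decode a text directive term"."""
              # Step 1: a missing term (for example, no prefix) stays null.
              if term is None:
                  return None
              # Step 2: the term is expected to be ASCII-only at this point.
              assert term.isascii(), "term must be an ASCII string"
              # Step 3: turn valid %XX escapes into raw bytes; every other character,
              # including malformed escapes such as "%zz", is kept unchanged.
              decoded_bytes = unquote_to_bytes(term)
              # Step 4: UTF-8 decode without BOM: a leading U+FEFF is preserved and
              # invalid byte sequences become U+FFFD.
              return decoded_bytes.decode("utf-8", errors="replace")
          ```

          For instance, `percent_decode_term("caf%C3%A9")` returns `"café"` and `percent_decode_term("%2C")` returns `","`, which is how a literal comma in the document survives the comma splitting done when parsing a text directive.
          
          
          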
          + To parse a text directive, on a string |text + directive value|, run these steps:
          
    +

          + This algorithm takes a single text directive value string as input (e.g. "prefix-,foo,bar") and + attempts to parse the string into the components of the directive (e.g. ("prefix", "foo", "bar", + null)). See [[#syntax]] for what each of these components means and how they're used.
          

    +

    + Returns null if the input is invalid. Otherwise, returns a [=text directive=]. +

    +
    -To parse a text directive, on an ASCII string |text -directive input|, run these steps: +
      + 1. Let |prefix|, |suffix|, |start|, |end|, each be null. + 1. Assert: |text directive value| is an ASCII string + with no code points in the fragment percent-encode set and no instances of + U+0026 (&). + 1. Let |tokens| be a list of strings that result from + strictly splitting |text directive value| on U+002C (,). + 1. If |tokens| has size less than 1 or greater than 4, return null. + 1. If the first item of |tokens| ends with U+002D (-): + 1. Set |prefix| to the substring of |tokens|[0] + from 0 with length |tokens|[0]'s length - 1. + 1. Remove the first item of |tokens|. + 1. If |prefix| is the empty string or contains any instances of U+002D (-), return null. + 1. If |tokens| is empty, return null. + 1. If the last item of |tokens| starts with U+002D (-): + 1. Set |suffix| to the substring of the last item of |tokens| from 1 to the end of the string. + 1. Remove the last item of |tokens|. + 1. If |suffix| is the empty string or contains any instances of U+002D (-), return null. + 1. If |tokens| is empty, return null. + 1. If |tokens| has size greater than 2, return null. + 1. Assert: |tokens| has size 1 or 2. + 1. Set |start| to the first item in |tokens|. + 1. Remove the first item in |tokens|. + 1. If |start| is the empty string or contains any instances of U+002D (-), return null. + 1. If |tokens| is not empty: + 1. Set |end| to the first item in |tokens|. + 1. If |end| is the empty string or contains any instances of U+002D (-), return null. + 1. Return a new [=text directive=], with +
      +
      [=text directive/prefix=]
      +
      The [=percent-decode a text directive term|percent-decoding=] of |prefix|
      +
      [=text directive/start=]
      +
      The [=percent-decode a text directive term|percent-decoding=] of |start|
      +
      [=text directive/end=]
      +
      The [=percent-decode a text directive term|percent-decoding=] of |end|
      +
      [=text directive/suffix=]
      +
      The [=percent-decode a text directive term|percent-decoding=] of |suffix|
      +
      +
    +
    + +
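          The parsing steps above can be sketched in Python roughly as follows. This is a non-authoritative sketch: `TextDirective`, `parse_text_directive` and the underscore helpers are hypothetical names, Python's `str.split(",")` is assumed to behave like strictly splitting, and percent-decoding is approximated with `urllib.parse.unquote_to_bytes` followed by a lenient UTF-8 decode.

          ```python
          from dataclasses import dataclass
          from typing import List, Optional
          from urllib.parse import unquote_to_bytes


          @dataclass(frozen=True)
          class TextDirective:
              """Hypothetical container for the four parsed terms."""
              prefix: Optional[str]
              start: str
              end: Optional[str]
              suffix: Optional[str]


          def _decode(term: Optional[str]) -> Optional[str]:
              # Percent-decode to bytes, then UTF-8 decode (invalid bytes become U+FFFD).
              if term is None:
                  return None
              return unquote_to_bytes(term).decode("utf-8", errors="replace")


          def _is_invalid_term(term: str) -> bool:
              # Empty terms and terms that still contain a literal "-" are rejected.
              return term == "" or "-" in term


          def parse_text_directive(value: str) -> Optional[TextDirective]:
              # Mirrors the assertion: ASCII, nothing from the fragment percent-encode set
              # (C0 controls, space, '"', '<', '>', '`', anything above U+007E), and no "&".
              assert all(0x20 < ord(c) < 0x7F and c not in '"<>`&' for c in value)
              prefix: Optional[str] = None
              suffix: Optional[str] = None
              end: Optional[str] = None
              tokens: List[str] = value.split(",")  # behaves like strictly splitting
              if not 1 <= len(tokens) <= 4:
                  return None
              if tokens[0].endswith("-"):
                  prefix = tokens.pop(0)[:-1]
                  if _is_invalid_term(prefix) or not tokens:
                      return None
              if tokens[-1].startswith("-"):
                  suffix = tokens.pop()[1:]
                  if _is_invalid_term(suffix) or not tokens:
                      return None
              if len(tokens) > 2:
                  return None
              start = tokens.pop(0)
              if _is_invalid_term(start):
                  return None
              if tokens:
                  end = tokens[0]
                  if _is_invalid_term(end):
                      return None
              # Decoding happens last, so an encoded "%2D" or "%2C" never affects splitting.
              return TextDirective(
                  prefix=_decode(prefix),
                  start=_decode(start),  # never None here
                  end=_decode(end),
                  suffix=_decode(suffix),
              )
          ```

          With this sketch, `parse_text_directive("prefix-,foo,bar")` yields prefix `"prefix"`, start `"foo"`, end `"bar"` and no suffix, matching the example in the note, while `"a,b,c"` is rejected because three unmarked terms cannot be assigned to just a start and an end.
          
          
          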
          + 

          +To parse the fragment directive, on an ASCII string |fragment +directive|, run these steps:
          
    -

    - This algorithm takes a single text directive string as input (e.g. - "text=prefix-,foo,bar") and attempts to parse the string into the - components of the directive (e.g. ("prefix", "foo", "bar", null)). See - [[#syntax]] for the what each of these components means and how they're - used. -

    -

    - Returns null if the input is invalid or fails to parse in any way. - Otherwise, returns a [=text directive=]. -

          + This algorithm takes the fragment directive string (i.e. the part that follows ":~:") and returns + a list of [=text directive=] objects parsed from that string. The returned list may be empty.
          
    -
      - 1. [=/Assert=]: |text directive input| matches the production [=TextDirective=]. - 1. Let |textDirectiveString| be the substring of |text directive - input| starting at index 5. -
      - This is the remainder of the |text directive input| following, - but not including, the "text=" prefix. -
      - 1. Let |tokens| be a list of strings that is the result of - splitting |textDirectiveString| on commas. - 1. If |tokens| has size less than 1 or greater than 4, return null. - 1. If any of |tokens|'s items are the empty string, return null. - 1. Let |retVal| be a [=text directive=] with each of its items initialized - to null. - 1. Let |potential prefix| be the first item of |tokens|. - 1. If the last character of |potential prefix| is U+002D (-), then: - 1. Set |retVal|'s [=text directive/prefix=] to the - [=string/percent-decode|percent-decoding=] of the result of removing the - last character from |potential prefix|. - 1. Remove the first item of the list |tokens|. - 1. Let |potential suffix| be the last item of |tokens|, if one exists, null - otherwise. - 1. If |potential suffix| is non-null and its first character is U+002D (-), - then: - 1. Set |retVal|'s [=text directive/suffix=] to the - [=string/percent-decode|percent-decoding=] of the result of removing the - first character from |potential suffix|. - 1. Remove the last item of the list |tokens|. - 1. If |tokens| has size not equal to 1 nor 2 then - return null. - 1. Set |retVal|'s [=text directive/start=] be the - [=string/percent-decode|percent-decoding=] of the first item of |tokens|. - 1. If |tokens| has size 2, then set |retVal|'s - [=text directive/end=] be the - [=string/percent-decode|percent-decoding=] of the last item of |tokens|. - 1. Return |retVal|. -
    +
      + 1. Let |directives| be the result of strictly + splitting |fragment directive| on U+0026 (&). + 1. Let |output| be an initially empty list of [=text directives=]. + 1. For each string |directive| in |directives|: + 1. If |directive| does not start with + "text=", then continue. + 1. Let |text directive value| be the code point substring from 5 to the end of |directive|. +
      Note: this may be the empty string.
      + 1. Let |parsed text directive| be the result of [=parse a text directive|parsing=] |text + directive value|. + 1. If |parsed text directive| is non-null, append it to + |output|. + 1. Return |output|. + +
    +
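          A minimal Python sketch of the splitting loop above follows. It is illustrative only: `parse_fragment_directive` is a hypothetical name, and to keep the example self-contained the per-directive parser is passed in as a callable rather than being a real implementation of parse a text directive. Unknown directives and directives whose value fails to parse are skipped, so the result may be empty.

          ```python
          from typing import Callable, List, Optional, TypeVar

          T = TypeVar("T")


          def parse_fragment_directive(
              fragment_directive: str,
              parse_text_directive: Callable[[str], Optional[T]],
          ) -> List[T]:
              output: List[T] = []
              # Strictly split on "&": empty pieces (e.g. from "&&") are kept here and
              # then skipped below because they do not start with "text=".
              for directive in fragment_directive.split("&"):
                  # Unknown directives are ignored instead of failing the whole parse.
                  if not directive.startswith("text="):
                      continue
                  # Code point substring from 5 to the end; may be the empty string.
                  value = directive[len("text="):]
                  parsed = parse_text_directive(value)
                  if parsed is not None:
                      output.append(parsed)
              return output


          if __name__ == "__main__":
              # Toy stand-in parser (hypothetical) that only rejects empty values.
              print(parse_fragment_directive("text=foo&unknown=1&text=&text=bar", lambda v: v or None))
              # prints ['foo', 'bar']
          ```

          Injecting the parser keeps the two algorithms separate in the same way the spec does: this loop only decides which pieces are text directives, and all validation of a piece's value happens inside parse a text directive.
          
          
          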
    ### Invoking Text Directives ### {#invoking-text-directives} -This section describes how text directives in a document's [=Document/uninvoked directives=] are +This section describes how text directives in a document's [=Document/pending text directives=] are processed and invoked to cause indication of the relevant text passages.
    The summarized changes in this section: - * Modify the indicated part processing model to try processing [=Document/uninvoked directives=] + * Modify the indicated part processing model to try processing [=Document/pending text directives=] into a [=range=] that will be returned as the indicated part. * Modify "scrolling to a fragment" to correctly scroll and set the Document's target element in the case of a [=range=] based indicated part. - * Ensure [=Document/uninvoked directives=] is reset to null when the user agent has finished the + * Ensure [=Document/pending text directives=] is reset to null when the user agent has finished the fragment search for the current navigation/traversal. * If the user agent finishes searching for a text directive, ensure it tries the regular fragment as a fallback. @@ -806,11 +862,11 @@ indicated part, enable a fragment to indicate a [=range=]. Make the followin > For an HTML document |document|, the following processing model must be followed to determine > its indicated part: > -> 1. Let |directives| be the document's [=Document/uninvoked directives=]. +> 1. Let |text directives| be the document's [=Document/pending text directives=]. > -> 1. If |directives| is non-null then: +> 1. If |text directives| is non-null then: > 1. Let |ranges| be a list that is the result of running -> the [=invoke text directives=] steps with |directives| and the document. +> the [=invoke text directives=] steps with |text directives| and the document. > 1. If |ranges| is non-empty, then: > 1. Let |firstRange| be the first item of |ranges|. > 1. Visually indicate each [=range=] in |ranges| in an @@ -885,7 +941,7 @@ prevent fragment scrolling if the force-load-at-top policy is enabled. Make the > >
    -The next two monkeypatches ensure the user agent clears [=Document/uninvoked directives=] when +The next two monkeypatches ensure the user agent clears [=Document/pending text directives=] when the fragment search is complete. In the case where a text directive search finishes because parsing has stopped, it tries one more search for a non-text directive fragment. @@ -906,17 +962,17 @@ try to scroll to the fragment: > abort these steps. >
  • If the user agent has reason to believe the user is no longer interested in scrolling to > the fragment, then: -> 1. Set [=Document/uninvoked directives=] to null. +> 1. Set [=Document/pending text directives=] to null. > 1. Abort these steps. > 1. If the document has no parser, or its parser has stopped parsing, > then:
  • -> 1. If [=Document/uninvoked directives=] is not null, then: -> 1. Set [=Document/uninvoked directives=] to null. +> 1. If [=Document/pending text directives=] is not null, then: +> 1. Set [=Document/pending text directives=] to null. > 1. Scroll to the fragment given |document|. > 1. Abort these steps. > 2. Scroll to the fragment given document. > 3. If document's indicated part is still null, then try to scroll to the fragment for -> document. Otherwise, set [=Document/uninvoked directives=] to +> document. Otherwise, set [=Document/pending text directives=] to > null. In the definition of @@ -930,7 +986,7 @@ navigate to a fragment: >
  • Update document for history step application given navigable's active > document, historyEntry, true, scriptHistoryIndex, and scriptHistoryLength.
  • > 9. Scroll to the fragment given navigable's active document. ->
  • Set |navigable|'s active document's [=Document/uninvoked directives=] to +>
  • Set |navigable|'s active document's [=Document/pending text directives=] to > null.
  • > 11. Let traversable be navigable's traversable navigable. > 12. ... @@ -1264,7 +1320,7 @@ Issue: Is this valid to say in the HTML spec? |user involvement|, follow these steps:
      - 1. If |document|'s [=Document/uninvoked directives=] field is null or empty, return false. + 1. If |document|'s [=Document/pending text directives=] field is null or empty, return false. 1. Let |is user involved| be true if: |document|'s [=document/text directive user activation=] is true, or |user involvement| is one of "activation" or "browser UI"; false otherwise. @@ -1645,35 +1701,20 @@ To find the shadow-including parent of |node| follow these steps:
      -To invoke text directives, given as input an ASCII string |text directives| and a [=/Document=] -|document|, run these steps: + To invoke text directives, given as input a list of [=text + directives=] |text directives| and a [=/Document=] |document|, run these steps: -
      - This algorithm takes as input a |text directives|, that is the - raw text of the fragment directive and the |document| over which it operates. - It returns a list of [=ranges=] that are to be visually - indicated, the first of which will be scrolled into view (if the UA scrolls - automatically). -
      +
      + This algorithm returns a list of [=ranges=] that are to be visually indicated, + the first of which will be scrolled into view (if the UA scrolls automatically). +
        - 1. If |text directives| is not a [=valid fragment directive=], then - return an empty list. - 2. Let |directives| be a list of ASCII strings - that is the result of [=strictly split a string|strictly splitting the - string=] |text directives| on "&". - 3. Let |ranges| be a list of [=ranges=], initially empty. - 4. For each ASCII string |directive| of |directives|: - 1. If |directive| does not match the production [=TextDirective=], - then [=iteration/continue=]. - 1. Let |parsedValues| be the result of running the [=parse a text - directive=] steps on |directive|. - 1. If |parsedValues| is null then [=iteration/continue=]. - 1. If the result of running [=find a range from a text directive=] given - |parsedValues| and |document| is non-null, then [=list/append=] it to - |ranges|. - 5. Return |ranges|. + 1. Let |ranges| be a list of [=ranges=], initially empty. + 1. For each [=text directive=] |directive| of |text directives|: + 1. If the result of running [=find a range from a text directive=] given |directive| and + |document| is non-null, then [=list/append=] it to |ranges|. + 1. Return |ranges|.
      diff --git a/index.html b/index.html index 5ed5559..8dd104b 100644 --- a/index.html +++ b/index.html @@ -881,8 +881,7 @@

      Table of Contents

      1. 3.3.1 Extracting the fragment directive
      2. 3.3.2 Applying directives to a document -
      3. 3.3.3 Parsing the fragment directive -
      4. 3.3.4 Fragment directive grammar +
      5. 3.3.3 Fragment directive grammar
    1. 3.4 Text Directives @@ -1440,11 +1439,10 @@

      Document.

      Monkeypatching DOM § 4.5 Interface Document:

      -

      Each document has an associated uninvoked directives which is either - null or an ASCII string holding data used by the UA to process the resource. It is initially - null.

      +

          Each document has an associated pending text directives which is either + null or a list of text directives. It is initially null.
          

      -

      In the definition of update document for history step application:

      +

      In the definition of update document for history step application:

      Monkeypatching HTML § 7.4.6.2 Updating the document:

      @@ -1459,39 +1457,53 @@

      If document’s latest entry’s directive state is not entry’s directive state then set document’s uninvoked directives to entry’s directive state's value.

      -
    2. -

      Set document’s latest entry to entry

      -
    3. -

      ...

      +
      + If document’s latest entry’s directive state is not entry’s directive state then: +
        +
      1. +

        Let fragment directive be entry’s directive state's value.

        +
      2. +

        Set document’s pending text directives to the result of parsing fragment directive.

        +
      +
    +
  • +

    Set document’s latest entry to entry

    +
  • +

    ...

    -

    3.3.3. Parsing the fragment directive

    -

    3.3.4. Fragment directive grammar

    -

    An ASCII string is a valid fragment directive if -it matches the production:

    +

    3.3.3. Fragment directive grammar

    +

    Note: This section is non-normative.

    +

    Note: This grammar is provided as a convenient reference; however, the rules and steps for parsing +are specified imperatively in the § 3.4 Text Directives section. Where this grammar differs in +behavior from the steps of that section, the steps there are to be taken as the authoritative source +of truth.

    +

    The FragmentDirective can contain multiple directives split by the "&" character. Currently this +means we allow multiple text directives to enable multiple indicated strings in the page, but this +also allows for future directive types to be added and combined. For extensibility, we do not fail +to parse if an unknown directive is in the &-separated list of directives.

    +

    A string is a valid fragment directive if it matches the EBNF (Extended +Backus-Naur Form) production:

    FragmentDirective ::= -
    (TextDirective | UnknownDirective) ("&" FragmentDirective)? +
    (TextDirective | UnknownDirective) ("&" FragmentDirective)? +
    TextDirective ::= +
    "text="CharacterString
    UnknownDirective ::= -
    CharacterString +
    CharacterString - TextDirective
    CharacterString ::= -
    (ExplicitChar | PercentEncodedByte)+ +
    (ExplicitChar | PercentEncodedByte)*
    ExplicitChar ::=
    [a-zA-Z0-9] | "!" | "$" | "'" | "(" | ")" | "*" | "+" | "." | "/" | ":" | ";" | "=" | "?" | "@" | "_" | "~" | "," | "-"
    An ExplicitChar may be any URL code point other than "&".
    -
    The FragmentDirective can contain multiple directives split by the "&" - character. Currently this means we allow multiple text directives to enable - multiple indicated strings in the page, but this also allows for future - directive types to be added and combined. For extensibility, we do not fail to - parse if an unknown directive is in the &-separated list of directives.
    +

    A TextDirective is considered valid if it matches the following production:

    -
    TextDirective ::= +
    ValidTextDirective ::=
    "text=" TextDirectiveParameters
    TextDirectiveParameters ::=
    (TextDirectivePrefix ",")? TextDirectiveString ("," TextDirectiveString)? ("," TextDirectiveSuffix)? @@ -1505,10 +1517,9 @@

    <
    [a-zA-Z0-9] | "!" | "$" | "'" | "(" | ")" | "*" | "+" | "." | "/" | ":" | ";" | "=" | "?" | "@" | "_" | "~" -
    A TextDirectiveExplicitChar is any URL code point that is not - explicitly used in the TextDirective syntax, that is "&", "-", and ",". - If a text fragment refers to a "&", "-", or "," character in the document, - it will be percent-encoded in the fragment.
    +
    A TextDirectiveExplicitChar is any URL code point that is not explicitly used in the FragmentDirective or ValidTextDirective syntax, that is "&", "-", and ",". If a text + fragment refers to a "&", "-", or "," character in the document, it will be percent-encoded in + the fragment.
    PercentEncodedByte ::=
    "%" [a-zA-Z0-9][a-zA-Z0-9]
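          The productions above can be turned into regular expressions for quick experimentation. The Python sketch below is non-normative, like the grammar itself, and makes some assumptions: the regex and function names are hypothetical; `TextDirectivePrefix` and `TextDirectiveSuffix`, whose productions are not shown in this excerpt, are assumed to be a `TextDirectiveString` followed by `-` and preceded by `-` respectively, with `TextDirectiveString` assumed to be one or more `TextDirectiveExplicitChar` or `PercentEncodedByte` items. `PercentEncodedByte` is copied as written, so it accepts any two ASCII alphanumerics rather than only hex digits.

          ```python
          import re

          # PercentEncodedByte, copied as written (any two ASCII alphanumerics).
          PERCENT_ENCODED_BYTE = r"%[a-zA-Z0-9]{2}"

          # ExplicitChar: the listed alternatives, which include "," and "-" but not "&".
          EXPLICIT_CHAR = r"[a-zA-Z0-9!$'()*+./:;=?@_~,\-]"
          CHARACTER_STRING = rf"(?:{EXPLICIT_CHAR}|{PERCENT_ENCODED_BYTE})*"

          # TextDirective | UnknownDirective together cover every CharacterString, so a
          # FragmentDirective reduces to "&"-separated CharacterStrings.
          FRAGMENT_DIRECTIVE = re.compile(rf"{CHARACTER_STRING}(?:&{CHARACTER_STRING})*")

          # TextDirectiveExplicitChar: ExplicitChar minus "-" and ",".
          TD_EXPLICIT_CHAR = r"[a-zA-Z0-9!$'()*+./:;=?@_~]"
          # Assumed: TextDirectiveString ::= (TextDirectiveExplicitChar | PercentEncodedByte)+
          TD_STRING = rf"(?:{TD_EXPLICIT_CHAR}|{PERCENT_ENCODED_BYTE})+"
          # Assumed: TextDirectivePrefix ::= TextDirectiveString "-"
          #          TextDirectiveSuffix ::= "-" TextDirectiveString
          VALID_TEXT_DIRECTIVE = re.compile(
              rf"text=(?:{TD_STRING}-,)?{TD_STRING}(?:,{TD_STRING})?(?:,-{TD_STRING})?"
          )


          def is_valid_fragment_directive(s: str) -> bool:
              return FRAGMENT_DIRECTIVE.fullmatch(s) is not None


          def is_valid_text_directive(s: str) -> bool:
              return VALID_TEXT_DIRECTIVE.fullmatch(s) is not None
          ```

          Because `CharacterString` uses `*`, an empty directive (as in `text=foo&&`) is valid under the grammar; the imperative steps likewise skip such a piece, since it does not start with `text=`.
          
          
          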

    @@ -1520,82 +1531,138 @@

    § 3.2 Syntax for the what each of these components means and how they’re used.

    +
    + To percent-decode a text directive term given an input string term: +
      +
    1. +

      If term is null, return null.

      +
    2. +

      Assert: term is an ASCII string.

      +
    3. +

      Let decoded bytes be the result of percent-decoding term.

      +
    4. +

      Return the result of running UTF-8 decode without BOM on decoded +bytes.

      +
    +
    -

    To parse a text directive, on an ASCII string text -directive input, run these steps:

          + To parse a text directive, on a string text + directive value, run these steps:
          
    -

    This algorithm takes a single text directive string as input (e.g. - "text=prefix-,foo,bar") and attempts to parse the string into the - components of the directive (e.g. ("prefix", "foo", "bar", null)). See § 3.2 Syntax for the what each of these components means and how they’re - used.

    -

    Returns null if the input is invalid or fails to parse in any way. - Otherwise, returns a text directive.

    +

          This algorithm takes a single text directive value string as input (e.g. "prefix-,foo,bar") and + attempts to parse the string into the components of the directive (e.g. ("prefix", "foo", "bar", + null)). See § 3.2 Syntax for what each of these components means and how they’re used.
          

    +

    Returns null if the input is invalid. Otherwise, returns a text directive.

    1. -

      Assert: text directive input matches the production TextDirective.

      -
    2. -

      Let textDirectiveString be the substring of text directive -input starting at index 5.

      -
      This is the remainder of the text directive input following, - but not including, the "text=" prefix.
      +

      Let prefix, suffix, start, end, each be null.

    3. -

      Let tokens be a list of strings that is the result of splitting textDirectiveString on commas.

      +

      Assert: text directive value is an ASCII string with no code points in the fragment percent-encode set and no instances of +U+0026 (&).

    4. -

      If tokens has size less than 1 or greater than 4, return null.

      +

      Let tokens be a list of strings that result from strictly splitting text directive value on U+002C (,).

    5. -

      If any of tokens’s items are the empty string, return null.

      +

      If tokens has size less than 1 or greater than 4, return null.

    6. -

      Let retVal be a text directive with each of its items initialized -to null.

      -
    7. -

      Let potential prefix be the first item of tokens.

      +

      If the first item of tokens ends with U+002D (-):

      +
        +
      1. +

        Set prefix to the substring of tokens[0] +from 0 with length tokens[0]'s length - 1.

        +
      2. +

        Remove the first item of tokens.

        +
      3. +

        If prefix is the empty string or contains any instances of U+002D (-), return null.

        +
      4. +

        If tokens is empty, return null.

        +
    8. -

      If the last character of potential prefix is U+002D (-), then:

      +

      If the last item of tokens starts with U+002D (-):

      1. -

        Set retVal’s prefix to the percent-decoding of the result of removing the -last character from potential prefix.

        +

        Set suffix to the substring of the last item of tokens from 1 to the end of the string.

        +
      2. +

        Remove the last item of tokens.

        +
      3. +

        If suffix is the empty string or contains any instances of U+002D (-), return null.

      4. -

        Remove the first item of the list tokens.

        +

        If tokens is empty, return null.

    9. -

      Let potential suffix be the last item of tokens, if one exists, null -otherwise.

      +

      If tokens has size greater than 2, return null.

      +
    10. +

      Assert: tokens has size 1 or 2.

      +
    11. +

      Set start to the first item in tokens.

      +
    12. +

      Remove the first item in tokens.

    13. -

      If potential suffix is non-null and its first character is U+002D (-), -then:

      +

      If start is the empty string or contains any instances of U+002D (-), return null.

      +
    14. +

      If tokens is not empty:

      1. -

        Set retVal’s suffix to the percent-decoding of the result of removing the -first character from potential suffix.

        +

        Set end to the first item in tokens.

      2. -

        Remove the last item of the list tokens.

        +

        If end is the empty string or contains any instances of U+002D (-), return null.

    15. -

      If tokens has size not equal to 1 nor 2 then -return null.

      +

      Return a new text directive, with

      +
      +
      prefix +
      The percent-decoding of prefix +
      start +
      The percent-decoding of start +
      end +
      The percent-decoding of end +
      suffix +
      The percent-decoding of suffix +
      +
    +
    +
    +

          To parse the fragment directive, on an ASCII string fragment +directive, run these steps:
          

    +
          This algorithm takes the fragment directive string (i.e. the part that follows ":~:") and returns + a list of text directive objects parsed from that string. The returned list may be empty.
          
    +
    1. -

      Set retVal’s start be the percent-decoding of the first item of tokens.

      +

      Let directives be the result of strictly + splitting fragment directive on U+0026 (&).

    2. -

      If tokens has size 2, then set retVal’s end be the percent-decoding of the last item of tokens.

      +

      Let output be an initially empty list of text directives.

    3. -

      Return retVal.

      +

      For each string directive in directives:

      +
        +
      1. +

        If directive does not start with "text=", then continue.

        +
      2. +

        Let text directive value be the code point substring from 5 to the end of directive.

        +
        Note: this may be the empty string.
        +
      3. +

        Let parsed text directive be the result of parsing text + directive value.

        +
      4. +

        If parsed text directive is non-null, append it to output.

        +
      +
    4. +

      Return output.

    3.4.1. Invoking Text Directives

    -

    This section describes how text directives in a document’s uninvoked directives are +

    This section describes how text directives in a document’s pending text directives are processed and invoked to cause indication of the relevant text passages.

    The summarized changes in this section:
    • -

      Modify the indicated part processing model to try processing uninvoked directives into a range that will be returned as the indicated part.

      +

      Modify the indicated part processing model to try processing pending text directives into a range that will be returned as the indicated part.

    • Modify "scrolling to a fragment" to correctly scroll and set the Document’s target element in the case of a range based indicated part.

    • -

      Ensure uninvoked directives is reset to null when the user agent has finished the +

      Ensure pending text directives is reset to null when the user agent has finished the fragment search for the current navigation/traversal.

    • If the user agent finishes searching for a text directive, ensure it tries the regular @@ -1610,13 +1677,13 @@

    • -

      Let directives be the document’s uninvoked directives.

      +

      Let text directives be the document’s pending text directives.

    • -

      If directives is non-null then:

      +

      If text directives is non-null then:

      1. -

        Let ranges be a list that is the result of running - the invoke text directives steps with directives and the document.

        +

        Let ranges be a list that is the result of running + the invoke text directives steps with text directives and the document.

      2. If ranges is non-empty, then:

          @@ -1702,7 +1769,7 @@

          scroll a target into view, with target set to scrollTarget, behavior set to "auto", block set to blockPosition, and inline set to "nearest".

          Implementations MAY avoid scrolling to the target if it is - produced from a text directive.

          + produced from a text directive.

        1. Run the focusing steps for target, with the Document’s viewport as the fallback target.

          @@ -1714,7 +1781,7 @@

    -

    The next two monkeypatches ensure the user agent clears uninvoked directives when +

    The next two monkeypatches ensure the user agent clears pending text directives when the fragment search is complete. In the case where a text directive search finishes because parsing has stopped, it tries one more search for a non-text directive fragment.

    In the definition of try to scroll to the fragment:

    @@ -1740,7 +1807,7 @@

  • -

    Set uninvoked directives to null.

    +

    Set pending text directives to null.

  • Abort these steps.

    @@ -1750,10 +1817,10 @@

    1. -

      If uninvoked directives is not null, then:

      +

      If pending text directives is not null, then:

      1. -

        Set uninvoked directives to null.

        +

        Set pending text directives to null.

      2. Scroll to the fragment given document.

      @@ -1764,7 +1831,7 @@

      Scroll to the fragment given document.

    2. If document’s indicated part is still null, then try to scroll to the fragment for - document. Otherwise, set uninvoked directives to + document. Otherwise, set pending text directives to null.
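Taken together, the monkeypatches above give a document's pending text directives a short lifecycle. They are consulted while the fragment search runs and cleared once any indicated part (text range or element) is found. When parsing stops they are also cleared, and the user agent makes one final search that can only match an element id. The Python sketch below models that lifecycle under heavy simplifying assumptions. The `Document` class and its fields are hypothetical, and so is the plain substring search standing in for the invoke text directives steps. A real user agent would also repeat the attempt as more of the page loads instead of returning.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Document:
    # Hypothetical stand-in for a DOM Document: flat text plus element ids.
    text: str
    element_ids: set = field(default_factory=set)
    fragment: str = ""
    # None plays the role of the spec's null.
    pending_text_directives: Optional[list] = None
    indicated_part: Optional[tuple] = None
    parsing_stopped: bool = False


def scroll_to_the_fragment(doc: Document) -> None:
    """Simplified indicated-part lookup: a text match wins over an element id."""
    if doc.pending_text_directives is not None:
        for query in doc.pending_text_directives:
            index = doc.text.find(query)
            if index != -1:
                doc.indicated_part = ("range", index, index + len(query))
                return
    if doc.fragment in doc.element_ids:
        doc.indicated_part = ("element", doc.fragment)


def try_to_scroll_to_the_fragment(doc: Document) -> None:
    if doc.parsing_stopped:
        # Parsing is over: drop the text directives and make one last
        # attempt, which can now only match an element id.
        if doc.pending_text_directives is not None:
            doc.pending_text_directives = None
            scroll_to_the_fragment(doc)
        return
    scroll_to_the_fragment(doc)
    if doc.indicated_part is not None:
        # The search succeeded, so the directives must not be reused.
        doc.pending_text_directives = None
    # Otherwise a user agent would try again after more of the page loads.
```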

    @@ -1782,7 +1849,7 @@

    Scroll to the fragment given navigable’s active document.

    -
  • Set navigable’s active document’s uninvoked directives to +
  • Set navigable’s active document’s pending text directives to null.
  • Let traversable be navigable’s traversable navigable.

    @@ -1801,9 +1868,9 @@

    3.5. Security and Privacy

    3.5.1. Motivation

    This section is non-normative
    -

    Care must be taken when implementing text directive so that it +

    Care must be taken when implementing text directives so that they cannot be used to exfiltrate information across origins. Scripts can navigate a -page to a cross-origin URL with a text directive. If a malicious +page to a cross-origin URL with a text directive. If a malicious actor can determine that the text fragment was successfully found in the victim page as a result of such a navigation, they can infer the existence of any text on the page.

    @@ -1880,7 +1947,7 @@

    text directive-invoking URL, they would be able to determine +to a text directive-invoking URL, they would be able to determine the existence of a text snippet by measuring how long the navigation call takes.

    The restrictions in § 3.5.4 Restricting the Text Fragment prevent this specific case; in particular, the no-same-document-navigation restriction. @@ -2082,7 +2149,7 @@

    check if a text directive can be scrolled; given a Document document, an origin-or-null initiator origin, and user navigation involvement-or-null user involvement, follow these steps:
    1. -

      If document’s uninvoked directives field is null or empty, return false.

      +

      If document’s pending text directives field is null or empty, return false.

    2. Let is user involved be true if: document’s text directive user activation is true, or user involvement is one of "activation" or "browser @@ -2181,7 +2248,7 @@

    -

    Amend the update document for history step application steps +

    Amend the update document for history step application steps to check the force-load-at-top policy and avoid scrolling in a new document if it’s set.

    @@ -2458,7 +2525,7 @@


    -
    The text fragment specification proposes an amendment to HTML § 7.4.2.3.3 Fragment navigations. In summary, if a text directive is +
    The text fragment specification proposes an amendment to HTML § 7.4.2.3.3 Fragment navigations. In summary, if a text directive is present and a match is found in the page, the text fragment takes precedence over the element fragment as the indicated part. We amend the HTML Document’s indicated part processing model to return a range, rather than an element, that will be scrolled into view.
    @@ -2501,49 +2568,32 @@

    If a directive successfully matches text in the document, it returns a range indicating that match in the document. The invoke text directives steps are the high-level API provided by this - section. These return a list of ranges that were matched + section. These return a list of ranges that were matched by the individual directive matching steps, in the order the directives were specified in the fragment directive string.

    If a directive was not matched, it does not add an item to the returned list.

    - To invoke text directives, given as input an ASCII string text directives and a Document document, run these steps: -
    This algorithm takes as input a text directives, that is the - raw text of the fragment directive and the document over which it operates. - It returns a list of ranges that are to be visually - indicated, the first of which will be scrolled into view (if the UA scrolls - automatically).
    + To invoke text directives, given as input a list of text + directives text directives and a Document document, run these steps: +
    This algorithm returns a list of ranges that are to be visually indicated, + the first of which will be scrolled into view (if the UA scrolls automatically).
    1. -

      If text directives is not a valid fragment directive, then -return an empty list.

      -
    2. -

      Let directives be a list of ASCII strings -that is the result of strictly splitting the -string text directives on "&".

      +

      Let ranges be a list of ranges, initially empty.

    3. -

      Let ranges be a list of ranges, initially empty.

      -
    4. -

      For each ASCII string directive of directives:

      +

      For each text directive directive of text directives:

      1. -

        If directive does not match the production TextDirective, -then continue.

        -
      2. -

        Let parsedValues be the result of running the parse a text -directive steps on directive.

        -
      3. -

        If parsedValues is null then continue.

        -
      4. -

        If the result of running find a range from a text directive given parsedValues and document is non-null, then append it to ranges.

        +

        If the result of running find a range from a text directive given directive and document is non-null, then append it to ranges.

    5. Return ranges.
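Because parsing now happens earlier, the invoke text directives steps reduce to a loop over already-parsed text directives that keeps only the non-null results, in input order. A minimal Python sketch of that loop follows. The `TextDirective` dataclass is hypothetical, and so is the `(start, end)` offset representation of a range. `naive_find_range` is a hypothetical stand-in for the DOM-based find a range from a text directive steps: an ASCII case-insensitive substring search that ignores prefix and suffix.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class TextDirective:
    # Mirrors the four items of a parsed text directive.
    start: str
    end: Optional[str] = None
    prefix: Optional[str] = None
    suffix: Optional[str] = None


Range = Tuple[int, int]  # hypothetical: (start, end) offsets into a flat string


def naive_find_range(directive: TextDirective, document: str) -> Optional[Range]:
    """Stand-in for 'find a range from a text directive'.

    ASCII case-insensitive; prefix and suffix are ignored.
    """
    haystack = document.lower()
    begin = haystack.find(directive.start.lower())
    if begin == -1:
        return None
    stop = begin + len(directive.start)
    if directive.end is not None:
        end_index = haystack.find(directive.end.lower(), stop)
        if end_index == -1:
            return None
        stop = end_index + len(directive.end)
    return (begin, stop)


def invoke_text_directives(
    text_directives: List[TextDirective],
    document: str,
    find_range: Callable[[TextDirective, str], Optional[Range]] = naive_find_range,
) -> List[Range]:
    ranges: List[Range] = []
    for directive in text_directives:
        found = find_range(directive, document)
        # Unmatched directives add nothing; order follows the input list.
        if found is not None:
            ranges.append(found)
    return ranges
```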

    - To find a range from a text directive, given a text directive parsedValues and Document document, run the + To find a range from a text directive, given a text directive parsedValues and Document document, run the following steps:
    This algorithm takes as input a successfully parsed text directive and a @@ -2607,7 +2657,7 @@

    This can happen if prefixMatch’s end or its subsequent non-whitespace position is at the end of the document.

  • -

    Assert: matchRange’s start node is a Text node.

    +

    Assert: matchRange’s start node is a Text node.

    matchRange’s start now points to the next non-whitespace text data following a matched prefix.
  • @@ -2619,7 +2669,7 @@

    If potentialMatch is null, return null.

  • -

    If potentialMatch’s start is not matchRange’s start, then continue.

    +

    If potentialMatch’s start is not matchRange’s start, then continue.

    In this case, we found a prefix but it was followed by something other than matching text, so we’ll continue searching for the next instance of prefix.
    @@ -2658,7 +2708,7 @@

    Set potentialMatch’s end to endMatch’s end.

  • -

    Assert: potentialMatch is non-null, not collapsed and +

    Assert: potentialMatch is non-null, not collapsed and represents a range exactly containing an instance of matching text.

  • If parsedValues’s suffix is null, return potentialMatch.

    @@ -2695,7 +2745,7 @@

    If rangeEndSearchRange is collapsed then:

    1. -

      Assert: parsedValues’s end item is non-null

      +

      Assert: parsedValues’s end item is non-null

    2. Return null

      This can only happen for range matches due to the break for exact matches in step 9 of the @@ -2734,7 +2784,7 @@

      Set range’s start offset to 0.

    3. -

      Continue.

      +

      Continue.

  • If the substring data of node at offset offset and count 6 is equal to the string "&nbsp;" then:

    @@ -2762,7 +2812,7 @@

  • - To find a string in range given a string query, a range searchRange, and booleans wordStartBounded and wordEndBounded, + To find a string in range given a string query, a range searchRange, and booleans wordStartBounded and wordEndBounded, run these steps:
    This algorithm will return a range that represents the first instance of the query text that is fully contained within searchRange, optionally @@ -2797,7 +2847,7 @@

    Set searchRange’s start offset to 0.

  • -

    Continue.

    +

    Continue.

  • If curNode is not a visible text node:

    @@ -2807,12 +2857,12 @@

    Set searchRange’s start offset to 0.

  • -

    Continue.

    +

    Continue.

  • Let blockAncestor be the nearest block ancestor of curNode.

  • -

    Let textNodeList be a list of Text nodes, +

    Let textNodeList be a list of Text nodes, initially empty.

  • While curNode is a shadow-including descendant of blockAncestor and the position of the boundary point (curNode, 0) is not after searchRange’s end:

    @@ -2826,7 +2876,7 @@

    Set curNode to the next node, in shadow-including tree order, that isn’t a shadow-including descendant of curNode.

  • -

    Continue.

    +

    Continue.

  • If curNode is a visible text node then append it to textNodeList.

    @@ -2838,7 +2888,7 @@

    If curNode is null, then break.

  • -

    Assert: curNode follows searchRange’s start node.

    +

    Assert: curNode follows searchRange’s start node.

  • Set searchRange’s start to the boundary point (curNode, 0).
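The node walk above searches one block at a time. It gathers the visible Text nodes under the nearest block ancestor and searches their combined text, so a match never straddles a block boundary and invisible text is skipped. The Python sketch below models that behavior over a flattened node list. `FlatTextNode`, its `block` and `visible` fields, and the `(block, offset)` return value are hypothetical. The case-insensitive comparison uses `str.lower`, which only approximates the spec's comparison rules.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FlatTextNode:
    block: str  # hypothetical id of the node's nearest block ancestor
    data: str
    visible: bool = True


def find_string_in_nodes(query: str, nodes: List[FlatTextNode]) -> Optional[Tuple[str, int]]:
    """Return (block id, offset into that block's visible text) or None."""
    needle = query.lower()
    i = 0
    while i < len(nodes):
        block = nodes[i].block
        run: List[FlatTextNode] = []
        # Collect consecutive nodes under the same block, skipping invisible ones.
        while i < len(nodes) and nodes[i].block == block:
            if nodes[i].visible:
                run.append(nodes[i])
            i += 1
        buffer = "".join(node.data for node in run).lower()
        index = buffer.find(needle)
        if run and index != -1:
            return (block, index)
    return None
```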

    @@ -2882,7 +2932,7 @@

    To find a range from a node list given a search string queryString, -a range searchRange, a list of Text nodes nodes, and booleans wordStartBounded and wordEndBounded, follow these steps: +a range searchRange, a list of Text nodes nodes, and booleans wordStartBounded and wordEndBounded, follow these steps:
    Optionally, this will only return a match if the matched text begins and/or ends on a word boundary. For example: @@ -2954,13 +3004,13 @@

    If matchIndex + queryString’s length is greater than searchBuffer’s length − endInset return null.

    If the match runs past the end of the search range, return null.
  • -

    Assert: start and end are non-null, valid boundary points in searchRange.

    +

    Assert: start and end are non-null, valid boundary points in searchRange.

  • Return a range with start start and end end.

  • - To get boundary point at index, given an integer index, list of Text nodes nodes, and a boolean isEnd, follow these steps: + To get boundary point at index, given an integer index, list of Text nodes nodes, and a boolean isEnd, follow these steps:

    This is a small helper routine used by the steps above to determine which node a given index in the concatenated string belongs to.
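The helper maps an offset in the concatenated search buffer back to a boundary point. The Python sketch below assumes the helper walks the nodes with a running count. When isEnd is true, an index equal to a node's end stays in that node, so an end point does not spill into offset 0 of the following node. Text nodes are modeled as plain strings, and boundary points as `(node index, offset)` tuples. `find_range_from_node_list` here is a simplified stand-in that omits the word-bound checks and the spec's exact comparison rules.

```python
from typing import List, Optional, Tuple

BoundaryPoint = Tuple[int, int]  # (index into nodes, offset within that node)


def get_boundary_point_at_index(index: int, nodes: List[str], is_end: bool) -> Optional[BoundaryPoint]:
    counted = 0
    for node_index, data in enumerate(nodes):
        node_end = counted + len(data)
        if is_end:
            # An end index exactly at this node's end resolves to this node,
            # not to offset 0 of the following node.
            node_end += 1
        if node_end > index:
            return (node_index, index - counted)
        counted += len(data)
    return None


def find_range_from_node_list(query: str, nodes: List[str]) -> Optional[Tuple[BoundaryPoint, BoundaryPoint]]:
    # Search the concatenation of all node data, then map the match back.
    buffer = "".join(nodes).lower()
    match_index = buffer.find(query.lower())
    if match_index == -1:
        return None
    start = get_boundary_point_at_index(match_index, nodes, is_end=False)
    end = get_boundary_point_at_index(match_index + len(query), nodes, is_end=True)
    return (start, end)
```

For example, with nodes `["Hello ", "wor", "ld"]`, index 6 maps to `(1, 0)` as a start point but to `(0, 6)` as an end point.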

    @@ -3008,13 +3058,13 @@

    -

    A locale is a string containing a valid [BCP47] language tag, or the empty string. An empty string indicates that the primary +

    A locale is a string containing a valid [BCP47] language tag, or the empty string. An empty string indicates that the primary language is unknown.

    -

    A substring is word bounded in a string text, +

    A substring is word bounded in a string text, given locales startLocale and endLocale, if both the position of its first character is at a word boundary given startLocale, and the position after its last character is at a word boundary given endLocale.

    -

    A number position is at a word boundary in a string text, given a locale locale, if, using locale, either a word +

    A number position is at a word boundary in a string text, given a locale locale, if, using locale, either a word boundary immediately precedes the positionth code unit, or text’s length is more than 0 and position equals either 0 or text’s length.
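These definitions rely on locale-aware word segmentation, which a user agent would typically get from a Unicode text segmentation (UAX #29) implementation. The Python sketch below substitutes a much cruder heuristic so the two definitions can be exercised. It treats every position as a boundary unless the characters on both sides are alphanumeric or `_`. The locale parameters are accepted but ignored. Python also indexes code points rather than UTF-16 code units, so the sketch illustrates only the shape of the checks, not a conforming implementation.

```python
def _is_word_char(ch: str) -> bool:
    # Crude heuristic, not UAX #29: letters, digits and "_" form words.
    return ch.isalnum() or ch == "_"


def is_at_word_boundary(text: str, position: int, locale: str = "") -> bool:
    # locale is ignored; a real implementation would segment using it.
    if len(text) > 0 and position in (0, len(text)):
        return True
    if not 0 < position < len(text):
        return False
    return not (_is_word_char(text[position - 1]) and _is_word_char(text[position]))


def is_word_bounded(text: str, start: int, length: int,
                    start_locale: str = "", end_locale: str = "") -> bool:
    # Both the first character's position and the position after the
    # last character must sit on a word boundary.
    return (is_at_word_boundary(text, start, start_locale)
            and is_at_word_boundary(text, start + length, end_locale))
```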

    @@ -3158,7 +3208,7 @@

    4. Generating Text Fragment Directives

    This section is non-normative.

    This section contains recommendations for UAs automatically generating URLs -with a text directive. These recommendations aren’t normative but +with a text directive. These recommendations aren’t normative but are provided to help ensure that generated URLs are maximally stable and usable.

    4.1. Prefer Exact Matching To Range-based

    @@ -3193,18 +3243,18 @@

    TODO: Can we determine the above limit in some less arbitrary way?

    4.2. Use Context Only When Necessary

    -

    Context terms allow the text directive to disambiguate text +

    Context terms allow the text directive to disambiguate text snippets on a page. However, their use can make the URL more brittle in some cases. Often, the desired string will start or end at an element boundary. The context will therefore exist in an adjacent element. Changes to the page -structure could invalidate the text directive since the context and +structure could invalidate the text directive since the context and match text will no longer appear to be adjacent.

    Suppose we wish to craft a URL for the following text:
    <div class="section">HEADER</div>
     <div class="content">Text to quote</div>
     
    -

    We could craft the text directive as follows:

    +

    We could craft the text directive as follows:

    text=HEADER-,Text%20to%20quote
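To make the example concrete, the Python sketch below splits a directive such as `text=HEADER-,Text%20to%20quote` into its prefix, start, end, and suffix terms. A trailing `-` on the first term marks a prefix and a leading `-` on the last term marks a suffix. Each term is percent-decoded and then UTF-8 decoded. The sketch is loosely modeled on the parse a text directive and percent-decode a text directive term steps listed in the index. Their exact validation rules are not reproduced in this excerpt, and the function and class names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import unquote_to_bytes


@dataclass(frozen=True)
class TextDirective:
    start: str
    end: Optional[str] = None
    prefix: Optional[str] = None
    suffix: Optional[str] = None


def percent_decode_term(term: str) -> str:
    # Percent-decode to bytes, then UTF-8 decode without stripping a BOM;
    # invalid byte sequences become U+FFFD.
    return unquote_to_bytes(term).decode("utf-8", errors="replace")


def parse_text_directive(directive: str) -> Optional[TextDirective]:
    if not directive.startswith("text="):
        return None
    # str.split keeps empty items, i.e. a strict split on ",".
    tokens = directive[len("text="):].split(",")
    if not 1 <= len(tokens) <= 4 or "" in tokens:
        return None
    prefix = suffix = None
    if tokens[0].endswith("-"):
        prefix = percent_decode_term(tokens.pop(0)[:-1])
    if tokens and tokens[-1].startswith("-"):
        suffix = percent_decode_term(tokens.pop()[1:])
    if len(tokens) not in (1, 2):
        return None
    start = percent_decode_term(tokens[0])
    end = percent_decode_term(tokens[1]) if len(tokens) == 2 else None
    return TextDirective(start=start, end=end, prefix=prefix, suffix=suffix)
```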
     

    However, suppose the page changes to add a "[edit]" link beside all section @@ -3219,11 +3269,11 @@

    TODO: Determine the numeric limit above in less arbitrary way.

    4.3. Determine If Fragment Id Is Needed

    -

    When the UA navigates to a URL containing a text directive, it will +

    When the UA navigates to a URL containing a text directive, it will fall back to scrolling a regular element-id-based fragment into view if it exists and the text fragment isn’t found.

    This can be useful to provide a fallback, in case the text in the document -changes, invalidating the text directive.

    +changes, invalidating the text directive.
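A generator following this advice can place an element id before the fragment directive delimiter `:~:`, so the id remains available as a fallback. The Python sketch below builds such a URL. `build_text_fragment_url` is a hypothetical helper, and the element id is assumed to need no escaping. `urllib.parse.quote` escapes `,` and `&` but leaves `-` alone, so `-` is escaped by hand because it marks prefix and suffix terms in a text directive.

```python
from typing import Optional
from urllib.parse import quote


def encode_term(term: str) -> str:
    # quote() with safe="" escapes "," and "&" but leaves "-" as is;
    # "-" marks prefix/suffix terms, so escape it explicitly.
    return quote(term, safe="").replace("-", "%2D")


def build_text_fragment_url(
    base_url: str,
    start: str,
    *,
    end: Optional[str] = None,
    prefix: Optional[str] = None,
    suffix: Optional[str] = None,
    element_id: Optional[str] = None,
) -> str:
    terms = []
    if prefix is not None:
        terms.append(encode_term(prefix) + "-")
    terms.append(encode_term(start))
    if end is not None:
        terms.append(encode_term(end))
    if suffix is not None:
        terms.append("-" + encode_term(suffix))
    # The element id (if any) precedes ":~:" and acts as the fallback target.
    return f"{base_url}#{element_id or ''}:~:text={','.join(terms)}"
```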

    Suppose we wish to craft a URL to https://en.wikipedia.org/wiki/History_of_computing quoting the sentence: @@ -3293,7 +3343,7 @@

    Index

    Terms defined by this specification

  • end, in § 3.4 -
  • ExplicitChar, in § 3.3.4 +
  • ExplicitChar, in § 3.3.3
  • find a range from a node list, in § 3.6.1
  • find a range from a text directive, in § 3.6.1
  • find a string in range, in § 3.6.1 @@ -3313,7 +3363,7 @@

    (interface), in § 3.9 -
  • definition of, in § 3.3.4 +
  • definition of, in § 3.3.3
  • fragmentDirective, in § 3.9
  • fragment directive delimiter, in § 3.3 @@ -3326,7 +3376,10 @@

    next non-whitespace position, in § 3.6.1
  • non-searchable subtree, in § 3.6.1
  • parse a text directive, in § 3.4 -
  • PercentEncodedByte, in § 3.3.4 +
  • parse the fragment directive, in § 3.4 +
  • pending text directives, in § 3.3.2 +
  • percent-decode a text directive term, in § 3.4 +
  • PercentEncodedByte, in § 3.3.3
  • prefix, in § 3.4
  • remove the fragment directive, in § 3.3.1
  • search invisible, in § 3.6.1 @@ -3334,23 +3387,22 @@

    start, in § 3.4
  • suffix, in § 3.4
  • text directive, in § 3.4 -
  • TextDirective, in § 3.3.4 +
  • TextDirective, in § 3.3.3
  • text directive allowing MIME type, in § 3.5.4 -
  • TextDirectiveExplicitChar, in § 3.3.4 -
  • TextDirectiveParameters, in § 3.3.4 -
  • TextDirectivePrefix, in § 3.3.4 -
  • TextDirectiveString, in § 3.3.4 -
  • TextDirectiveSuffix, in § 3.3.4 +
  • TextDirectiveExplicitChar, in § 3.3.3 +
  • TextDirectiveParameters, in § 3.3.3 +
  • TextDirectivePrefix, in § 3.3.3 +
  • TextDirectiveString, in § 3.3.3 +
  • TextDirectiveSuffix, in § 3.3.3
  • text directive user activation -
  • uninvoked directives, in § 3.3.2 -
  • UnknownDirective, in § 3.3.4 +
  • UnknownDirective, in § 3.3.3
  • user involvement, in § 3.5.4 -
  • valid fragment directive, in § 3.3.4 +
  • ValidTextDirective, in § 3.3.3
  • value, in § 3.3.1
  • visible text node, in § 3.6.1
  • word boundary, in § 3.6.2 @@ -3423,6 +3475,11 @@

    substring data
  • url +
  • + [ENCODING] defines the following terms: +
      +
    • utf-8 decode without bom +
  • [FETCH] defines the following terms: