From d88512fdb16caf13262a66da43786285d9744bb8 Mon Sep 17 00:00:00 2001
From: David Bokan
+ This algorithm takes a single text directive value string as input (e.g. "prefix-,foo,bar") and
+ attempts to parse the string into the components of the directive (e.g. ("prefix", "foo", "bar",
+ null)). See [[#syntax]] for what each of these components means and how they're used.
+
+ Returns null if the input is invalid. Otherwise, returns a [=text directive=].
+
- This algorithm takes a single text directive string as input (e.g.
- "text=prefix-,foo,bar") and attempts to parse the string into the
- components of the directive (e.g. ("prefix", "foo", "bar", null)). See
- [[#syntax]] for the what each of these components means and how they're
- used.
-
- Returns null if the input is invalid or fails to parse in any way.
- Otherwise, returns a [=text directive=].
-
-([=TextDirective=] | [=UnknownDirective=]) ("&" [=FragmentDirective=])?
"text="[=CharacterString=]
+ [=CharacterString=]
+ [=CharacterString=] - [=TextDirective=]
([=ExplicitChar=] | [=PercentEncodedByte=])+
+ ([=ExplicitChar=] | [=PercentEncodedByte=])*
-
"text=" [=TextDirectiveParameters=]
+ 1. If |term| is null, return null.
+ 1. Assert: |term| is an ASCII string.
+ 1. Let |decoded bytes| be the result of percent-decoding |term|.
+ 1. Return the result of running UTF-8 decode without BOM on |decoded
+ bytes|.
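
The four decoding steps above can be sketched in Python as follows. This is an illustrative sketch, not part of the patch: the function name `percent_decode_term` is hypothetical, and it assumes that `urllib.parse.unquote_to_bytes` and Python's `utf-8` codec with `errors="replace"` are close enough stand-ins for "percent-decode" and "UTF-8 decode without BOM".

```python
from typing import Optional
from urllib.parse import unquote_to_bytes


def percent_decode_term(term: Optional[str]) -> Optional[str]:
    """Sketch of the term-decoding steps; the function name is hypothetical."""
    # Step 1: a null term stays null.
    if term is None:
        return None
    # Step 2: the steps assert that the term is an ASCII string.
    assert term.isascii(), "term is expected to be an ASCII string"
    # Step 3: percent-decode into bytes; malformed "%" sequences are kept as-is.
    decoded_bytes = unquote_to_bytes(term)
    # Step 4: decode as UTF-8. Python's "utf-8" codec does not strip a leading
    # BOM, and errors="replace" substitutes U+FFFD for invalid byte sequences
    # (assumed here to approximate the Encoding Standard's behavior).
    return decoded_bytes.decode("utf-8", errors="replace")
```

For example, `percent_decode_term("Text%20to%20quote")` yields `"Text to quote"`, and a null term passes through unchanged.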
+
+
+ 1. Let |prefix|, |suffix|, |start|, |end|, each be null.
+ 1. Assert: |text directive value| is an ASCII string
+ with no code points in the fragment percent-encode set and no instances of
+ U+0026 (&).
+ 1. Let |tokens| be a list of strings that result from
+ strictly splitting |text directive value| on U+002C (,).
+ 1. If |tokens| has size less than 1 or greater than 4, return null.
+ 1. If the first item of |tokens| ends with U+002D (-):
+ 1. Set |prefix| to the substring of |tokens|[0]
+ from 0 with length |tokens|[0]'s length - 1.
+ 1. Remove the first item of |tokens|.
+ 1. If |prefix| is the empty string or contains any instances of U+002D (-), return null.
+ 1. If |tokens| is empty, return null.
+ 1. If the last item of |tokens| starts with U+002D (-):
+ 1. Set |suffix| to the substring of the last item of |tokens| from 1 to the end of the string.
+ 1. Remove the last item of |tokens|.
+ 1. If |suffix| is the empty string or contains any instances of U+002D (-), return null.
+ 1. If |tokens| is empty, return null.
+ 1. If |tokens| has size greater than 2, return null.
+ 1. Assert: |tokens| has size 1 or 2.
+ 1. Set |start| to the first item in |tokens|.
+ 1. Remove the first item in |tokens|.
+ 1. If |start| is the empty string or contains any instances of U+002D (-), return null.
+ 1. If |tokens| is not empty:
+ 1. Set |end| to the first item in |tokens|.
+ 1. If |end| is the empty string or contains any instances of U+002D (-), return null.
+ 1. Return a new [=text directive=], with
+
+
+
+
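
The token-handling steps above can be sketched in Python as follows. This is a hedged illustration rather than the patch's implementation. The `TextDirective` class and the `_decode` helper are hypothetical names. The fields of the final "Return a new text directive, with" step are not shown in this excerpt, so the sketch assumes each field is set to the percent-decoded form of the matching term.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import unquote_to_bytes


@dataclass
class TextDirective:
    # Hypothetical stand-in for the spec's text directive struct.
    start: str
    end: Optional[str] = None
    prefix: Optional[str] = None
    suffix: Optional[str] = None


def _decode(term: Optional[str]) -> Optional[str]:
    # Assumption: each term is percent-decoded, then UTF-8 decoded.
    if term is None:
        return None
    return unquote_to_bytes(term).decode("utf-8", errors="replace")


def parse_text_directive(value: str) -> Optional[TextDirective]:
    prefix = suffix = end = None
    # Strict splitting keeps empty tokens, which str.split(",") also does.
    tokens = value.split(",")
    if len(tokens) < 1 or len(tokens) > 4:
        return None
    # A first token ending in "-" is the prefix.
    if tokens[0].endswith("-"):
        prefix = tokens.pop(0)[:-1]
        if prefix == "" or "-" in prefix:
            return None
        if not tokens:
            return None
    # A last token starting with "-" is the suffix.
    if tokens[-1].startswith("-"):
        suffix = tokens.pop()[1:]
        if suffix == "" or "-" in suffix:
            return None
        if not tokens:
            return None
    if len(tokens) > 2:
        return None
    # One or two tokens remain: start, then an optional end.
    start = tokens.pop(0)
    if start == "" or "-" in start:
        return None
    if tokens:
        end = tokens[0]
        if end == "" or "-" in end:
            return None
    # Assumed shape of the returned struct (the field list is elided above).
    return TextDirective(
        start=_decode(start), end=_decode(end),
        prefix=_decode(prefix), suffix=_decode(suffix),
    )
```

With the example input from the algorithm's description, `parse_text_directive("prefix-,foo,bar")` produces a directive whose prefix is `"prefix"`, start is `"foo"`, end is `"bar"`, and suffix is null.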
- 1. [=/Assert=]: |text directive input| matches the production [=TextDirective=].
- 1. Let |textDirectiveString| be the substring of |text directive
- input| starting at index 5.
-
+
+ 1. Let |directives| be the result of strictly
+ splitting |fragment directive| on U+0026 (&).
+ 1. Let |output| be an initially empty list of [=text directives=].
+ 1. For each string |directive| in |directives|:
+ 1. If |directive| does not start with "text=", then continue.
+ 1. Let |text directive value| be the code point substring from 5 to the end of |directive|.
+
- 1. If |document|'s [=Document/uninvoked directives=] field is null or empty, return false.
+ 1. If |document|'s [=Document/pending text directives=] field is null or empty, return false.
1. Let |is user involved| be true if: |document|'s [=document/text directive user activation=] is
true, or |user involvement| is one of "activation" or "browser UI"; false otherwise.
@@ -1645,35 +1701,20 @@ To find the shadow-including parent of |node| follow these steps:
- 1. If |text directives| is not a [=valid fragment directive=], then
- return an empty list.
- 2. Let |directives| be a list of ASCII strings
- that is the result of [=strictly split a string|strictly splitting the
- string=] |text directives| on "&".
- 3. Let |ranges| be a list of [=ranges=], initially empty.
- 4. For each ASCII string |directive| of |directives|:
- 1. If |directive| does not match the production [=TextDirective=],
- then [=iteration/continue=].
- 1. Let |parsedValues| be the result of running the [=parse a text
- directive=] steps on |directive|.
- 1. If |parsedValues| is null then [=iteration/continue=].
- 1. If the result of running [=find a range from a text directive=] given
- |parsedValues| and |document| is non-null, then [=list/append=] it to
- |ranges|.
- 5. Return |ranges|.
+ 1. Let |ranges| be a list of [=ranges=], initially empty.
+ 1. For each [=text directive=] |directive| of |text directives|:
+ 1. If the result of running [=find a range from a text directive=] given |directive| and
+ |document| is non-null, then [=list/append=] it to |ranges|.
+ 1. Return |ranges|.
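
The simplified invoke steps above reduce to a filter over already-parsed directives. A minimal Python sketch follows. The `find_range` callable is passed in as a parameter because the real "find a range from a text directive" steps are outside this excerpt, and `toy_find_range` is a purely hypothetical stand-in that searches a plain string instead of a DOM document.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

D = TypeVar("D")
R = TypeVar("R")


def invoke_text_directives(
    text_directives: List[D],
    document: object,
    find_range: Callable[[D, object], Optional[R]],
) -> List[R]:
    ranges: List[R] = []
    for directive in text_directives:
        found = find_range(directive, document)
        # Directives that match nothing contribute no item to the list.
        if found is not None:
            ranges.append(found)
    return ranges


def toy_find_range(directive: str, document: object) -> Optional[Tuple[int, int]]:
    # Hypothetical stand-in: treats the document as a string and the
    # directive as a literal substring to locate.
    index = str(document).find(directive)
    if index < 0:
        return None
    return (index, index + len(directive))
```

Ranges come back in the order the directives were given, and unmatched directives are skipped rather than represented by a placeholder.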
Table of Contents
Document.
-Monkeypatching DOM § 4.5 Interface Document:
-Each document has an associated uninvoked directives which is either - null or an ASCII string holding data used by the UA to process the resource. It is initially - null.
+Each document has an associated pending text directives which is either + null or a list of text directives. It is initially null.
In the definition of update document for history step application:
+In the definition of update document for history step application:
-Monkeypatching HTML § 7.4.6.2 Updating the document:
@@ -1459,39 +1457,53 @@If document’s latest entry’s directive state is not entry’s directive state then set document’s uninvoked directives to entry’s directive state's value. -
- Set document’s latest entry to entry
-- ...
++ If document’s latest entry’s directive state is not entry’s directive state then: +++
+- +
Let fragment directive be entry’s directive state's value.
+- +
Set document’s pending text directives to the result of parsing fragment directive.
++ Set document’s latest entry to entry
++ ...
An ASCII string is a valid fragment directive if -it matches the production:
+Note: This section is non-normative.
+Note: This grammar is provided as a convenient reference; however, the rules and steps for parsing +are specified imperatively in the § 3.4 Text Directives section. Where this grammar differs in +behavior from the steps of that section, the steps there are to be taken as the authoritative source +of truth.
+The FragmentDirective can contain multiple directives split by the "&" character. Currently this +means we allow multiple text directives to enable multiple indicated strings in the page, but this +also allows for future directive types to be added and combined. For extensibility, we do not fail +to parse if an unknown directive is in the &-separated list of directives.
+A string is a valid fragment directive if it matches the EBNF (Extended +Backus-Naur Form) production:
FragmentDirective
::=
- (TextDirective | UnknownDirective) ("&" FragmentDirective)?
+ (TextDirective | UnknownDirective) ("&" FragmentDirective)?
+ TextDirective
::=
+ "text="CharacterString
UnknownDirective
::=
- CharacterString
+ CharacterString - TextDirective
CharacterString
::=
- (ExplicitChar | PercentEncodedByte)+
+ (ExplicitChar | PercentEncodedByte)*
ExplicitChar
::=
[a-zA-Z0-9] | "!" | "$" | "'" | "(" | ")" | "*" | "+" | "." | "/" | ":" |
";" | "=" | "?" | "@" | "_" | "~" | "," | "-"
A TextDirective is considered valid if it matches the following production:
TextDirective
::=
+ ValidTextDirective
::=
"text=" TextDirectiveParameters
TextDirectiveParameters
::=
(TextDirectivePrefix ",")? TextDirectiveString ("," TextDirectiveString)? ("," TextDirectiveSuffix)?
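
As a non-normative illustration, the FragmentDirective, TextDirective, UnknownDirective, CharacterString, ExplicitChar and PercentEncodedByte productions above can be approximated with a regular expression. The function names below are hypothetical, and the sketch follows the added (`+`) side of the grammar, where CharacterString may be empty. It copies PercentEncodedByte exactly as written in the grammar, `[a-zA-Z0-9]` rather than hexadecimal digits only.

```python
import re

# ExplicitChar, including "," and "-" as in the FragmentDirective grammar.
EXPLICIT_CHAR = r"[a-zA-Z0-9!$'()*+./:;=?@_~,\-]"
# PercentEncodedByte as written in the grammar.
PERCENT_ENCODED_BYTE = r"%[a-zA-Z0-9][a-zA-Z0-9]"
# CharacterString ::= (ExplicitChar | PercentEncodedByte)*
CHARACTER_STRING = re.compile(rf"(?:{EXPLICIT_CHAR}|{PERCENT_ENCODED_BYTE})*")


def is_valid_fragment_directive(fragment_directive: str) -> bool:
    # Every "&"-separated item must be a CharacterString; whether it is a
    # TextDirective or an UnknownDirective does not affect validity.
    return all(
        CHARACTER_STRING.fullmatch(item) is not None
        for item in fragment_directive.split("&")
    )


def classify_directive(directive: str) -> str:
    # TextDirective ::= "text=" CharacterString; anything else is unknown.
    return "text" if directive.startswith("text=") else "unknown"
```

Because the grammar is only a reference, a check like this is useful for tests or linting; the imperative parsing steps remain the authoritative behavior.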
@@ -1505,10 +1517,9 @@ [a-zA-Z0-9] | "!" | "$" | "'" | "(" | ")" | "*" | "+" | "." | "/" | ":" |
";" | "=" | "?" | "@" | "_" | "~"
- PercentEncodedByte
::=
"%" [a-zA-Z0-9][a-zA-Z0-9]
If term is null, return null.
+Assert: term is an ASCII string.
+Let decoded bytes be the result of percent-decoding term.
+Return the result of running UTF-8 decode without BOM on decoded +bytes.
+To parse a text directive, on an ASCII string text -directive input, run these steps:
+ To parse a text directive, on a string text directive value, run these steps:
-This algorithm takes a single text directive string as input (e.g. "text=prefix-,foo,bar") and attempts to parse the string into the components of the directive (e.g. ("prefix", "foo", "bar", null)). See § 3.2 Syntax for the what each of these components means and how they’re used.
-Returns null if the input is invalid or fails to parse in any way. - Otherwise, returns a text directive.
+This algorithm takes a single text directive value string as input (e.g. "prefix-,foo,bar") and + attempts to parse the string into the components of the directive (e.g. ("prefix", "foo", "bar", + null)). See § 3.2 Syntax for what each of these components means and how they’re used.
+Returns null if the input is invalid. Otherwise, returns a text directive.
Assert: text directive input matches the production TextDirective.
-Let textDirectiveString be the substring of text directive -input starting at index 5.
-Let prefix, suffix, start, end, each be null.
Let tokens be a list of strings that is the result of splitting textDirectiveString on commas.
+Assert: text directive value is an ASCII string with no code points in the fragment percent-encode set and no instances of +U+0026 (&).
If tokens has size less than 1 or greater than 4, return null.
+Let tokens be a list of strings that result from strictly splitting text directive value on U+002C (,).
If any of tokens’s items are the empty string, return null.
+If tokens has size less than 1 or greater than 4, return null.
Let retVal be a text directive with each of its items initialized -to null.
-Let potential prefix be the first item of tokens.
+If the first item of tokens ends with U+002D (-):
+If the last character of potential prefix is U+002D (-), then:
+If the last item of tokens starts with U+002D (-):
Set retVal’s prefix to the percent-decoding of the result of removing the -last character from potential prefix.
+Set suffix to the substring of the last item of tokens from 1 to the end of the string.
+Remove the last item of tokens.
+If suffix is the empty string or contains any instances of U+002D (-), return null.
Remove the first item of the list tokens.
+If tokens is empty, return null.
Let potential suffix be the last item of tokens, if one exists, null -otherwise.
+If tokens has size greater than 2, return null.
+Set start to the first item in tokens.
+Remove the first item in tokens.
If potential suffix is non-null and its first character is U+002D (-), -then:
+If start is the empty string or contains any instances of U+002D (-), return null.
+If tokens is not empty:
Set retVal’s suffix to the percent-decoding of the result of removing the -first character from potential suffix.
+Set end to the first item in tokens.
Remove the last item of the list tokens.
+If end is the empty string or contains any instances of U+002D (-), return null.
If tokens has size not equal to 1 nor 2 then -return null.
+Return a new text directive, with
+To parse the fragment directive, on an ASCII string fragment +directive, run these steps:
+Set retVal’s start be the percent-decoding of the first item of tokens.
+Let directives be the result of strictly + splitting fragment directive on U+0026 (&).
If tokens has size 2, then set retVal’s end be the percent-decoding of the last item of tokens.
+Let output be an initially empty list of text directives.
Return retVal.
+For each string directive in directives:
+If directive does not start with "text=", then continue.
Let text directive value be the code point substring from 5 to the end of directive.
+Let parsed text directive be the result of parsing text + directive value.
+If parsed text directive is non-null, append it to output.
+Return output.
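
The fragment-directive steps above can be sketched in Python as follows. This is an illustration under stated assumptions: the per-directive parser is passed in as a callable, and `toy_parse` is a hypothetical stand-in for the "parse a text directive" steps, used only to make the example runnable.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def parse_fragment_directive(
    fragment_directive: str,
    parse_text_directive: Callable[[str], Optional[T]],
) -> List[T]:
    # Strictly split on U+0026 (&); empty items are kept and then skipped below.
    directives = fragment_directive.split("&")
    output: List[T] = []
    for directive in directives:
        # Unknown directives are ignored rather than failing the whole parse.
        if not directive.startswith("text="):
            continue
        # Code point substring from 5 to the end (Python indexes by code point).
        text_directive_value = directive[5:]
        parsed = parse_text_directive(text_directive_value)
        if parsed is not None:
            output.append(parsed)
    return output


def toy_parse(value: str) -> Optional[Tuple[str, ...]]:
    # Hypothetical stand-in parser: rejects the empty value and otherwise
    # returns the comma-separated tokens unchanged.
    return tuple(value.split(",")) if value else None
```

Skipping unknown and unparsable directives, instead of returning early, is what lets future directive types share the same `&`-separated list without breaking existing text directives.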
This section describes how text directives in a document’s uninvoked directives are +
This section describes how text directives in a document’s pending text directives are processed and invoked to cause indication of the relevant text passages.
Modify the indicated part processing model to try processing uninvoked directives into a range that will be returned as the indicated part.
+Modify the indicated part processing model to try processing pending text directives into a range that will be returned as the indicated part.
Modify "scrolling to a fragment" to correctly scroll and set the Document’s target element in the case of a range based indicated part.
Ensure uninvoked directives is reset to null when the user agent has finished the +
Ensure pending text directives is reset to null when the user agent has finished the fragment search for the current navigation/traversal.
If the user agent finishes searching for a text directive, ensure it tries the regular @@ -1610,13 +1677,13 @@
Let directives be the document’s uninvoked directives.
+Let text directives be the document’s pending text directives.
If directives is non-null then:
+If text directives is non-null then:
Let ranges be a list that is the result of running - the invoke text directives steps with directives and the document.
+Let ranges be a list that is the result of running + the invoke text directives steps with text directives and the document.
If ranges is non-empty, then:
Implementations MAY avoid scrolling to the target if it is - produced from a text directive.
+ produced from a text directive.Run the focusing steps for target, with the Document’s viewport as the fallback target.
@@ -1714,7 +1781,7 @@The next two monkeypatches ensure the user agent clears uninvoked directives when +
The next two monkeypatches ensure the user agent clears pending text directives when the fragment search is complete. In the case where a text directive search finishes because parsing has stopped, it tries one more search for a non-text directive fragment.
In the definition of try to scroll to the fragment:
@@ -1740,7 +1807,7 @@Set uninvoked directives to null.
+Set pending text directives to null.
Abort these steps.
@@ -1750,10 +1817,10 @@If uninvoked directives is not null, then:
+If pending text directives is not null, then:
Set uninvoked directives to null.
+Set pending text directives to null.
Scroll to the fragment given document.
If document’s indicated part is still null, then try to scroll to the fragment for - document. Otherwise, set uninvoked directives to + document. Otherwise, set pending text directives to null.
Scroll to the fragment given navigable’s active document.
-Let traversable be navigable’s traversable navigable.
@@ -1801,9 +1868,9 @@Care must be taken when implementing text directive so that it +
Care must be taken when implementing text directive so that it cannot be used to exfiltrate information across origins. Scripts can navigate a -page to a cross-origin URL with a text directive. If a malicious +page to a cross-origin URL with a text directive. If a malicious actor can determine that the text fragment was successfully found in victim page as a result of such a navigation, they can infer the existence of any text on the page.
@@ -1880,7 +1947,7 @@If document’s uninvoked directives field is null or empty, return false.
+If document’s pending text directives field is null or empty, return false.
Let is user involved be true if: document’s text directive user activation is
true, or user involvement is one of "activation" or "browser
@@ -2181,7 +2248,7 @@
update document for history step application steps to take a boolean allow text directive scroll and use it when scrolling to a fragment:
Amend the update document for history step application steps to take a boolean allow text directive scroll and use it when scrolling to a fragment:
Monkeypatching [HTML]:
@@ -2213,7 +2280,7 @@apply the history step algorithm to take a boolean allow text directive -scroll and pass it through when calling update document for history step application : +scroll and pass it through when calling update document for history step application :
-Monkeypatching [HTML]:
@@ -2394,7 +2461,7 @@<
Amend the update document for history step application steps +
Amend the update document for history step application steps to check the
force-load-at-top
policy and avoid scrolling in a new document if it’s set.
@@ -2458,7 +2525,7 @@
If a directive successfully matches to text in the document, it returns a range indicating that match in the document. The invoke text directives steps are the high level API provided by this - section. These return a list of ranges that were matched + section. These return a list of ranges that were matched by the individual directive matching steps, in the order the directives were specified in the fragment directive string.
If a directive was not matched, it does not add an item to the returned list.
If text directives is not a valid fragment directive, then -return an empty list.
-Let directives be a list of ASCII strings -that is the result of strictly splitting the -string text directives on "&".
+For each ASCII string directive of directives:
+For each text directive directive of text directives:
If directive does not match the production TextDirective, -then continue.
-Let parsedValues be the result of running the parse a text -directive steps on directive.
-If parsedValues is null then continue.
-If the result of running find a range from a text directive given parsedValues and document is non-null, then append it to ranges.
+If the result of running find a range from a text directive given directive and document is non-null, then append it to ranges.
Return ranges.
Assert: matchRange’s start node is a Text
node.
Assert: matchRange’s start node is a Text
node.
If potentialMatch is null, return null.
If potentialMatch’s start is not matchRange’s start, then continue.
+If potentialMatch’s start is not matchRange’s start, then continue.
Assert: potentialMatch is non-null, not collapsed and +
Assert: potentialMatch is non-null, not collapsed and represents a range exactly containing an instance of matching text.
If parsedValues’s suffix is null, return potentialMatch.
@@ -2695,7 +2745,7 @@Return null
Set range’s start offset to 0.
If the substring data of node at offset offset and count 6 is equal to the string "&nbsp;" then:
@@ -2762,7 +2812,7 @@Set searchRange’s start offset to 0.
If curNode is not a visible text node:
@@ -2807,12 +2857,12 @@Set searchRange’s start offset to 0.
Let blockAncestor be the nearest block ancestor of curNode.
While curNode is a shadow-including descendant of blockAncestor and the position of the boundary point (curNode, 0) is not after searchRange’s end:
@@ -2826,7 +2876,7 @@If curNode is a visible text node then append it to textNodeList.
@@ -2838,7 +2888,7 @@If curNode is null, then break.
Assert: curNode follows searchRange’s start node.
+Assert: curNode follows searchRange’s start node.
Set searchRange’s start to the boundary point (curNode, 0).
@@ -2882,7 +2932,7 @@Text
nodes nodes, and booleans wordStartBounded and wordEndBounded, follow these steps:
+a range searchRange, a list of Text
nodes nodes, and booleans wordStartBounded and wordEndBounded, follow these steps:
Assert: start and end are non-null, valid boundary points in searchRange.
+Assert: start and end are non-null, valid boundary points in searchRange.
Text
nodes nodes, and a boolean isEnd, follow these steps:
+ To get boundary point at index, given an integer index, list of Text
nodes nodes, and a boolean isEnd, follow these steps:
This is a small helper routine used by the steps above to determine which node a given index in the concatenated string belongs to.
@@ -3008,13 +3058,13 @@A locale is a string containing a valid [BCP47] language tag, or the empty string. An empty string indicates that the primary +
A locale is a string containing a valid [BCP47] language tag, or the empty string. An empty string indicates that the primary language is unknown.
-A substring is word bounded in a string text, +
A substring is word bounded in a string text, given locales startLocale and endLocale, if both the position of its first character is at a word boundary given startLocale, and the position after its last character is at a word boundary given endLocale.
-A number position is at a word boundary in a string text, given a locale locale, if, using locale, either a word
+ A number position is at a word boundary in a string text, given a locale locale, if, using locale, either a word
boundary immediately precedes the positionth code unit, or text’s length
is more than 0 and position equals either 0 or text’s length. This section contains recommendations for UAs automatically generating URLs
-with a text directive. These recommendations aren’t normative but
+with a text directive. These recommendations aren’t normative but
are provided to ensure generated URLs result in maximally stable and usable
URLs. Context terms allow the text directive to disambiguate text
+ Context terms allow the text directive to disambiguate text
snippets on a page. However, their use can make the URL more brittle in some
cases. Often, the desired string will start or end at an element boundary. The
context will therefore exist in an adjacent element. Changes to the page
-structure could invalidate the text directive since the context and
+structure could invalidate the text directive since the context and
match text will no longer appear to be adjacent. We could craft the text directive as follows: We could craft the text directive as follows: However, suppose the page changes to add a "[edit]" link beside all section
@@ -3219,11 +3269,11 @@ When the UA navigates to a URL containing a text directive, it will
+ When the UA navigates to a URL containing a text directive, it will
fall back to scrolling into view a regular element-id based fragment if it
exists and the text fragment isn’t found. This can be useful to provide a fallback, in case the text in the document
-changes, invalidating the text directive.
4. Generating Text Fragment Directives
4.1. Prefer Exact Matching To Range-based
@@ -3193,18 +3243,18 @@ TODO: Can we determine the above limit in some less arbitrary way?
4.2. Use Context Only When Necessary
- <div class="section">HEADER</div>
<div class="content">Text to quote</div>
- text=HEADER-,Text%20to%20quote
TODO: Determine the numeric limit above in a less arbitrary way.
4.3. Determine If Fragment Id Is Needed
-