-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add content scripts section in specification #542
base: main
Are you sure you want to change the base?
Conversation
specification/index.bs
Outdated
|
||
Used to match frames with an opaque or otherwise missing origin. The origin to match against is determined in the following order of priority: | ||
|
||
1. If the frame has an [=opaque origin=], such as with a [=blob URLs=], use the non-opaque origin. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd rephrase this.
(The issue with the current language is that:
a] it doesn't specify what the "non-opaque" origin is or where it comes from
b] it doesn't always use the origin of the parent; it uses the initiator (or "precursor"))
If the URL of a document has a specified scheme**, the user agent will fall back to the origin of the initiator instead. This is commonly, but not always, the parent or embedding frame.
** In chrome, these schemes are data:, about:, filesystem:, and blob:. Is that the same in other browsers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@oliverdunk The semantics have extensively been discussed on Chromium's issue tracker where I and Devlin discussed the API design. If you're interested, the start of the discussion is at https://issues.chromium.org/issues/40443085#comment48. The design that is close to what we have now was sketched in https://issues.chromium.org/issues/40443085#comment61 , with the final name (match_origin_as_fallback
) at https://issues.chromium.org/issues/40443085#comment67. Devlin summarized the discussion at https://issues.chromium.org/issues/40443085#comment71
Upon reviewing the proposed texts here, I think that there is some confusion on terminology. The current text mentions blob URLs as an opaque origin, but that is not the case.
Relevant to content script matching is the URL of the document (which can have an origin component) and the origin of the document (as a security principal). There may not always be an obvious relation between the two:
- URLs may have visible origin parts in it, such as http(s) and also
blob:
and (Chrome-only)filesystem:
(e.g.blob:https://example.com/UUID
). - URLs may not have a visible origin in it (
about:blank
andabout:srcdoc
), but still have a non-opaque origin: commonly the opener of the frame or window is another http(s) URL. Or even any number ofabout:blank
/srcdoc
documents where the first was initially opened by a http(s) origin. - The security principal of a document can be an opaque origin, even if the URL of that document looks like it has a non-opaque origin. This happens with
<iframe sandbox>
orsandbox
directive in theContent-Security-Policy
. A content script can usewindow.origin
to see whether the origin is opaque, as it would serialize to"null"
.- In case of opaque origins, there is almost always a non-opaque initiator that opened the frame. The term "precursor origin" is used here
<iframe sandbox="allow-scripts" src="https://example.com">
- In case of opaque origins, there is almost always a non-opaque initiator that opened the frame. The term "precursor origin" is used here
- The exception is when the initiator of the navigation does not have a non-opaque origin. For example, when the user navigates to a
data:-URL
or toabout:blank
. Sincedata:
-URL- Chrome does currently not run content scripts in these documents.
- Firefox currently allows content scripts with
matches
for all URLs ANDmatch_about_blank: true
to run scripts in top-level about:blank. This is not documented anywhere though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks both, writing the description for these two keys has taken by far the most time in this PR. I've given it another attempt and would appreciate any feedback.
As a general note, concepts like the precursor origin and security principal don't appear to be defined in any other specifications. It seems like they are more informal terms used often in implementations and by implementors. With that in mind, I've tried to describe them as best as possible without talking about them by name.
A few additional notes:
- I've added an informal note to
match_about_blank
describing the Firefox behavior for top-level about:blank pages. - I've added a note that the path must be a wildcard if
match_origin_as_fallback
is set. This is the behavior today in Chrome. Interestingly, we don't have any restrictions oninclude_globs
orexclude_globs
. This feels like an omission to me and I wonder if we should specify something. - In Chrome, sandboxing doesn't seem to be relevant. We always apply these fallbacks, even if the parent is inaccessible to the child frame. With that in mind, I haven't mentioned it here.
Clearly there's a lot of detail here so please let me know if I've missed anything or it could be clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, Oliver! I had a chance to take a quick pass on this one.
specification/index.bs
Outdated
|
||
#### Key `match_about_blank` | ||
|
||
If this is `true`, the content script will also be injected into an additional user agent specified set of pages used to represent empty frames. This will only happen if the content script matches the page that embedded the frame. Defaults to `false`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we (browsers) have different criteria for match_about_blank?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The description here is too vague. match_about_blank
was designed for about:blank
and about:srcdoc
.
If you're looking for clarity, see https://stackoverflow.com/questions/41408936/can-anyone-explain-that-what-is-the-use-of-match-about-blank-in-chrome-extensi, where I previously posted an answer that describes why match_about_blank
exists and what it does.
Other documentation:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Rob--W, could you take another look? I've made some tweaks although it's unclear to me what was too vague.
specification/index.bs
Outdated
|
||
Used to match frames with an opaque or otherwise missing origin. The origin to match against is determined in the following order of priority: | ||
|
||
1. If the frame has an [=opaque origin=], such as with a [=blob URLs=], use the non-opaque origin. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd rephrase this.
(The issue with the current language is that:
a] it doesn't specify what the "non-opaque" origin is or where it comes from
b] it doesn't always use the origin of the parent; it uses the initiator (or "precursor"))
If the URL of a document has a specified scheme**, the user agent will fall back to the origin of the initiator instead. This is commonly, but not always, the parent or embedding frame.
** In chrome, these schemes are data:, about:, filesystem:, and blob:. Is that the same in other browsers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the extra review. I didn't remember that I had looked at this already until I got to my first comment halfway through it.
specification/index.bs
Outdated
|
||
Used to match frames with an opaque or otherwise missing origin. The origin to match against is determined in the following order of priority: | ||
|
||
1. If the frame has an [=opaque origin=], such as with a [=blob URLs=], use the non-opaque origin. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although not obvious from Github's UI, I just posted additional context on
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks all! I've just done another pass on this, would appreciate any additional feedback.
specification/index.bs
Outdated
|
||
Used to match frames with an opaque or otherwise missing origin. The origin to match against is determined in the following order of priority: | ||
|
||
1. If the frame has an [=opaque origin=], such as with a [=blob URLs=], use the non-opaque origin. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks both, writing the description for these two keys has taken by far the most time in this PR. I've given it another attempt and would appreciate any feedback.
As a general note, concepts like the precursor origin and security principal don't appear to be defined in any other specifications. It seems like they are more informal terms used often in implementations and by implementors. With that in mind, I've tried to describe them as best as possible without talking about them by name.
A few additional notes:
- I've added an informal note to
match_about_blank
describing the Firefox behavior for top-level about:blank pages. - I've added a note that the path must be a wildcard if
match_origin_as_fallback
is set. This is the behavior today in Chrome. Interestingly, we don't have any restrictions oninclude_globs
orexclude_globs
. This feels like an omission to me and I wonder if we should specify something. - In Chrome, sandboxing doesn't seem to be relevant. We always apply these fallbacks, even if the parent is inaccessible to the child frame. With that in mind, I haven't mentioned it here.
Clearly there's a lot of detail here so please let me know if I've missed anything or it could be clearer.
|
||
### Key `match_about_blank` | ||
|
||
If this is true, use the URL of the parent frame when matching a child frame whose document URL has the `about` [=scheme=]. See also [[#determine-the-url-for-content-script-matching]]. Defaults to `false`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Chrome currently matches all documents with an about
scheme as described in the current copy (source link care of @oliverdunk), but Firefox appears to explicitly check against about:blank
and about:srcdoc
(source). @xeenon, how does Safari handle this key?
Any changes here should be reflected in the algorithm "Determine the URL for content script matching" section below.
|
||
If this is true, use the URL of the parent frame when matching a child frame whose document URL has the `about` [=scheme=]. See also [[#determine-the-url-for-content-script-matching]]. Defaults to `false`. | ||
|
||
Note: In Firefox, setting `match_about_blank` to `true` also allows injection into top-level `about:blank` pages. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add a non-normative label to this block.
|
||
### Key `run_at` | ||
|
||
Specifies when the content script should be injected. Valid values are `document_start`, `document_end` and `document_idle`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding a WebIDL definition for these values.
|
||
### Key `world` | ||
|
||
The [=world=] any JavaScript scripts should be injected into. Defaults to `ISOLATED`. Valid values are `MAIN` and `ISOLATED`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding a WebIDL definition for these values.
|
||
### Key `include_globs` | ||
|
||
A list of [=globs=] that a page should match. A page matches if the URL matches both the [[#key-matches]] field and the [[#key-include_globs]] field. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably revise "a page should match" to a more generic concept like "potential injection context." Words are hard. @oliverdunk suggested maybe "document"?
1. Let |url| be the document's URL. | ||
1. If the document is within a child frame: | ||
1. If the [=scheme=] of the document's URL is `about`, and `match_about_blank` or `match_origin_as_fallback` is set to true: | ||
1. Set |url| to a URL based on the origin of the parent frame. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The phrase "URL based on the origin of the parent frame" feels a little … awkward? Definitely not required, but maybe we should have a definition or abstract algorithm that describes this and link out to it from here.
1. Let |url| be the result of running [[#determine-the-url-for-content-script-matching]]. | ||
1. If the extension does not have access to the origin, return. | ||
1. If |url| is not matched by a match pattern in `matches`, return. | ||
1. If `include_globs` is present and |url| is not matched by any pattern, return. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might be a bit easier for the reader to parse.
1. If `include_globs` is present and |url| is not matched by any pattern, return. | |
1. If `include_globs` is present and |url| is not matched by any glob pattern, return. |
1. If |url| is not matched by a match pattern in `matches`, return. | ||
1. If `include_globs` is present and |url| is not matched by any pattern, return. | ||
1. If |url| matches an entry in `exclude_matches` or `exclude_globs`, return. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No change needed.
On my first pass through this I assumed that we could simplify the language here by combining steps 2 and 3 (similar to how step 4 is written), but @oliverdunk pointed out thatinclude_globs
behaves as I expected. He also pointed out that the user scripts behavior of includeGlobs
intentional diverges from content scripts – the the proposal notes that globs are:
// Implemented as disjunction: runs in documents whose URL matches
// "matches" or "includeGlobs", and not "excludeMatches" nor "excludeGlobs".
Adds a first draft of information on content scripts in the specification.
There are still some updates needed, in particular around the algorithm for deciding when to inject a script, but I wanted to open something to get some early feedback.
Preview | Diff