Captured Surface Control #962

Open
1 task done
eladalon1983 opened this issue Jun 4, 2024 · 7 comments
Labels
Focus: API design (pending); Focus: Security (pending); Progress: propose closing (we think it should be closed but are waiting on some feedback or consensus); Venue: WebRTC (WebRTC and media capture)

Comments

@eladalon1983

Uryyb GNT! V nz n ohqqvat pelcgbtencul rkcreg.

I'm requesting a TAG review of Captured Surface Control.

Summary

We introduce a new Web API that allows Web applications to:

  1. Read and write the zoom level of a captured display surface (tab or window).
  2. Produce wheel events in a captured tab or window.
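
As a rough illustration of these two capabilities, the TypeScript sketch below models them against an in-memory stand-in for a captured tab. Every name in it (`CapturedSurfaceControl`, `setZoomLevel`, `sendWheel`, `stepZoom`, `FakeSurface`) is an assumption made for illustration and is not taken from the spec; the list of permitted zoom levels is likewise assumed.

```typescript
// Illustrative only: every name here is an assumption, not the spec's API.
interface CapturedSurfaceControl {
  readonly zoomLevel: number;                       // read the zoom level (percent)
  setZoomLevel(level: number): void;                // write the zoom level
  sendWheel(deltaX: number, deltaY: number): void;  // produce a wheel event
}

// Return the nearest permitted zoom level in the given direction
// (+1 = zoom in, -1 = zoom out); stay at `current` when at either end.
function stepZoom(levels: readonly number[], current: number, direction: 1 | -1): number {
  const sorted = levels.slice().sort((a, b) => a - b);
  let result = current;
  for (const level of sorted) {
    if (direction === 1 && level > current) {
      return level; // first level above current
    }
    if (direction === -1 && level < current) {
      result = level; // keep the highest level below current
    }
  }
  return result;
}

// In-memory stand-in for a captured tab, so the sketch runs outside a browser.
class FakeSurface implements CapturedSurfaceControl {
  zoomLevel = 100;
  scrollY = 0;

  setZoomLevel(level: number): void {
    this.zoomLevel = level;
  }

  sendWheel(_deltaX: number, deltaY: number): void {
    // Clamp at the top of the page, as a real scroller would.
    this.scrollY = Math.max(0, this.scrollY + deltaY);
  }
}

const surface = new FakeSurface();
surface.setZoomLevel(stepZoom([50, 100, 150], surface.zoomLevel, 1));
surface.sendWheel(0, 120);
```

In a real user agent these operations would be gated on an active capture session and on user permission; the fake above ignores both.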

Details

Further details:

  • I have reviewed the TAG's Web Platform Design Principles
  • The group where the incubation/design work on this is being done (or is intended to be done in the future): Screen Capture Community Group and WebRTC Working Group
  • The group where standardization of this work is intended to be done: WebRTC Working Group
  • Existing major pieces of multi-implementer review or discussion of this design: https://www.w3.org/2024/05/21-webrtc-minutes.html
  • Major unresolved issues with or opposition to this design: N/A
  • This work is being funded by: Google
@martinthomson
Contributor

@jyasskin, @hober, and I discussed this today.

Thank you for bringing this to us. We think this seems like a generally useful feature, but we have some questions and suggestions for the explainer:

The explainer should discuss the alternative design of having the page cooperate, and accept dedicated events from the capturing process. We think there are both upsides and downsides to that option that deserve exploration.

The two interactions that are considered are scrolling and zooming. Is that list exhaustive? Are these uniformly safe to do? Are there not occasions where scrolling results in changes to things like form elements? That could require a change of focus before sending the events in, maybe, though with precise X and Y on events, that might still engage the element that is targeted. We're inferring that this is limited to those two actions because "spoofing" those events is safe, but the explainer doesn't give enough details to show that that's true.

There seems to be some heightened permissions UX being contemplated here. It's not clear to us what would be different from a regular screen capture. It would be helpful if the explainer could show a proof of concept that highlights those differences.

@martinthomson martinthomson added the Progress: pending editor update TAG is waiting for a spec/explainer update label Sep 3, 2024
@torgo torgo modified the milestones: 2024-09-02-week, 2024-09-09-week Sep 5, 2024
@plinss plinss removed this from the 2024-09-09-week milestone Sep 16, 2024
@torgo torgo added this to the 2024-10-07-week milestone Oct 4, 2024
@eladalon1983
Author

Apologies for taking some time here. I'll respond soon.

@plinss plinss removed this from the 2024-10-14-week milestone Oct 21, 2024
@eladalon1983
Author

eladalon1983 commented Oct 22, 2024

We think this seems like a generally useful feature

That's great to hear!

The explainer should discuss the alternative design of having the page cooperate, and accept dedicated events from the capturing process. We think there are both upsides and downsides to that option that deserve exploration.

I have now added a discussion of select alternatives to the explainer.

The two interactions that are considered are scrolling and zooming. Is that list exhaustive?

For the time being - yes.

Apple's representative, Youenn, suggested adding pinch. No Web developers have requested this feature, so we are leaving this as a potential extension. But note that the current API shape does not prevent such future extensions.

  • We could, in the future, define forwardPinch(element).
  • We could, in the future, transition to forwardGestures(element, gestures), where the second argument is a dictionary of relevant gestures.
  • Other API shapes would be possible.
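
One way to picture the second option: a single entry point takes a dictionary naming the gestures to forward, so adding pinch later extends the dictionary rather than the method list. The sketch below is hypothetical throughout. `GestureOptions`, `RecordingForwarder`, and `PreviewTile` are invented for illustration, and only the `forwardPinch(element)` and `forwardGestures(element, gestures)` call shapes come from the bullets above.

```typescript
// Hypothetical types; only the forwardPinch(element) and
// forwardGestures(element, gestures) call shapes come from the discussion.
type GestureName = "wheel" | "pinch";
type GestureOptions = { [K in GestureName]?: boolean };

// Stand-in for the preview element, so the sketch runs without a DOM.
interface PreviewTile {
  id: string;
}

// Records which gestures would be forwarded from which element.
class RecordingForwarder {
  private active: { [id: string]: GestureName[] } = {};

  forwardGestures(element: PreviewTile, gestures: GestureOptions): void {
    const names: GestureName[] = [];
    if (gestures.wheel) names.push("wheel");
    if (gestures.pinch) names.push("pinch");
    this.active[element.id] = names;
  }

  // A single-gesture method stays expressible on top of the dictionary form.
  forwardPinch(element: PreviewTile): void {
    this.forwardGestures(element, { pinch: true });
  }

  forwardedFrom(element: PreviewTile): GestureName[] {
    return this.active[element.id] || [];
  }
}

const forwarder = new RecordingForwarder();
const tile: PreviewTile = { id: "preview" };
forwarder.forwardGestures(tile, { wheel: true, pinch: true });
```

Clicks and keystrokes are deliberately absent from `GestureName`, in line with the exclusion stated in the note that follows.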

Note: We intentionally exclude any interaction like clicking, delivering keystrokes, etc. We have no plans of ever extending the API to cover such gestures.

Are these uniformly safe to do? Are there not occasions where scrolling results in changes to things like form elements?

Web applications can attach any meaning to any user action, and that property is desirable and necessary to retain - the user expects scrolling to work identically when delivered from the capturing application; always, not just when it's a simple scroll. A concrete example is Google Maps, where scrolling changes the region of the map being displayed, triggering the fetching of new assets, etc. Or consider how Apple's main page often uses fancy animations of laptops opening and closing when scrolling.

We believe that this risk is sufficiently mitigated by (1) the pre-existing safeguards associated with screen-sharing to begin with, (2) the additional permission prompt involved, and (3) the steps taken to ensure that only the user's immediate interaction with the capturing application can trigger scroll-forwarding to the captured application.

That could require a change of focus before sending the events in

Mandating a change of focus could break the experience for the user and subvert their expectation that a scroll delivered on the capturing application's preview tile would elicit exactly the same behavior on the captured surface as though it were delivered directly there.

There seems to be some heightened permissions UX being contemplated here. It's not clear to us what would be different from a regular screen capture. It would be helpful if the explainer could show a proof of concept that highlights those differences.

When users are currently asked to grant permission to capture a tab/window/screen, they are used to a specific interpretation. Before elevating this permission to something new - capture plus scroll plus zoom - an additional prompt is required. User agents are free to infer this heightened permission using any heuristic, and may change that based on how user expectations evolve over time. For the time being, Chrome intends to use a run-of-the-mill permission prompt, along with some extra UX to clarify to the user that this permission is active. This is neither mandated by the spec, nor do we guarantee that Chrome will retain this particular UX.

@torgo torgo added this to the 2024-10-28-week milestone Oct 25, 2024
@matatk

matatk commented Oct 30, 2024

Thank you for your reply and all the info in the Explainer. We discussed this on our breakout today.
We still feel the explainer needs more information on possible abuse cases and a bit more discussion of attack surface. The security considerations section talks about potential confusion, but doesn't talk about how the API could be abused by bad actors. So we recommend a security analysis (and there is a W3C process spinning up for this) but in the mean time if you could bolster the current security considerations doc with some discussion of abuse cases and mitigations that would be great.
As there's a lot going on UI-wise here, we'd really like to see an 'Accessibility considerations' section in the Explainer (it's totally fine to use this section to show what the positives are) - please could you add one? Please also consider requesting a review from the APA WG: https://github.com/w3c/a11y-request/issues/new/choose

@eladalon1983
Author

eladalon1983 commented Oct 30, 2024

We still feel the explainer needs more information on possible abuse cases and a bit more discussion of attack surface.

I have now added a "Security and Privacy Considerations" section in the explainer. It simply links to the corresponding section in the spec, where this information actually lives, so as to avoid duplication.

but in the mean time if you could bolster the current security considerations doc [Emphasis mine - Elad.]

Do I understand correctly that you are asking for the information already in the spec (this section) to be replicated in questionnaire.md? I think it would be better to go with linking; maybe from section 2.18 to the spec's "Security and Privacy Considerations" section. Wdyt?

As there's a lot going on UI-wise here

Could you please clarify which UI changes you are referring to? As far as I can tell, this spec does not deal with anything UX-related. Although bespoke user agent UX associated with these APIs is possible, this is completely up to the UA's discretion; a spec-compliant implementation is possible even without any additional user agent UX.

To clarify, this mock is of the Web application's possible UX, not the user agent's.

@jyasskin
Contributor

FWIW, I don't think you should duplicate any information into https://github.com/screen-share/captured-surface-control/blob/main/questionnaire.md. Instead, questionnaire.md should include links to the places in the specification that answer the questions. We should improve the questionnaire and template to say that. I wasn't in the relevant breakout, so I don't want to comment on the other questions.

@plinss plinss removed this from the 2024-10-28-week milestone Nov 4, 2024
@jyasskin jyasskin added Progress: propose closing we think it should be closed but are waiting on some feedback or consensus and removed Progress: pending editor update TAG is waiting for a spec/explainer update labels Nov 5, 2024
@plinss plinss added this to the 2024-11-18-week milestone Nov 5, 2024
@matatk

matatk commented Nov 20, 2024

@eladalon1983:

Could you please clarify which UI changes you are referring to? As far as I can tell, this spec does not deal with anything UX-related. Although bespoke user agent UX associated with these APIs is possible, this is completely up to the UA's discretion; a spec-compliant implementation is possible even without any additional user agent UX.

To clarify, this mock is of the Web application's possible UX, not the user agent's.

Totally agree that we generally aim to avoid specifying UI/UX, and ACK that the UI in the example is from the app (and, of course, UI is already covered by WCAG - though I'll come back to that). Let me hopefully clarify...

Whilst a spec may be for a low-level API, products built with the API are often user-facing. Developers building things with the API may not imagine some of the ways users could be using them; it can be helpful to raise awareness of the opportunities, and any risks, and it makes sense to do that in the spec itself.

A concrete and helpful example of some big accessibility wins, and some patterns to avoid, can be found in the Compute Pressure API's Accessibility Considerations section. This example is great because it shows how the API can affect users (UI decisions being made based on its output), how this can help users, and also the importance of meeting, but also thinking beyond, WCAG in a particular domain.

In the case of Captured Surface Control, there is a new avenue through which to interact with the preview, and a new avenue to scroll and zoom the target tab. As an extensive sample of one vision-impaired person, this seems like a helpful thing to me :-). I am not 100% sure how/if focus considerations would come into play (focusing the PiP window is likely out of scope, but would you expect there to be interactive controls floating within it?). It'd be great to read your thoughts on this in the explainer.

APA WG would be happy to follow the development of this API—please consider requesting a review, or tagging APA WG via the "a11y-tracker" label in any issue where you think some input may be of help.
