Some applications wish to concurrently capture multiple surfaces.
Capturing multiple surfaces is doable using existing APIs - it is possible to call getDisplayMedia()
multiple times. However, this is not very ergonomic, and creates serious friction for the user:
- The user has to interact with the browser's media-picker multiple times.
- The user has to interact with the application multiple times, signaling that they want to capture yet another surface, and providing a new transient activation each time.
- The user is liable to make mistakes when trying to remember which surfaces they've already started capturing, and which surfaces remain for them to capture.
Ideally, a single transient activation could be used for single API invocation, providing the user with a media-picker with functionality akin to checkboxes (mentioned here by way of example; we don't need to mandate specific UX elements). The user would be allowed to choose all of the display surfaces that they want to capture, then click OK once. It is clear from context that these are all of e surfaces the user was aiming to capture, and that no additional API calls to gDM or the like are necessary.
Consider an instructor presenting multiple tabs to several students.
- Instructor streams multiple tabs to an SFU.
- Individual students independently choose tab to view at any given moment.
With a single click, the instructor can start capturing all the relevant tabs.
Video conferencing software asks the user to choose all the tabs the user wishes to share. The application captures all surfaces, but, at any given moment, it only relays to the SFU a single tab. Which tab is relayed depends on app-specific logic. (For instance, maybe only the last-active tab.)
Recording for compliance/training/billing reasons.
Record multiple windows. Redraw them to a canvas to produce a video of a virtual desktop. This virtual desktop only has the captured windows, which improves privacy a lot over what users currently do nowadays - sharing the entire (real) desktop so as to share a handful of windows. (Additional, orthogonal API for learning the position and size of windows needed to make this truly powerful.)
Provide an API which allows multiple screen-captures to be initiated. It should only require a single transient activation. Ideally, the user agent should present to the user a UX which would render certain user-mistakes impossible (e.g. capturing the same surface multiple times).
partial interface MediaDevices {
Promise<sequence<MediaStream>> getDisplayMediaSet(
optional MediaStreamConstraints constraints = {});
}
Add a possible paramter to getDisplayMedia
called maxSurfaces
.
Its default value is 1. With that value, the existing behavior is manifested.
For values greather than 1, the new behavior is manifested (multi-picker), and the return type changes to Promise<sequence<MediaStream>>
.
See mock above.