
Solve user agent camera/microphone double-mute #39

Closed · jan-ivar opened this issue Oct 27, 2021 · 79 comments

@jan-ivar
Member

jan-ivar commented Oct 27, 2021

User agent mute-toggles for camera & mic can be useful, yielding enhanced privacy (no need to trust the site) and quick access (a sneeze coming on, or a family member walking into frame?):

  • Safari has pause/resume in its URL bar
  • Firefox has global cam/mic mute toggles behind a pref (set privacy.webrtc.globalMuteToggles in about:config)
  • Chrome has recently opened an issue discussing it.

It's behind a pref in Firefox because:

  1. The double-mute problem: site-mute + ua-mute = 4 states, where 3 produce no sound ("Can you hear me now?")
  2. UA-mute of microphone interferes with "Are you talking?" features
  3. Some sites (Meet) stop camera to work around crbug 642785 in Chrome, so there's no video track to UA-mute

[Image: "Am I muted?"]

This issue is only about (1) the double-mute problem.

We determined we can only solve the double-mute problem by involving the site, which requires standardization.

The idea is:

  1. If the UA mutes or unmutes, the site should update its button to match.
  2. If the user unmutes using the site's button, the UA should unmute(!)

The first point requires no spec change: sites can listen to the mute and unmute events on the track (but they don't).
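
For illustration, keeping the site's button in sync with these events might look like this (a minimal sketch; micButton is a hypothetical site element):

const stream = await navigator.mediaDevices.getUserMedia({audio: true});
const [track] = stream.getAudioTracks();

track.onmute = () => micButton.classList.add("muted");      // UA muted us
track.onunmute = () => micButton.classList.remove("muted"); // UA unmuted us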

The second point is key: if the user sees the site's button turn to "muted", they'll expect to be able to click it to unmute.

This is where it gets tricky, because we don't want to allow sites to unmute themselves at will, as this defeats any privacy benefits.

The proposal here is:

partial interface MediaStreamTrack {
  undefined unmute();
};

It would throw InvalidStateError unless the document has transient activation, is fully active, and has focus. User agents may also throw NotAllowedError for any reason, but if they don't, then they must unmute the track (which will fire the unmute event).

This should let user agents that wish to do so develop UX without the double-mute problem.
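
For illustration, a click handler built on the proposed method might look like this (a sketch only; unmute() is the proposal above, not a shipping API):

micButton.onclick = () => {
  try {
    track.unmute(); // proposed: requires transient activation and focus
  } catch (e) {
    // InvalidStateError or NotAllowedError: the UA declined; stay muted.
    return;
  }
  // On success the track fires its unmute event; update the UI there.
};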

@eladalon1983
Member

Repeating (and rephrasing) myself from the Chromium bug, since it'd be unreasonable to expect the audiences to be identical:

Media controls exposed by the browser, which allow an ongoing mic/camera/screen-capture to be muted by the user, communicate an implicit promise from the browser to the user. If the application is allowed to override that promise, it's allowed to break that promise.

I understand the double-mute problem and hope that there could be other ways to resolve it. For example, maybe if the application tries to unmute the track, the browser could show the user a prompt to approve that. This would mean that a user can still click the red mic button to unmute, but because additional approval is required, the application cannot unmute unilaterally in response to an arbitrary user gesture.

@jan-ivar
Member Author

jan-ivar commented Oct 30, 2021

That's a good idea. Having the method be asynchronous would allow this.

partial interface MediaStreamTrack {
  Promise<undefined> unmute();
};

The goal of a spec here is to allow innovation in this space, without arriving at specific UX. It could be a prompt, or maybe a toast message is enough.

I think a lot of users would be surprised to learn that when they mute microphone or camera on a web site today, they have zero assurances that it actually happens. Well-behaved websites have lulled most users into feeling in control when they're not. Most probably don't consider that the website may turn camera or microphone back on at any time as long as the page is open.

The page doesn't even need to be open anymore: a different-origin accomplice page can reopen/navigate to it without user interaction at a later point, since gUM doesn't require transient activation.

We dropped the ball on transient activation in gUM. Having better UA-integrated muting with transient activation might give us a second chance to rectify some privacy concerns.

For instance, a UA could choose to mute for the user if it detects a page accessing camera or microphone on pageload or without interaction.

@youennf
Contributor

youennf commented Sep 12, 2022

Promise<undefined> unmute();

unmute() method seems fine.
I wonder whether we should not try to introduce a mute() method as well.
This would allow the website to mute itself and have its UI synchronised with OS/UA UI.

Also, in Safari, the mute/unmute is page wide, which means that all microphone tracks are either muted or unmuted.
This does not align particularly well with unmute being at the track level.
Maybe introducing unmute/mute at navigator.mediaDevices level would be good enough?

@jan-ivar
Member Author

jan-ivar commented Apr 18, 2023

I wonder whether we should not try to introduce a mute() method as well.

I fear this would create confusion and a false symmetry suggesting muting is under application control when it is not.

This would allow the website to mute itself and have its UI synchronised with OS/UA UI.

Applications have track.enabled for this. UAs "MAY" turn off indicators when this "brings all tracks connected to the device to be ... disabled," which Firefox has done since 2018. This is crbug 642785; I dunno if WebKit has one.

Here's a fiddle for toggling multiple tracks demonstrating Firefox updating its camera URL bar indicator and OS/hardware light (modulo bug 1694304 on Windows for the hardware light), whenever the number of enabled tracks goes to zero.
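
The pattern in that fiddle is roughly the following (a sketch, assuming tracks holds every track from the same camera; not the fiddle's exact code):

function setCameraOn(tracks, on) {
  // When every track from the device becomes disabled, Firefox turns off
  // its URL bar indicator and the OS/hardware light; re-enabling restores them.
  for (const track of tracks) {
    track.enabled = on;
  }
}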

I suggest leaving mute to UAs and concentrating on how apps can signal interest to unmute, to solve the issue at hand.

Also, in Safari, the mute/unmute is page wide,

Do you mean page wide or document wide? What about iframes?

... which means that all microphone tracks are either muted or unmuted. This does not align particularly well with unmute being at the track level.

Maybe we could use constraints? 🙂

Maybe introducing unmute/mute at navigator.mediaDevices level would be good enough?

Maybe, except muted is a property of the track, not mediaDevices.

@jan-ivar
Member Author

There's also the issue of multiple cameras. If we end up with navigator.mediaDevices.unmute(track) that would really suck.

@jan-ivar
Member Author

A simplifying factor is that UA muting is 100% about privacy, and as soon as one track is unmuted on a page, then there's no more privacy. So per-track mute would serve no purpose, and make for a terrible API:

await Promise.all(applicationKeepsTrackOfAllTracksItIsUsing.map(track => track.unmute())); // ugh

But with that understanding (of UA mute as a privacy feature), it seems POLA for track.unmute() to unmute all tracks of the same source per document or per top-level document.

So I think I agree mute is a property of the source by that definition.

But there can be multiple sources in navigator.mediaDevices.[[mediaStreamTrackSources]], and cameras can be pointed different ways, so it's not inconceivable that a UA may wish to control privacy per camera.

Even if we don't care about that, we'd need navigator.mediaDevices.unmute(kind) which seems unappealing. I'd rather go with track.unmute().

@dontcallmedom-bot

This issue had an associated resolution in WebRTC WG 2023-04-18 – (Issue 39: Solve user agent camera/microphone double-mute):

RESOLUTION: No objections.

@youennf
Contributor

youennf commented Apr 20, 2023

track.unmute() makes sense if we think this is useful for all source types.
For WebRTC or canvas tracks, this does not make sense.
For screen sources, it is not yet clear whether we will want to mute all tracks (video and audio) together or independently. CaptureController is the object representing the source, so the mute/unmute functionality could be placed there.

For capture tracks, InputDeviceInfo is what is closest to the source, which is why I mentioned InputDeviceInfo.unmute as a possibility.
Another difference to look at is that MediaStreamTrack is transferable and InputDeviceInfo is not; we should consider this (though InputDeviceInfo can of course be made transferable in the future).

Page/document mute scope probably covers at least 90% of the cases and is simpler to implement.
But I feel muting at the source level is better in general, and the UA can always mute all sources if one gets muted.

I fear this would create confusion and a false symmetry suggesting muting is under application control when it is not.

unmute already introduces this potential confusion about who has control, hence the requestMute/requestUnmute name.

Applications have track.enabled for this. UAs "MAY" turn off indicators when this "brings all tracks connected to the device to be ... disabled," which Firefox has done since 2018. This is crbug 642785, I dunno if webkit has one.

This is a MAY though. Setting all tracks of the same source to enabled = false does not mean track.muted will switch to true; this is left to the UA, which does not seem great for interop. In Firefox's model, I would guess that muted would be set to true later on, when the application sets enabled = true on one of the tracks, in which case the application will then have to call unmute. This is not simple.

Looking at Safari, let's say that Safari would update its muted icon when all tracks are enabled = false.
It would then need to immediately set muted = true on these tracks.
Let's say the user then clicks on Safari's UI; all tracks will have muted = false, but it is then up to the application to listen for the unmute event and act as if the user had clicked one of its own icons. Not simple again.

Looking at web applications, they tend to clone tracks in the window environment (local rendering and a PeerConnection, potentially at different sizes as well), and in the future in workers as well (for encoding/networking) or other windows (via transfer). Having to set enabled to false on each one of these objects, including transferred tracks, is cumbersome and potentially error-prone (see the sketch below).
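
A sketch of the bookkeeping this implies (previewTrack, senderTrack, and encodingWorker are hypothetical):

const localClones = [previewTrack, senderTrack]; // same microphone source
function disableEverywhere() {
  for (const t of localClones) t.enabled = false;
  // Transferred tracks live in other agents and must be disabled there
  // too, e.g. by messaging the worker that received them:
  encodingWorker.postMessage({cmd: "disable-mic"});
}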

Looking at OS support, kAUVoiceIOProperty_MutedSpeechActivityEventListener and kAUVoiceIOProperty_MuteOutput are potentially useful to implement the "Are you talking?" UI in the screenshot you added above.
It seems a worthwhile API addition we could consider in the future: if the mic is muted, we could allow the application to be notified that the user might be speaking.

Overall, it seems cleaner to me to have two separate APIs:

  • enabled, which is about the application ceasing to use the information provided by the source, and is internal to the web application
  • muted, which is about ceasing to receive the information, and is tied to UA UI.

@youennf
Contributor

youennf commented Apr 20, 2023

An alternative to InputDeviceInfo is navigator.mediaDevices.requestUnmute(deviceId).

@jan-ivar
Member Author

track.unmute() makes sense if we think this is useful for all source types.

It was a WG design choice to reuse MST for other sources to avoid inheritance. The cost of that is living with the fact that not all sources have all track abilities, NOT that tracks only have abilities shared by all sources.

getUserMedia returns camera and microphone tracks, so adding attributes, methods and constraints specific to camera and microphone should be fine. If it's not, then time to split inheritance.

E.g. track.muted only makes sense for camera and microphone, and track.unmute() fits with that.

Other sources do not have the double-mute problem, so to not complicate discussion, let's not discuss them here.

This is a MAY though.

Here's a SHOULD: "When a "live", unmuted, and enabled track sourced by a device exposed by getUserMedia() becomes either muted or disabled, and this brings all tracks connected to the device (across all navigables the user agent operates) to be either muted, disabled, or stopped, then the UA SHOULD relinquish the device within 3 seconds..."

Setting all tracks of the same source with enabled = false does not mean track.muted will switch to true, this is left to UA which does not seem great for interop.

Why would a UA mute an in-view application just because it disabled its tracks? That would be terrible for web compat.

To clarify, Mozilla needs no spec changes to solve turning off privacy indicators or camera light. Our view is the path to interop there is changing the MAY and SHOULD to MUST. But please let's discuss that in a separate issue.

In Firefox mode, I would guess that muted would be set to true later on, when the application sets enabled = true on one of the track, in which case the application will then have to call unmute. This is not simple.

No, that is not our plan. As explained in the OP, we have a privacy.webrtc.globalMuteToggles pref in about:config which turns on global user-facing mute controls, and we want to arm sites with tools to unmute themselves better, to prepare for UA features like this.

Sorry for any misunderstanding, but it's not my intent to standardize UA muting here, only application-induced unmuting. Muting remains up to user agents, and I think it is important for privacy that they be allowed to continue to own that problem.

The scope of the proposal in the OP (and this issue) was to arm applications with tools to unmute themselves IF the user agent mutes them, not define when user agents mute them.

@jan-ivar
Member Author

unmute already introduces this potential confusion about who has control, hence the requestMute/requestUnmute name.

We have getUserMedia, not requestUserMedia, and that doesn't seem to confuse anybody.

NotAllowedError seems clear about who has control.

@jan-ivar
Member Author

jan-ivar commented May 3, 2023

Here's a SHOULD: "When a "live", unmuted, and enabled track sourced by a device exposed by getUserMedia() becomes either muted or disabled, and this brings all tracks connected to the device (across all navigables the user agent operates) to be either muted, disabled, or stopped, then the UA SHOULD relinquish the device within 3 seconds..."

FYI this was recently fixed upstream in https://webrtc-review.googlesource.com/c/src/+/302200

@guidou

guidou commented Nov 10, 2023

I'd like to revive this discussion, since these types of system-level controls (either at the UA or the OS) are becoming more common and we have observed that they create a lot of confusion for users.
Like @jan-ivar says, there are two issues here:

If the UA mutes or unmutes, the site should update its button to match.
If the user unmutes using the site's button, the UA should unmute(!)

However, I disagree with this statement:

The first point requires no spec change: sites can listen to the mute and unmute events on the track (but they don't)

The reason sites don't listen to the mute and unmute events is that mute and unmute can be triggered by other causes, and if the application cannot know whether those events (and the muted attribute) stem from a UA/OS-level mute, it cannot react appropriately. The spec says muted means live samples are not made available to the MediaStreamTrack, which is not specific to UA/OS-level mute controls. In Chrome specifically, muted means a track is not getting frames for any reason (and system-level mute has never been one of those reasons in practice). IIRC, Safari has similarities with Chrome in this regard.
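
To illustrate the ambiguity, this is all an app can do today (a sketch):

track.onmute = () => {
  // Muted... but by what? A UA/OS toggle the user can act on, or just a
  // source that momentarily stopped delivering frames? The app cannot
  // tell, so it cannot safely flip its own button to "muted".
};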

This has become a major problem for VC applications and I think we need to solve it properly.
I think we can iterate on several of the proposals made in this thread, which look very promising IMO.

cc @eladalon1983

@eladalon1983
Member

The problem as I see it is that users can mute through multiple sources - app, UA, OS and hardware. The propagation of state through these layers is presently incomplete - an opportunity for us to earn our keep.

In the high-level, I think we have to provide two mechanisms:

  1. Sites listen for mute-status changes from upstream sources (UA, OS and hardware, in that order).
  2. Sites control mute-status in upstream sources.

1. Listen

The principle here should be fairly uncontroversial.
For the concrete mechanism, I agree with Guido that mute events are not currently well-suited. Either of the following would work for me:

  1. Bespoke events.
  2. Revive the idea of a MuteCause/MuteReason, so that the same mechanism would serve both this issue and similar ones (see link).

I prefer option 2. To start the ball rolling on a concrete proposal:

enum MuteCause {
  "unspecified",  // Catch-all default.
  "operating-system-choice",
  "user-agent-choice",
  // Extensible to hardware, and to -issue if it's not a -choice.
};

interface MuteEvent : Event {
  /* Exercise for the reader */
};

partial interface MediaStreamTrack {
  // Note that multiple causes might apply concurrently.
  // (FrozenArray rather than sequence, since WebIDL disallows
  // sequence<> as an attribute type.)
  readonly attribute FrozenArray<MuteCause> causes;
};

2. Control

There's some potential for controversy here, but I think we can resolve it.

Jan-Ivar proposed:

If the user unmutes using the site's button, the UA should unmute(!)

While I'm sure VC apps would be delighted to have such control, I am afraid that no security/privacy department in any user agent would ever approve it (unless we add some UA prompt; foreshadowing). Jan-Ivar suggested transient activation and focus as necessary gating mechanisms. These are fine requirements, but they are not sufficient, as any transient activation would look identical to the user agent here, possibly subverting the user's actual intentions if they clicked on a mislabelled button. I'd also suggest requiring a PEPC-like prompt. Reasonable app code would then look something like this:

unmuteButton.addEventListener('click', unmuteClicked);

async function unmuteClicked() {
  // If necessary, prompt the user to unmute at UA-level etc.
  if (upstreamUnmute) {  // app-tracked flag: a UA/OS-level mute is in effect
    try {
      await track.unmute();
    } catch (error) {
      return;
    }
  }

  // Proceed with the "normal" unmuting in the app.
  // * Resume remote transmission of media.
  // * Change UX to reflect that clicking the button now means "mute".
  // * Update internal state.
}

@youennf
Contributor

youennf commented Nov 10, 2023

I am not sure how much we need a mute reason. Distinct requestUnmute failures might be sufficient.

@eladalon1983
Member

  1. You need a MuteReason because mute can happen for unactionable reasons too, like the source not having any new frames to deliver.
  2. Applications might not wish to call requestUnmute() themselves, but rather provide some reminder/hint to the user about where they muted from (either the browser or the operating system), and leave it to the user to unmute if they wish.

@youennf
Contributor

youennf commented Nov 10, 2023

The mute reasons may vary within the muted period, and firing mute events when only the reason changes is not really appealing.
The flow of the user trying to unmute and the app providing a hint based on the failure seems to cover most cases (and it does not preclude adding reasons in the future if we discover this is useful).

@guidou

guidou commented Nov 10, 2023

The mute reasons may vary within the muted period and firing mute events when only the reason is changing is not really appealing.

I'm OK with the mute event only firing when the muted attribute changes (not the mute reason attribute). WDYT?
The main point is that the muted attribute in its current form is not enough to solve this problem.
Having a reason looks like a good way to make the muted attribute useful for solving this problem. Otherwise, we need a new attribute or a different API surface.

The flow of user trying to unmute and app providing the hint based on the failure seems to cover most cases (and it does not preclude adding reasons in the future if we discover this is useful).

Does this mean you support the approach of having an attribute for the mute cause?

@eladalon1983
Member

The mute reasons may vary within the muted period and firing mute events when only the reason is changing is not really appealing.

Why is it not appealing to fire a mute event when the set of reasons changes? (Note that we have a separate unmute event already, btw.)

@youennf
Contributor

youennf commented Nov 13, 2023

Does this mean you support the approach of having an attribute for the mute cause?

I see this as a potential improvement while I see an API to request unmute as a blocker.
I would focus on the unblocking API.

I also think that an API to request capture to be muted would be useful. The current approach (use enabled=false on all tracks of a device) is a bit tedious and might face backward-compatibility issues.

@guidou

guidou commented Nov 13, 2023

I see this as a potential improvement while I see an API to request unmute as a blocker.
I would focus on the unblocking API.

I agree. Let's focus on that first.

I also think that an API to request capture to be muted would be useful. The current approach (use enabled=false on all tracks of a device), is a bit tedious and might face backward compatibility issues.

Also agree. This requires some more thinking because system-level muting is not necessarily equivalent to muting a source or a set of tracks.

@eladalon1983
Member

I'm having some trouble parsing the last few messages on this thread. If we're all in agreement that we want to add an API exposing the OS-mute-state, then I'll gladly present something in the next available interim opportunity. Specifically, I'd like to present my proposal here. Before I do - @youennf, you have said that firing an event whenever the reason changes is unappealing. I'd still like to understand why; understanding would help drive a better presentation in said interim.

@youennf
Contributor

youennf commented Nov 14, 2023

@guidou and I seem to agree on focusing on the following items (sorted by priority):

  • Expose something like requestUnmute as this is required to solve double-mute.
  • Evaluate the need for requestMute.
  • Evaluate the need for a mute reason.

Getting back to requestUnmute, here are some possible API shapes (all promise based):

  • track.requestUnmute() // all tracks of the related device
  • navigator.mediaDevices.requestUnmute(track) // all tracks of the related device
  • inputDeviceInfo.requestUnmute() // does not work for screen capture so probably not ok.
  • navigator.mediaDevices.requestUnmute(captureKind) // all devices of the given kind.

I would tend to unmute at the device level.

@jan-ivar
Member Author

The OP assumes users can always unmute. If there's a second actor controlling mute that the user agent cannot affect, then double-mute likely remains the best way to handle that piece. Otherwise we get:

[Image: three mute-button states]

A. Unmuted B. Muted (actionable) C. Muted (unactionable)

Today's apps show C as A.¹ To maintain this, they'd need to distinguish "actionable" from "unactionable" mute.

I'd support those two MuteReasons, but would avoid exposing more cross-origin correlatable information than that. I don't mind re-firing mute when reason changes.

Regarding method shape, I think track.unmute() is all it takes, because

  1. every method that can fail with NotAllowedError is a request, and
  2. muted is already a non-locally-configurable property of the track, so unmute() would be a non-locally-contained action, a signal to the UA, which in turn ultimately controls the scope of that action (though it might be useful to define a minimally affected scope). How many documents a UA ultimately enforces mute or unmute upon seems implementation-defined.

¹ A case could be made for showing C as B, but then nothing happens when the user clicks on it, which seems undesirable. This is an app decision of course.

@eladalon1983
Member

eladalon1983 commented Nov 15, 2023

@guidou and I seem to agree on focusing on the following items (sorted by priority):

We would have to ask Guido to see if you two agree, but I personally disagree with your prioritization, @youennf.

It's necessary for Web applications like Meet to know when the input device is muted upstream (by the browser or OS), or else the app can't update its UX to show the input device is muted, which means the user won't understand what's wrong and won't press any app-based unmute button, which then means the app won't even call requestUnmute() - whatever its shape.

The very first step is for the application to know the upstream is muted. That's the top priority.

@eladalon1983
Member

eladalon1983 commented Nov 15, 2023

[Image: illustration of a video-call app's mic button]

Above is an illustration. Without a MuteCause or a similar API, how should the Web app even know that the mic button should be changed to the muted state, and that the onclick handler should request-unmute?

Top priority, imho.

@youennf
Contributor

youennf commented Nov 23, 2023

I agree that the media session action handler should not need to go to MediaStreamTrack to do its processing.
It seems it is missing something that would make it functional, something like:

partial dictionary MediaSessionActionDetails {
  boolean muting;
};

In that case, it seems better to actually design the action handler to execute first, and the mute events to fire second.
This seems consistent with how the spec is designed in general.

Also, maybe we should deprecate setMicrophoneState and setCameraState.

How would this be used in systems with multiple microphones or cameras?

I am not sure this is needed, but the toggle action scope could be placed in MediaSessionActionDetails should the need arise. The scope would be the device, which is the lowest level we should probably go.
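
A sketch of how a page might consume this (togglemicrophone is an existing Media Session action; the muting member is only the proposal above):

navigator.mediaSession.setActionHandler("togglemicrophone", (details) => {
  // Mirror the UA/OS toggle in the app's own button.
  micButton.classList.toggle("muted", details.muting); // proposed member
});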

@eladalon1983
Member

eladalon1983 commented Nov 23, 2023

I agree that the media session action handler should not need to go to MediaStreamTrack to do its processing. It seems it is missing something that would make it functional, something like:

partial dictionary MediaSessionActionDetails {
  boolean muting;
};

In that case, it seems better to actually design the action handler to execute first, and the mute events to fire second. This seems consistent with how the spec is designed in general.

  • The interaction between the callback and the mute handler is complex and error-prone when multiple mute/unmute actions happen in short succession. A reasonable mute event listener should expect to just be able to read the most recent state without much worry.
  • It is unclear how to handle multiple peripherals.
  • What about non-mic, non-camera tracks like screen-sharing? Note that CrOS already shows UX that allows stopping those; in the future, we might allow pausing (muting).

I think the following proposal is better:

interface MuteReason {
  readonly attribute boolean upstream;
};

partial interface MediaStreamTrack {
  sequence<MuteReason> getMuteReasons();
};

This is simple, it solves the problem, it is immediately available when the mute event fires, and it's fully extensible in the future. For example, we could in the future extend MuteReason as:

enum MuteSource {"unspecified", "user-agent", "operating-system", "hardware"};

interface MuteReason {
  readonly attribute boolean upstream;
  readonly attribute MuteSource source;
};
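
A usage sketch of this shape (getMuteReasons() and MuteSource are the proposal above, not shipping APIs):

track.addEventListener("mute", () => {
  const reasons = track.getMuteReasons();
  const upstream = reasons.some((r) => r.upstream);
  micButton.classList.add("muted");
  // Only offer a click-to-unmute affordance for upstream (UA/OS) mutes.
  micButton.disabled = !upstream;
});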

@guidou

guidou commented Nov 23, 2023

I agree that the media session action handler should not need to go to MediaStreamTrack to do its processing. It seems it is missing something that would make it functional, something like:

partial dictionary MediaSessionActionDetails {
  boolean muting;
};

How would this be used in systems with multiple microphones or cameras?

I am not sure this is needed, but the toggle action scope could be placed in MediaSessionActionDetails should the need arise. The scope would be the device, which is the lowest level we should probably go.

It is essential to know which devices are muted and which ones aren't. Having multiple cameras and/or microphones is a very common case. The user-choice/unspecified values (or whatever name/form we choose) exposed as mute reasons on MediaStreamTrack look a lot simpler to me and are a straightforward complement to the MST muted attribute.

Media session currently looks like a poor fit, one that needs a lot of changes to a different spec to support our use case.
The only argument for it is that it has events called togglemicrophone and togglecamera, which do not add any significant value over what we have in MST, since MST already has mute and unmute events we can use. The real issue is the new state.

With media session:

  • Is there a way to know the initial state if there aren't any toggle events? This is essential to set the correct state in the app UI.
  • Is there a way to know the state per device? It would be useless to know that some device is muted if the app cannot tell it is the device currently used by the user.

@eladalon1983
Member

eladalon1983 commented Nov 24, 2023

During the editors' meeting, Youenn suggested extending togglemicrophone to receive the mute state, and possibly making other extensions to address other issues. In that case, the answers to Guido's questions would be "yes":

Is there a way to know the initial state if there aren't any toggle events?
Is there a way to know the state per device?

I think it would still be unhelpful to go down that rabbit hole. The show-stoppers are:

  1. Microphone and camera are not the only things that can be muted; screen-share is also a concern. Adding screensharetoggle and other xtoggle/ytoggle/ztoggle would not scale. We don't need a separate API surface for each thing that can be muted.
  2. Reasonable Web applications should be able to listen to the mute event, read the new state and take action then. This requires an API surface that's updated in conjunction with that event - the Media Session handlers don't fulfil that requirement.

@eladalon1983
Member

I've published a skeleton of a PR in w3c/mediacapture-main#979 - PTAL. If you think togglemicrophone is a preferable approach, it would be helpful to see a PR to that effect so we could contrast the two.

@alvestrand
Contributor

When discussing muting, we should also reflect on the (long) discussion on VideoTrackGenerator.mute - w3c/mediacapture-transform#81

@youennf
Contributor

youennf commented Nov 27, 2023

I see benefits in the MediaSession approach. It is an API used for updating UI/visible application states, which is exactly what we are talking about here. It also seems easier to do these updates in a single callback, compared to requiring the web app to potentially coalesce multiple mute events itself.

There are things that should be improved in MediaSession, independently of whether we expose a mute reason or not.
I filed some MediaSession issues for that purpose. It makes sense to coordinate with the Media WG on this particular issue.

With regards to the definition of mute reasons, it makes sense to me to piggy-back on MediaSession.
In that sense, it seems ok to me to expose to JS that MediaStreamTrack mute/unmute events are triggered by a MediaSession toggle action.

@eladalon1983
Member

To help the discussion culminate in a decision, comparing PRs would be helpful. I have produced a PR for the approach I suggested. @youennf, could you produce a PR for yours?

@youennf
Contributor

youennf commented Dec 6, 2023

Here is a media session based proposal:

  • For the simple use case (one camera, one microphone), nothing is needed; just use the existing mediaSession API.
  • To ease developers' lives, introduce:

partial dictionary MediaSessionActionDetails {
    boolean isMuting;
};

  • To support multiple captures, introduce:

partial dictionary MediaSessionActionDetails {
    sequence<DOMString> deviceIds;
};

  • To support screen capture, introduce a togglescreenshare media session action.

These seem like valid improvements to the existing MediaSession API, independently of whether we expose a boolean on MediaStreamTrack to help disambiguate muted. Or maybe we should think of removing togglemicrophone/togglecamera, if we think onmute/onunmute is superior.

It would help to get the sense of MediaSession people, @steimelchrome, @jan-ivar, thoughts?

I think it is worth preparing slides for both versions; doing PRs now seems premature.
The main thing is to look at it from a web developer's convenience point of view.
In particular, since the point is to update UI, is it more convenient to use tracks or the media session API:

  • single callback vs. multiple events.
  • Track can be transferred, media session cannot.
  • Where to unmute, track or media session?

@jan-ivar
Member Author

jan-ivar commented Dec 6, 2023

Here is a media session based proposal:

For the simple use case (one camera, one microphone), nothing is needed, just use the existing mediaSession API

I like this proposal. I don't see a need to add more information since this seems to be exactly what the mediaSession API was built for (whether the toggles are in a desktop browser UX or on a phone lock screen seems irrelevant).

Initial state seems solved by firing the mediaSession events early, e.g. on pageload.

This issue is "Solve user agent camera/microphone double-mute", putting other sources out of scope.

Multiple devices also seems out of scope since none of the global UA toggles so far (Safari or Firefox) work per-device AFAIK. They're page or browser global toggles, extending controls present in video conference pages today into the browser, imbuing them with ease of access and some privacy assurance that the webpage cannot hear them, solving the simple use cases of users not being heard, or worrying they can be heard (by participants or webpage). I.e. they affect all devices that page has.

I think Chrome's mute behavior is a bug. I've filed w3c/mediacapture-main#982 to clarify the spec, so let's discuss that there.

I think we should standardize requesting unmute.
I don't think we should standardize requesting mute.
PRs ahead of decisions should not be required.

Too much in this thread.

@guidou

guidou commented Dec 6, 2023

Here is a media session based proposal:
For the simple use case (one camera, one microphone), nothing is needed, just use the existing mediaSession API

We need to solve all use cases that arise in practice, not just the simplest one.

I like this proposal. I don't see a need to add more information since this seems to be exactly what the mediaSession API was built for (whether the toggles are in a desktop browser UX or on a phone lock screen seems irrelevant).

Initial state seems solved by firing the mediaSession events early, e.g. on pageload.

This issue is "Solve user agent camera/microphone double-mute", putting other sources out of scope.

We need to solve all use cases that arise in practice, not just the ones indicated in the first message of this thread.

Multiple devices also seems out of scope since none of the global UA toggles so far (Safari or Firefox) work per-device AFAIK. They're page or browser global toggles, extending controls present in video conference pages today into the browser, imbuing them with ease of access and some privacy assurance that the webpage cannot hear them, solving the simple use cases of users not being heard, or worrying they can be heard (by participants or webpage). I.e. they affect all devices that page has.

Browser toggles are just one use case that needs to be handled. OS toggles (which can be per device, as in ChromeOS and maybe other OSes) need to be handled too. Hardware toggles need to be considered as well. Just because these were not mentioned in the original message doesn't really mean they're out of scope.

I think Chrome's mute behavior is a bug. I've filed w3c/mediacapture-main#982 to clarify the spec, so let's discuss that there.

It's not a bug, based on the current language of the spec. If the problem is that the mute attribute was defined wrongly, a better way to proceed would be to eliminate mute and its associated events from the spec and replace them with new ones with a new definition that matches the behavior we want today. This would allow us to introduce the new behavior without breaking existing applications and, once applications migrate, we can deprecate and remove the old attribute from implementations. We have done this successfully several times. The experience in Chromium with changing behavior to match spec redefinitions is much worse.

I think we should standardize requesting unmute. I don't think we should standardize requesting mute.

I agree. Apps already implement a way to mute at the app level.

PRs ahead of decisions should not be required.

Slides that show how the proposal solves the problems should be enough. We have a slot in the December 12 meeting to continue discussing this. If you have some slides available, maybe we can look at them then.

@eladalon1983
Member

I don't think we should standardize requesting mute.

Was this suggested at some point?

PRs ahead of decisions should not be required.

PRs reveal the complexity that otherwise hides behind such phrases as "we could just..."

@jan-ivar
Member Author

I don't think we should standardize requesting mute.

Was this suggested at some point?

Yes in #39 (comment).

We need to solve all use cases that arise in practice, not just the ones indicated in the first message of this thread.

This issue has 70 comments. Triaging discussion out to other (new or existing) issues such as w3c/mediacapture-main#982 or w3c/mediasession#279 seems worthwhile to me, or I don't see how we're going to reach any kind of consensus on all these feature requests. "Mute reason" probably deserves its own issue as well (there were 14 comments here when it was introduced to this conversation in #39 (comment)). It seems largely orthogonal to the OP proposal of letting apps unmute.

Browser toggles are just one use case that needs to be handled. OS toggles (which can be per device, as in ChromeOS and maybe other OSes) need to be handled too. Hardware toggles need to be considered as well.

These are all User Agent toggles IMHO, the details of which W3C specs tend to leave to the User Agent, focusing instead on the surface between web app and UA. I think that's the level of abstraction we need to be at.

@eladalon1983
Member

I don't think we should standardize requesting mute.

Was this suggested at some point?

Yes in #39 (comment).

Thanks for clarifying.
I share your opinion (@jan-ivar) about this proposal.

Mute reason" probably deserves its own issue [...] It seems largely orthogonal to the OP proposal of letting apps unmute.

Not completely orthogonal, because requestUnmute() requires some knowledge of the mute-reason, or else an app would be soliciting a useless user gesture from the user, to their disappointment and frustration.

These are all User Agent toggles IMHO, the details of which W3C specs tend to leave to the User Agent, focusing instead on the surface between web app and UA. I think that's the level of abstraction we need to be at.

As a representative of one open source browser who has filed bugs and looked into the code of another open source browser, I hope you'll find this comment compelling. It discusses the value transparency brings to the entire ecosystem.

@jan-ivar
Member Author

Instead of the OP proposal of await track.unmute(), we might already have an API in w3c/mediasession#279 (comment):

navigator.mediaSession.setMicrophoneActive(true);

E.g. an app calling this with user attention and transient activation may be enough of a signal to the UA to unmute tracks it has muted in this document, either raising a toast message about it after the fact, or showing a prompt ahead of it.

The remaining problem is how the app would learn whether unmuting was successful or not. E.g. might this suffice?

navigator.mediaSession.setMicrophoneActive(true);
// Race the unmute event against a zero-delay timeout to learn the outcome:
const unmuted = await Promise.race([
  new Promise(r => track.onunmute = () => r(true)),
  new Promise(r => setTimeout(() => r(false)))
]);

@youennf
Contributor

youennf commented Dec 11, 2023

setMicrophoneActive looks good to me if we can validate its actual meaning with the Media WG.
This API can be extended (to return a promise, take additional parameters) to progressively cover more of what has been discussed in this thread.

@jan-ivar
Member Author

Not completely orthogonal, because requestUnmute() requires some knowledge of the mute-reason, or else an app would be soliciting a useless user gesture from the user, to their disappointment and frustration.

Hiding an unmute control seems a small dent in the disappointment and frustration of being unable to unmute. IOW a secondary problem to the first.

@eladalon1983
Member

eladalon1983 commented Dec 11, 2023

either raising a toast message or a prompt ahead of it.

As I have mentioned multiple times before - the user agent has no idea what "shiny button text" means to the user, or what the user believed they were approving when they conferred transient activation on the page. Only the prompt-based approach is viable.

Hiding an unmute control seems a small dent in the disappointment and frustration of being unable to unmute.

It does not look at all "small" to me. In fact, I am shocked that after months of debating whether an API should be sync or async, which would have no user-visible effect, you label this major user-visible issue as "small." What is the methodology you employ to classify the gravity of issues?

@eladalon1983
Member

eladalon1983 commented Dec 11, 2023

Hiding an unmute control seems a small dent in the disappointment and frustration of being unable to unmute.

I repeat - there is nothing "small" about a user clicking a button and it disappearing without having an effect. It looks like a bug and it would nudge users towards abandoning the Web app in favor of a native-app competitor. Web developers care much more about their users' perception of the app's reliability, than they do about the inconvenience of adding "await" to a method invocation. Let's focus our attention where it matters!

@eladalon1983
Member

[Image: a thumbs-down reaction left on the comment above]

Thank you for this engagement, Jan-Ivar. I am looking forward to hearing why you disagree.

Orthogonally, I'll be proposing that the rules of conduct in the WG be amended to discourage the use of the thumbs-down emoji without elaboration. Noting disagreement without elaborating on the reasons serves no productive purpose.

@jan-ivar
Member Author

Closing this as the double-mute problem was instead solved in w3c/mediasession#312.

Here's an example of how a website can synchronize application mute state with that of the browser.
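
In outline, that pattern looks roughly like this (a condensed sketch; togglemicrophone and setMicrophoneActive are Media Session APIs, while micButton, track, and appMuted are hypothetical app state):

let appMuted = false;

function setMuted(muted) {
  appMuted = muted;
  track.enabled = !muted;                              // app-level mute
  micButton.classList.toggle("muted", muted);          // app UI
  navigator.mediaSession.setMicrophoneActive(!muted);  // tell the UA
}

// A UA toggle (e.g. URL bar button) arrives as a Media Session action:
navigator.mediaSession.setActionHandler("togglemicrophone", () => setMuted(!appMuted));

// The site's own button goes through the same path, so both stay in sync:
micButton.onclick = () => setMuted(!appMuted);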
