
Solve user agent camera/microphone double-mute #39

Closed · jan-ivar opened this issue Oct 27, 2021 · 79 comments

@jan-ivar
Member

jan-ivar commented Oct 27, 2021

User agent mute-toggles for camera & mic can be useful, yielding enhanced privacy (no need to trust the site) and quick access (a sneeze coming on, or a family member walking into frame?):

  • Safari has pause/resume in its URL bar
  • Firefox has global cam/mic mute toggles behind a pref (set privacy.webrtc.globalMuteToggles in about:config)
  • Chrome has recently opened an issue discussing it.

It's behind a pref in Firefox because:

  1. The double-mute problem: site-mute + ua-mute = 4 states, where 3 produce no sound ("Can you hear me now?")
  2. UA-mute of microphone interferes with "Are you talking?" features
  3. Some sites (Meet) stop camera to work around crbug 642785 in Chrome, so there's no video track to UA-mute

[Image: "Am I muted?"]

This issue is only about (1) the double-mute problem.

We determined we can only solve the double-mute problem by involving the site, which requires standardization.

The idea is:

  1. If the UA mutes or unmutes, the site should update its button to match.
  2. If the user unmutes using the site's button, the UA should unmute(!)

The first point requires no spec change: sites can listen to the mute and unmute events on the track (but they don't).
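
For illustration, keeping the site's button in sync with these events might look like this (a minimal sketch; micButton is a hypothetical site element):

const stream = await navigator.mediaDevices.getUserMedia({audio: true});
const [track] = stream.getAudioTracks();

track.onmute = () => micButton.classList.add("muted");      // UA muted us
track.onunmute = () => micButton.classList.remove("muted"); // UA unmuted us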

The second point is key: if the user sees the site's button turn to "muted", they'll expect to be able to click it to unmute.

This is where it gets tricky, because we don't want to allow sites to unmute themselves at will, as this defeats any privacy benefits.

The proposal here is:

partial interface MediaStreamTrack {
  undefined unmute();
};

It would throw InvalidStateError unless the document has transient activation, is fully active, and has focus. User agents may also throw NotAllowedError for any reason, but if they don't, then they must unmute the track (which will fire the unmute event).

This should let user agents that wish to do so develop UX without the double-mute problem.
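
For illustration, a click handler built on the proposed method might look like this (a sketch only; unmute() is the proposal above, not a shipping API):

micButton.onclick = () => {
  try {
    track.unmute(); // proposed: requires transient activation and focus
  } catch (e) {
    // InvalidStateError or NotAllowedError: the UA declined; stay muted.
    return;
  }
  // On success the track fires its unmute event; update the UI there.
};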

@eladalon1983
Member

Repeating (and rephrasing) myself from the Chromium bug, since it'd be unreasonable to expect the audiences to be identical:

Media controls exposed by the browser, which allow an ongoing mic/camera/screen-capture to be muted by the user, communicate an implicit promise from the browser to the user. If the application is allowed to override that promise, it's allowed to break that promise.

I understand the double-mute problem and hope that there could be other ways to resolve it. For example, maybe if the application tries to unmute the track, the browser could show the user a prompt to approve that. This would mean that a user can still click the red mic button to unmute, but because additional approval is required, the application cannot unmute unilaterally in response to an arbitrary user gesture.

@jan-ivar
Member Author

jan-ivar commented Oct 30, 2021

That's a good idea. Having the method be asynchronous would allow this.

partial interface MediaStreamTrack {
  Promise<undefined> unmute();
};

The goal of a spec here is to allow innovation in this space, without arriving at specific UX. It could be a prompt, or maybe a toast message is enough.

I think a lot of users would be surprised to learn that when they mute microphone or camera on a web site today, they have zero assurances that it actually happens. Well-behaved websites have lulled most users into feeling in control when they're not. Most probably don't consider that the website may turn camera or microphone back on at any time as long as the page is open.

The page doesn't even need to be open anymore: a different-origin accomplice page can reopen/navigate to it without user interaction at a later point, since gUM doesn't require transient activation.

We dropped the ball on transient activation in gUM. Having better UA-integrated muting with transient activation might give us a second chance to rectify some privacy concerns.

For instance, a UA could choose to mute for the user if it detects a page accessing camera or microphone on pageload or without interaction.

@youennf
Contributor

youennf commented Sep 12, 2022

Promise<undefined> unmute();

unmute() method seems fine.
I wonder whether we should not try to introduce a mute() method as well.
This would allow the website to mute itself and have its UI synchronised with OS/UA UI.

Also, in Safari, the mute/unmute is page wide, which means that all microphone tracks are either muted or unmuted.
This does not align particularly well with unmute being at the track level.
Maybe introducing unmute/mute at navigator.mediaDevices level would be good enough?

@jan-ivar
Member Author

jan-ivar commented Apr 18, 2023

I wonder whether we should not try to introduce a mute() method as well.

I fear this would create confusion and a false symmetry suggesting muting is under application control when it is not.

This would allow the website to mute itself and have its UI synchronised with OS/UA UI.

Applications have track.enabled for this. UAs "MAY" turn off indicators when this "brings all tracks connected to the device to be ... disabled," which Firefox has done since 2018. This is crbug 642785; I dunno if WebKit has one.

Here's a fiddle for toggling multiple tracks demonstrating Firefox updating its camera URL bar indicator and OS/hardware light (modulo bug 1694304 on Windows for the hardware light), whenever the number of enabled tracks goes to zero.
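
The pattern in that fiddle is roughly the following (a sketch, assuming tracks holds every track from the same camera; not the fiddle's exact code):

function setCameraOn(tracks, on) {
  // When every track from the device becomes disabled, Firefox turns off
  // its URL bar indicator and the OS/hardware light; re-enabling restores them.
  for (const track of tracks) {
    track.enabled = on;
  }
}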

I suggest leaving mute to UAs and concentrating on how apps can signal interest to unmute, to solve the issue at hand.

Also, in Safari, the mute/unmute is page wide,

Do you mean page wide or document wide? What about iframes?

... which means that all microphone tracks are either muted or unmuted. This does not align particularly well with unmute being at the track level.

Maybe we could use constraints? 🙂

Maybe introducing unmute/mute at navigator.mediaDevices level would be good enough?

Maybe, except muted is a property of the track, not mediaDevices.

@jan-ivar
Member Author

There's also the issue of multiple cameras. If we end up with navigator.mediaDevices.unmute(track) that would really suck.

@jan-ivar
Member Author

A simplifying factor is that UA muting is 100% about privacy, and as soon as one track is unmuted on a page, then there's no more privacy. So per-track mute would serve no purpose, and make for a terrible API:

await Promise.all(applicationKeepsTrackOfAllTracksItIsUsing.map(track => track.unmute())); // ugh

But with that understanding (of UA mute as a privacy feature), it seems POLA for track.unmute() to unmute all tracks of the same source per document or per top-level document.

So I think I agree mute is a property of the source by that definition.

But there can be multiple sources in navigator.mediaDevices.[[mediaStreamTrackSources]], and cameras can be pointed different ways, so it's not inconceivable that a UA may wish to control privacy per camera.

Even if we don't care about that, we'd need navigator.mediaDevices.unmute(kind) which seems unappealing. I'd rather go with track.unmute().

@dontcallmedom-bot

This issue had an associated resolution in WebRTC WG 2023-04-18 – (Issue 39: Solve user agent camera/microphone double-mute):

RESOLUTION: No objections.

@youennf
Contributor

youennf commented Apr 20, 2023

track.unmute() makes sense if we think this is useful for all source types.
For WebRTC or canvas tracks, this does not make sense.
For screen sources, it is not yet clear whether we will want to mute all tracks (video and audio) together or independently. CaptureController is the object representing the source, so the mute/unmute functionality could be placed there.

For capture tracks, InputDeviceInfo is what is closest to the source, which is why I mentioned InputDeviceInfo.unmute as a possibility.
Another difference to look at is that MediaStreamTrack is transferable and InputDeviceInfo is not; we should consider this (though InputDeviceInfo can of course be made transferable in the future).

Page/document mute scope probably covers at least 90% of the cases and is simpler to implement.
But I feel muting at the source level is better in general, and the UA can always mute all sources if one gets muted.

I fear this would create confusion and a false symmetry suggesting muting is under application control when it is not.

unmute already introduces this potential confusion about who has control, hence the requestMute/requestUnmute name.

Applications have track.enabled for this. UAs "MAY" turn off indicators when this "brings all tracks connected to the device to be ... disabled," which Firefox has done since 2018. This is crbug 642785, I dunno if webkit has one.

This is a MAY though. Setting all tracks of the same source to enabled = false does not mean track.muted will switch to true; this is left to the UA, which does not seem great for interop. In Firefox's model, I would guess that muted would be set to true later on, when the application sets enabled = true on one of the tracks, in which case the application will then have to call unmute. This is not simple.

Looking at Safari, let's say that Safari would update its muted icon when all tracks are enabled = false.
It would then need to immediately set muted = true on these tracks.
Let's say the user then clicks on Safari's UI; all tracks will have muted = false, but it is then up to the application to listen for the unmute event and act as if the user had clicked one of its own icons. Not simple again.

Looking at web applications, they tend to clone tracks in the window environment (local rendering and a PeerConnection, potentially at different sizes as well), and in the future in workers as well (for encoding/networking) or other windows (via transfer). Having to set enabled to false on each one of these objects, including transferred tracks, is cumbersome and potentially error-prone (see the sketch below).
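
A sketch of the bookkeeping this implies (previewTrack, senderTrack, and encodingWorker are hypothetical):

const localClones = [previewTrack, senderTrack]; // same microphone source
function disableEverywhere() {
  for (const t of localClones) t.enabled = false;
  // Transferred tracks live in other agents and must be disabled there
  // too, e.g. by messaging the worker that received them:
  encodingWorker.postMessage({cmd: "disable-mic"});
}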

Looking at OS support, kAUVoiceIOProperty_MutedSpeechActivityEventListener and kAUVoiceIOProperty_MuteOutput are potentially useful to implement the "Are you talking?" UI in the screenshot you added above.
It seems a worthwhile API addition we could consider in the future: if the mic is muted, we could allow the application to be notified that the user might be speaking.

Overall, it seems cleaner to me to have two separate APIs:

  • enabled, which is about the application ceasing to use the information provided by the source, and is internal to the web application
  • muted, which is about ceasing to receive the information, and is tied to UA UI.

@youennf
Contributor

youennf commented Apr 20, 2023

An alternative to InputDeviceInfo is navigator.mediaDevices.requestUnmute(deviceId).

@jan-ivar
Member Author

track.unmute() makes sense if we think this is useful for all source types.

It was a WG design choice to reuse MST for other sources to avoid inheritance. The cost of that is living with the fact that not all sources have all track abilities, NOT that tracks only have abilities shared by all sources.

getUserMedia returns camera and microphone tracks, so adding attributes, methods and constraints specific to camera and microphone should be fine. If it's not, then time to split inheritance.

E.g. track.muted only makes sense for camera and microphone, and track.unmute() fits with that.

Other sources do not have the double-mute problem, so to not complicate discussion, let's not discuss them here.

This is a MAY though.

Here's a SHOULD: "When a "live", unmuted, and enabled track sourced by a device exposed by getUserMedia() becomes either muted or disabled, and this brings all tracks connected to the device (across all navigables the user agent operates) to be either muted, disabled, or stopped, then the UA SHOULD relinquish the device within 3 seconds..."

Setting all tracks of the same source with enabled = false does not mean track.muted will switch to true, this is left to UA which does not seem great for interop.

Why would a UA mute an in-view application just because it disabled its tracks? That would be terrible for web compat.

To clarify, Mozilla needs no spec changes to solve turning off privacy indicators or camera light. Our view is the path to interop there is changing the MAY and SHOULD to MUST. But please let's discuss that in a separate issue.

In Firefox mode, I would guess that muted would be set to true later on, when the application sets enabled = true on one of the track, in which case the application will then have to call unmute. This is not simple.

No, that is not our plan. As explained in the OP, we have a privacy.webrtc.globalMuteToggles pref in about:config which turns on global user-facing mute controls, and we want to arm sites with tools to unmute themselves better, to prepare for UA features like this.

Sorry for any misunderstanding, but it's not my intent to standardize UA muting here, only application-induced unmuting. Muting remains up to user agents, and I think it is important for privacy that they be allowed to continue to own that problem.

The scope of the proposal in the OP (and this issue) was to arm applications with tools to unmute themselves IF the user agent mutes them, not define when user agents mute them.

@jan-ivar
Member Author

unmute already introduces this potential confusion about who has control, hence the requestMute/requestUnmute name.

We have getUserMedia, not requestUserMedia, and that doesn't seem to confuse anybody.

NotAllowedError seems clear about who has control.

@jan-ivar
Member Author

jan-ivar commented May 3, 2023

Here's a SHOULD: "When a "live", unmuted, and enabled track sourced by a device exposed by getUserMedia() becomes either muted or disabled, and this brings all tracks connected to the device (across all navigables the user agent operates) to be either muted, disabled, or stopped, then the UA SHOULD relinquish the device within 3 seconds..."

FYI this was recently fixed upstream in https://webrtc-review.googlesource.com/c/src/+/302200

@guidou

guidou commented Nov 10, 2023

I'd like to revive this discussion, since these types of system-level controls (either at the UA or the OS) are becoming more common and we have observed that they create a lot of confusion for users.
Like @jan-ivar says, there are two issues here:

If the UA mutes or unmutes, the site should update its button to match.
If the user unmutes using the site's button, the UA should unmute(!)

However, I disagree with this statement:

The first point requires no spec change: sites can listen to the mute and unmute events on the track (but they don't)

The reason sites don't listen to the mute and unmute events is that mute and unmute can be triggered by other causes, and if the application cannot know whether those events (and the muted attribute) stem from a UA/OS-level mute, it cannot react appropriately. The spec says muted means live samples are not made available to the MediaStreamTrack, which is not specific to UA/OS-level mute controls. In Chrome specifically, muted means a track is not getting frames for any reason (and system-level mute has never been one of those reasons in practice). IIRC, Safari has similarities with Chrome in this regard.
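
To illustrate the ambiguity, this is all an app can do today (a sketch):

track.onmute = () => {
  // Muted... but by what? A UA/OS toggle the user can act on, or just a
  // source that momentarily stopped delivering frames? The app cannot
  // tell, so it cannot safely flip its own button to "muted".
};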

This has become a major problem for VC applications and I think we need to solve it properly.
I think we can iterate on several of the proposals made in this thread, which look very promising IMO.

cc @eladalon1983

@eladalon1983
Member

The problem as I see it is that users can mute through multiple sources - app, UA, OS and hardware. The propagation of state through these layers is presently incomplete - an opportunity for us to earn our keep.

In the high-level, I think we have to provide two mechanisms:

  1. Sites listen for mute-status changes from upstream sources (UA, OS and hardware, in that order).
  2. Sites control mute-status in upstream sources.

1. Listen

The principle here should be fairly uncontroversial.
For the concrete mechanism, I agree with Guido that mute events are not currently well-suited. Either of the following would work for me:

  1. Bespoke events.
  2. Revive the idea of a MuteCause/MuteReason, so that the same mechanism would serve both this issue and similar ones (see link).

I prefer option 2. To start the ball rolling on a concrete proposal:

enum MuteCause {
  "unspecified",  // Catch-all default.
  "operating-system-choice",
  "user-agent-choice",
  // Extensible to hardware, and to -issue if it's not a -choice.
};

interface MuteEvent : Event {
  /* Exercise for the reader */
};

partial interface MediaStreamTrack {
  // Note that multiple causes might apply concurrently.
  // (FrozenArray rather than sequence, since WebIDL disallows
  // sequence<> as an attribute type.)
  readonly attribute FrozenArray<MuteCause> causes;
};

2. Control

There's some potential for controversy here, but I think we can resolve it.

Jan-Ivar proposed:

If the user unmutes using the site's button, the UA should unmute(!)

While I'm sure VC apps would be delighted to have such control, I am afraid that no security/privacy department in any user agent would ever approve it (unless we add some UA prompt; foreshadowing). Jan-Ivar suggested transient activation and focus as necessary gating mechanisms. These are fine requirements, but they are not sufficient, as any transient activation would look identical to the user agent here, possibly subverting the user's actual intentions if they clicked on a mislabelled button. I'd also suggest requiring a PEPC-like prompt. Reasonable app code would then look something like this:

unmuteButton.addEventListener('click', unmuteClicked);

async function unmuteClicked() {
  // If necessary, prompt the user to unmute at UA-level etc.
  if (upstreamUnmute) {  // app-tracked flag: a UA/OS-level mute is in effect
    try {
      await track.unmute();
    } catch (error) {
      return;
    }
  }

  // Proceed with the "normal" unmuting in the app.
  // * Resume remote transmission of media.
  // * Change UX to reflect that clicking the button now means "mute".
  // * Update internal state.
}

@youennf
Contributor

youennf commented Nov 10, 2023

I am not sure how much we need a mute reason. Distinct requestUnmute failures might be sufficient.

@eladalon1983
Member

  1. You need a MuteReason because mute can happen for unactionable reasons too, like the source not having any new frames to deliver.
  2. Applications might not wish to call requestUnmute() themselves, but rather provide some reminder/hint to the user about where they muted from (either the browser or the operating system), and leave it to the user to unmute if they wish.

@youennf
Contributor

youennf commented Nov 10, 2023

The mute reasons may vary within the muted period, and firing mute events when only the reason changes is not really appealing.
The flow of the user trying to unmute and the app providing a hint based on the failure seems to cover most cases (and it does not preclude adding reasons in the future if we discover this is useful).

@guidou

guidou commented Nov 10, 2023

The mute reasons may vary within the muted period and firing mute events when only the reason is changing is not really appealing.

I'm OK with the mute event only firing when the muted attribute changes (not the mute reason attribute). WDYT?
The main point is that the muted attribute in its current form is not enough to solve this problem.
Having a reason looks like a good way to make the muted attribute useful for solving this problem. Otherwise, we need a new attribute or a different API surface.

The flow of user trying to unmute and app providing the hint based on the failure seems to cover most cases (and it does not preclude adding reasons in the future if we discover this is useful).

Does this mean you support the approach of having an attribute for the mute cause?

@eladalon1983
Member

The mute reasons may vary within the muted period and firing mute events when only the reason is changing is not really appealing.

Why is it not appealing to fire a mute event when the set of reasons changes? (Note that we have a separate unmute event already, btw.)

@youennf
Contributor

youennf commented Nov 13, 2023

Does this mean you support the approach of having an attribute for the mute cause?

I see this as a potential improvement while I see an API to request unmute as a blocker.
I would focus on the unblocking API.

I also think that an API to request capture to be muted would be useful. The current approach (use enabled=false on all tracks of a device) is a bit tedious and might face backward-compatibility issues.

@guidou

guidou commented Nov 13, 2023

I see this as a potential improvement while I see an API to request unmute as a blocker.
I would focus on the unblocking API.

I agree. Let's focus on that first.

I also think that an API to request capture to be muted would be useful. The current approach (use enabled=false on all tracks of a device), is a bit tedious and might face backward compatibility issues.

Also agree. This requires some more thinking because system-level muting is not necessarily equivalent to muting a source or a set of tracks.

@eladalon1983
Member

I'm having some trouble parsing the last few messages on this thread. If we're all in agreement that we want to add an API exposing the OS-mute-state, then I'll gladly present something in the next available interim opportunity. Specifically, I'd like to present my proposal here. Before I do - @youennf, you have said that firing an event whenever the reason changes is unappealing. I'd still like to understand why; understanding would help drive a better presentation in said interim.

@youennf
Contributor

youennf commented Nov 14, 2023

@guidou and I seem to agree on focusing on the following items (sorted by priority):

  • Expose something like requestUnmute as this is required to solve double-mute.
  • Evaluate the need for requestMute.
  • Evaluate the need for a mute reason.

Getting back to requestUnmute, here are some possible API shapes (all promise based):

  • track.requestUnmute() // all tracks of the related device
  • navigator.mediaDevices.requestUnmute(track) // all tracks of the related device
  • inputDeviceInfo.requestUnmute() // does not work for screen capture so probably not ok.
  • navigator.mediaDevices.requestUnmute(captureKind) // all devices of the given kind.

I would tend to unmute at the device level.

@jan-ivar
Member Author

The OP assumes users can always unmute. If there's a second actor controlling mute that the user agent cannot affect, then double-mute likely remains the best way to handle that piece. Otherwise we get:

[Image: three mute-button states]

A. Unmuted B. Muted (actionable) C. Muted (unactionable)

Today's apps show C as A.¹ To maintain this, they'd need to distinguish "actionable" from "unactionable" mute.

I'd support those two MuteReasons, but would avoid exposing more cross-origin correlatable information than that. I don't mind re-firing mute when reason changes.

Regarding method shape, I think track.unmute() is all it takes, because

  1. every method that can fail with NotAllowedError is a request, and
  2. muted is already a non-locally-configurable property of the track, so unmute() would be a non-locally-contained action, a signal to the UA, which in turn ultimately controls the scope of that action (though it might be useful to define a minimally affected scope). How many documents a UA ultimately enforces mute or unmute upon seems implementation-defined.

¹ A case could be made for showing C as B, but then nothing happens when the user clicks on it, which seems undesirable. This is an app decision of course.

@eladalon1983
Member

eladalon1983 commented Nov 15, 2023

@guidou and I seem to agree on focusing on the following items (sorted by priority):

We would have to ask Guido to see if you two agree, but I personally disagree with your prioritization, @youennf.

It's necessary for Web applications like Meet to know when the input device is muted upstream (by the browser or OS), or else the app can't update its UX to show the input device is muted, which means the user won't understand what's wrong and won't press any app-based unmute button, which then means the app won't even call requestUnmute() - whatever its shape.

The very first step is for the application to know the upstream is muted. That's the top priority.

@eladalon1983
Member

eladalon1983 commented Nov 15, 2023

[Image: illustration of a video-call app's mic button]

Above is an illustration. Without a MuteCause or a similar API, how should the Web app even know that the mic button should be changed to the muted state, and that the onclick handler should request-unmute?

Top priority, imho.

@youennf
Contributor

youennf commented Nov 23, 2023

I agree that the media session action handler should not need to go to MediaStreamTrack to do its processing.
It seems it is missing something that would make it functional, something like:

partial dictionary MediaSessionActionDetails {
  boolean muting;
};

In that case, it seems better to actually design the action handler to execute first, and the mute events to fire second.
This seems consistent with how the spec is designed in general.

Also, maybe we should deprecate setMicrophoneState and setCameraState.

How would this be used in systems with multiple microphones or cameras?

I am not sure this is needed, but the toggle action scope could be placed in MediaSessionActionDetails should the need arise. The scope would be the device, which is the lowest level we should probably go.
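
A sketch of how a page might consume this (togglemicrophone is an existing Media Session action; the muting member is only the proposal above):

navigator.mediaSession.setActionHandler("togglemicrophone", (details) => {
  // Mirror the UA/OS toggle in the app's own button.
  micButton.classList.toggle("muted", details.muting); // proposed member
});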

@eladalon1983
Member

eladalon1983 commented Nov 23, 2023

I agree that the media session action handler should not need to go to MediaStreamTrack to do its processing. It seems it is missing something that would make it functional, something like:

partial dictionary MediaSessionActionDetails {
  boolean muting;
};

In that case, it seems better to actually design the action handler to execute first, and the mute events to fire second. This seems consistent with how the spec is designed in general.

  • The interaction between the callback and the mute handler is complex and error-prone when multiple mute/unmute actions happen in short succession. A reasonable mute event listener should expect to just be able to read the most recent state without much worry.
  • It is unclear how to handle multiple peripherals.
  • What about non-mic, non-camera tracks like screen-sharing? Note that CrOS already shows UX that allows stopping those; in the future, we might allow pausing (muting).

I think the following proposal is better:

interface MuteReason {
  readonly attribute boolean upstream;
};

partial interface MediaStreamTrack {
  sequence<MuteReason> getMuteReasons();
};

This is simple, it solves the problem, it is immediately available when the mute event fires, and it's fully extensible in the future. For example, we could in the future extend MuteReason as:

enum MuteSource {"unspecified", "user-agent", "operating-system", "hardware"};

interface MuteReason {
  readonly attribute boolean upstream;
  readonly attribute MuteSource source;
};
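
A usage sketch of this shape (getMuteReasons() and MuteSource are the proposal above, not shipping APIs):

track.addEventListener("mute", () => {
  const reasons = track.getMuteReasons();
  const upstream = reasons.some((r) => r.upstream);
  micButton.classList.add("muted");
  // Only offer a click-to-unmute affordance for upstream (UA/OS) mutes.
  micButton.disabled = !upstream;
});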

@guidou

guidou commented Nov 23, 2023

I agree that the media session action handler should not need to go to MediaStreamTrack to do its processing. It seems it is missing something that would make it functional, something like:

partial dictionary MediaSessionActionDetails {
  boolean muting;
};

How would this be used in systems with multiple microphones or cameras?

I am not sure this is needed, but the toggle action scope could be placed in MediaSessionActionDetails should the need arise. The scope would be the device, which is the lowest level we should probably go.

It is essential to know which devices are muted and which ones aren't. Having multiple cameras and/or microphones is a very common case. The user-choice/unspecified values (or whatever name/form we choose) exposed as mute reasons on MediaStreamTrack look a lot simpler to me and are a straightforward complement to the MST muted attribute.

Media session currently looks like a poor fit, one that needs a lot of changes to a different spec to support our use case.
The only argument for it is that it has events called togglemicrophone and togglecamera, which do not add any significant value over what we have in MST, since MST already has mute and unmute events we can use. The real issue is the new state.

With media session:

  • Is there a way to know the initial state if there aren't any toggle events? This is essential to set the correct state in the app UI.
  • Is there a way to know the state per device? It would be useless to know that some device is muted if the app cannot tell it is the device currently used by the user.

@eladalon1983
Member

eladalon1983 commented Nov 24, 2023

During the editors' meeting, Youenn suggested extending togglemicrophone to receive the mute state, and possibly making other extensions to address other issues. In that case, the answers to Guido's questions would be "yes":

Is there a way to know the initial state if there aren't any toggle events?
Is there a way to know the state per device?

I think it would still be unhelpful to go down that rabbit hole. The show-stoppers are:

  1. Microphone and camera are not the only things that can be muted; screen-share is also a concern. Adding screensharetoggle and other xtoggle/ytoggle/ztoggle would not scale. We don't need a separate API surface for each thing that can be muted.
  2. Reasonable Web applications should be able to listen to the mute event, read the new state and take action then. This requires an API surface that's updated in conjunction with that event - the Media Session handlers don't fulfil that requirement.

@eladalon1983
Member

I've published a skeleton of a PR in w3c/mediacapture-main#979 - PTAL. If you think togglemicrophone is a preferable approach, it would be helpful to see a PR to that effect so we could contrast the two.

@alvestrand
Contributor

When discussing muting, we should also reflect on the (long) discussion on VideoTrackGenerator.mute - w3c/mediacapture-transform#81

@youennf
Contributor

youennf commented Nov 27, 2023

I see benefits in the MediaSession approach. It is an API used for updating UI/visible application states, which is exactly what we are talking about here. It also seems easier to do these updates in a single callback, compared to requiring the web app to potentially coalesce multiple mute events itself.

There are things that should be improved in MediaSession, independently of whether we expose a mute reason or not.
I filed some MediaSession issues for that purpose. It makes sense to coordinate with the Media WG on this particular issue.

With regards to the definition of mute reasons, it makes sense to me to piggy-back on MediaSession.
In that sense, it seems ok to me to expose to JS that MediaStreamTrack mute/unmute events are triggered by a MediaSession toggle action.

@eladalon1983
Member

To help the discussion culminate in a decision, comparing PRs would be helpful. I have produced a PR for the approach I suggested. @youennf, could you produce a PR for yours?

@youennf
Contributor

youennf commented Dec 6, 2023

Here is a media session based proposal:

  • For the simple use case (one camera, one microphone), nothing is needed; just use the existing mediaSession API.
  • To ease developers' lives, introduce:

partial dictionary MediaSessionActionDetails {
    boolean isMuting;
};

  • To support multiple captures, introduce:

partial dictionary MediaSessionActionDetails {
    sequence<DOMString> deviceIds;
};

  • To support screen capture, introduce a togglescreenshare media session action.

These seem like valid improvements to the existing MediaSession API, independently of whether we expose a boolean on MediaStreamTrack to help disambiguate muted. Or maybe we should think of removing togglemicrophone/togglecamera, if we think onmute/onunmute is superior.

It would help to get the sense of MediaSession people, @steimelchrome, @jan-ivar, thoughts?

I think it is worth preparing slides for both versions; doing PRs now seems premature.
The main thing is to look at it from a web developer's convenience point of view.
In particular, since the point is to update UI, is it more convenient to use tracks or the media session API:

  • single callback vs. multiple events.
  • Track can be transferred, media session cannot.
  • Where to unmute, track or media session?

@jan-ivar
Member Author

jan-ivar commented Dec 6, 2023

Here is a media session based proposal:

For the simple use case (one camera, one microphone), nothing is needed, just use the existing mediaSession API

I like this proposal. I don't see a need to add more information since this seems to be exactly what the mediaSession API was built for (whether the toggles are in a desktop browser UX or on a phone lock screen seems irrelevant).

Initial state seems solved by firing the mediaSession events early, e.g. on pageload.

This issue is "Solve user agent camera/microphone double-mute", putting other sources out of scope.

Multiple devices also seems out of scope since none of the global UA toggles so far (Safari or Firefox) work per-device AFAIK. They're page or browser global toggles, extending controls present in video conference pages today into the browser, imbuing them with ease of access and some privacy assurance that the webpage cannot hear them, solving the simple use cases of users not being heard, or worrying they can be heard (by participants or webpage). I.e. they affect all devices that page has.

I think Chrome's mute behavior is a bug. I've filed w3c/mediacapture-main#982 to clarify the spec, so let's discuss that there.

I think we should standardize requesting unmute.
I don't think we should standardize requesting mute.
PRs ahead of decisions should not be required.

Too much in this thread.

@guidou

guidou commented Dec 6, 2023

Here is a media session based proposal:
For the simple use case (one camera, one microphone), nothing is needed, just use the existing mediaSession API

We need to solve all use cases that arise in practice, not just the simplest one.

I like this proposal. I don't see a need to add more information since this seems to be exactly what the mediaSession API was built for (whether the toggles are in a desktop browser UX or on a phone lock screen seems irrelevant).

Initial state seems solved by firing the mediaSession events early, e.g. on pageload.

This issue is "Solve user agent camera/microphone double-mute", putting other sources out of scope.

We need to solve all use cases that arise in practice, not just the ones indicated in the first message of this thread.

Multiple devices also seems out of scope since none of the global UA toggles so far (Safari or Firefox) work per-device AFAIK. They're page or browser global toggles, extending controls present in video conference pages today into the browser, imbuing them with ease of access and some privacy assurance that the webpage cannot hear them, solving the simple use cases of users not being heard, or worrying they can be heard (by participants or webpage). I.e. they affect all devices that page has.

Browser toggles are just one use case that needs to be handled. OS toggles (which can be per device, as in ChromeOS and maybe other OSes) need to be handled too. Hardware toggles need to be considered as well. Just because these were not mentioned in the original message doesn't really mean they're out of scope.

I think Chrome's mute behavior is a bug. I've filed w3c/mediacapture-main#982 to clarify the spec, so let's discuss that there.

It's not a bug, based on the current language of the spec. If the problem is that the mute attribute was defined wrongly, a better way to proceed would be to eliminate mute and its associated events from the spec and replace them with new ones with a new definition that matches the behavior we want today. This would allow us to introduce the new behavior without breaking existing applications and, once applications migrate, we can deprecate and remove the old attribute from implementations. We have done this successfully several times. The experience in Chromium with changing behavior to match spec redefinitions is much worse.

I think we should standardize requesting unmute. I don't think we should standardize requesting mute.

I agree. Apps already implement a way to mute at the app level.

PRs ahead of decisions should not be required.

Slides that show how the proposal solves the problems should be enough. We have a slot in the December 12 meeting to continue discussing this. If you have some slides available, maybe we can look at them then.

@eladalon1983
Member

I don't think we should standardize requesting mute.

Was this suggested at some point?

PRs ahead of decisions should not be required.

PRs reveal the complexity that otherwise hides behind such phrases as "we could just..."

@jan-ivar
Member Author

I don't think we should standardize requesting mute.

Was this suggested at some point?

Yes in #39 (comment).

We need to solve all use cases that arise in practice, not just the ones indicated in the first message of this thread.

This issue has 70 comments. Triaging discussion out to other (new or existing) issues such as w3c/mediacapture-main#982 or w3c/mediasession#279 seems worthwhile to me, or I don't see how we're going to reach any kind of consensus on all these feature requests. "Mute reason" probably deserves its own issue as well (there were 14 comments here when it was introduced to this conversation in #39 (comment)). It seems largely orthogonal to the OP proposal of letting apps unmute.

Browser toggles are just one use case that needs to be handled. OS toggles (which can be per device, as in ChromeOS and maybe other OSes) need to be handled too. Hardware toggles need to be considered as well.

These are all User Agent toggles IMHO, the details of which W3C specs tend to leave to the User Agent, focusing instead on the surface between web app and UA. I think that's the level of abstraction we need to be at.

@eladalon1983
Member

I don't think we should standardize requesting mute.

Was this suggested at some point?

Yes in #39 (comment).

Thanks for clarifying.
I share your opinion (@jan-ivar) about this proposal.

Mute reason" probably deserves its own issue [...] It seems largely orthogonal to the OP proposal of letting apps unmute.

Not completely orthogonal, because requestUnmute() requires some knowledge of the mute-reason, or else an app would be soliciting a useless user gesture from the user, to their disappointment and frustration.

These are all User Agent toggles IMHO, the details of which W3C specs tend to leave to the User Agent, focusing instead on the surface between web app and UA. I think that's the level of abstraction we need to be at.

As a representative of one open source browser who has filed bugs and looked into the code of another open source browser, I hope you'll find this comment compelling. It discusses the value transparency brings to the entire ecosystem.

@jan-ivar
Member Author

Instead of the OP proposal of await track.unmute(), we might already have an API in w3c/mediasession#279 (comment):

navigator.mediaSession.setMicrophoneActive(true);

E.g. an app calling this with user attention and transient activation may be enough of a signal to the UA to unmute tracks it has muted in this document, either raising a toast message about it after the fact, or showing a prompt ahead of it.

The remaining problem is how the app would learn whether unmuting was successful or not. E.g. might this suffice?

navigator.mediaSession.setMicrophoneActive(true);
// Race the unmute event against a zero-delay timeout to learn the outcome:
const unmuted = await Promise.race([
  new Promise(r => track.onunmute = () => r(true)),
  new Promise(r => setTimeout(() => r(false)))
]);

@youennf
Contributor

youennf commented Dec 11, 2023

setMicrophoneActive looks good to me if we can validate its actual meaning with the Media WG.
This API can be extended (to return a promise, take additional parameters) to progressively cover more of what has been discussed in this thread.

@jan-ivar
Member Author

Not completely orthogonal, because requestUnmute() requires some knowledge of the mute-reason, or else an app would be soliciting a useless user gesture from the user, to their disappointment and frustration.

Hiding an unmute control seems a small dent in the disappointment and frustration of being unable to unmute. IOW a secondary problem to the first.

@eladalon1983
Member

eladalon1983 commented Dec 11, 2023

either raising a toast message or a prompt ahead of it.

As I have mentioned multiple times before - the user agent has no idea what "shiny button text" means to the user, or what the user believed they were approving when they conferred transient activation on the page. Only the prompt-based approach is viable.

Hiding an unmute control seems a small dent in the disappointment and frustration of being unable to unmute.

It does not look at all "small" to me. In fact, I am shocked that after months of debating whether an API should be sync or async, which would have no user-visible effect, you label this major user-visible issue as "small." What is the methodology you employ to classify the gravity of issues?

@eladalon1983
Member

eladalon1983 commented Dec 11, 2023

Hiding an unmute control seems a small dent in the disappointment and frustration of being unable to unmute.

I repeat - there is nothing "small" about a user clicking a button and it disappearing without having an effect. It looks like a bug and it would nudge users towards abandoning the Web app in favor of a native-app competitor. Web developers care much more about their users' perception of the app's reliability, than they do about the inconvenience of adding "await" to a method invocation. Let's focus our attention where it matters!

@eladalon1983
Member

[Image: a thumbs-down reaction left on the comment above]

Thank you for this engagement, Jan-Ivar. I am looking forward to hearing why you disagree.

Orthogonally, I'll be proposing that the rules of conduct in the WG be amended to discourage the use of the thumbs-down emoji without elaboration. Noting disagreement without elaborating on the reasons serves no productive purpose.

@jan-ivar
Member Author

Closing this as the double-mute problem was instead solved in w3c/mediasession#312.

Here's an example of how a website can synchronize application mute state with that of the browser.
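
In outline, that pattern looks roughly like this (a condensed sketch; togglemicrophone and setMicrophoneActive are Media Session APIs, while micButton, track, and appMuted are hypothetical app state):

let appMuted = false;

function setMuted(muted) {
  appMuted = muted;
  track.enabled = !muted;                              // app-level mute
  micButton.classList.toggle("muted", muted);          // app UI
  navigator.mediaSession.setMicrophoneActive(!muted);  // tell the UA
}

// A UA toggle (e.g. URL bar button) arrives as a Media Session action:
navigator.mediaSession.setActionHandler("togglemicrophone", () => setMuted(!appMuted));

// The site's own button goes through the same path, so both stay in sync:
micButton.onclick = () => setMuted(!appMuted);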
