Is frame-level fanout in scope of WebRTC encoded transform? #211
My personal thoughts:
Given this, I would think a separate API is a better choice. It would also allow using both APIs jointly.
Funny, I was just typing this into a document for internal discussion (I wanted to get some feedback before proposing this in the WG):

Establishing a WebCodecs-using sender and/or receiver

When a sender (or receiver, but they’re similar enough that just the sender side is explained in detail here) is first established (through addTransceiver, through ontrack, or through other means), the sender is in an unconfigured state; it leaves this state either by having its “transform” member set, or by starting to process RTP packets. There is a set of classes that implement the RTCRtpScriptSink interface (introduced in PR 207). For this version, we assume the same initialization steps as for RTCRtpScriptTransformer. This mirrors the RTCRtpScriptTransformer interface, but has different semantics: when applied, the sender will not initialize its encoder component. It will expect all frames to come in via the RTCRtpScriptSink object’s “writable” stream. A similar process, using the RTCRtpScriptSource interface, applies to the RTCRtpReceiver.
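The interface shape described above could be sketched in rough WebIDL. This is an illustration only: apart from the names RTCRtpScriptSink, RTCRtpScriptSource, and the “writable” stream mentioned in the comment, every detail (worker exposure, the attribute on the source side) is an assumption, not the actual PR 207 IDL.

```webidl
// Hypothetical sketch, not the PR 207 IDL.
// A sender configured with an RTCRtpScriptSink bypasses its own encoder;
// the application writes RTCRtpEncodedFrames into "writable" itself.
[Exposed=DedicatedWorker]
interface RTCRtpScriptSink {
  readonly attribute WritableStream writable;
};

// Assumed receiver-side counterpart: the receiver bypasses its decoder
// and exposes incoming encoded frames to the application instead.
[Exposed=DedicatedWorker]
interface RTCRtpScriptSource {
  readonly attribute ReadableStream readable;
};
```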
Does this mean that, on your side, a dedicated API might be a better fit than extending WebRTC encoded transform?
I don't think of those as separate APIs, rather that we'd want to have a set of building blocks that are able to be composed into the functionality we need. I think that the current RTCEncodedTransform consists of 3-4 of those building blocks (frame source, frame sink and a hidden feedback channel), and we need to tease them further apart.
The document I wrote up to explain the idea that parallels @youennf 's comment (which doesn't yet contain any example code or WebIDL) can be viewed here:
I think this proposal from @youennf is a great place to start and iterate. |
@youennf WDYT about supporting RTCEncodedAudio/VideoFrames with RTCRtpSenderEncodedSource? |
That is a fair summary.
It might be ok as a convenience/shortcut for web developers.
Why would we need to call |
A use case we want to support[1] is to get frames from two or more incoming PCs that are sending the same encoded data, and forward the frames to one or more output PCs as if they were coming from a single, more reliable peer connection. This forwarding occurs at multiple points in multiple network paths between the original servers sending the data and the final recipients and it involves dropping duplicate frames. Failure of individual input peer connections should be tolerated without the next node in the path noticing it. One of the requirements for this forwarding mechanism is to preserve the existing metadata of the source frames. In particular:
The reason we need a setMetadata method is that, since the incoming frames come from multiple input PCs, it may be necessary to adjust some of the metadata of the output frame so that it properly reflects the decisions made by the forwarder. For example, frames with the same payload may have different frame IDs if they come from different servers. Thus, it must be possible to rewrite this ID for the output frame so that the next hop sees a consistent frame ID scheme.

[1] First proposed in the July meeting
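The forwarding step described above could be sketched as follows. This is a minimal illustration, not the actual RTCRtpEncodedFrame API: the frame shape, the use of the RTP timestamp as a duplicate-detection key, and the metadata rewrite (what a setMetadata method would enable) are all assumptions.

```javascript
// Hypothetical sketch of frame-level fanout forwarding: drop duplicate
// frames arriving from redundant input peer connections, and rewrite the
// frame ID so the next hop sees one consistent frame ID scheme.
function createForwarder(writeFrame) {
  const forwarded = new Set(); // keys of frames already sent downstream
  let nextFrameId = 0;         // consistent ID scheme for the next hop

  return function forward(frame) {
    // The same payload may arrive from several input PCs; we assume the
    // RTP timestamp is stable across them and use it as the duplicate key.
    const key = frame.metadata.rtpTimestamp;
    if (forwarded.has(key)) return false; // duplicate: drop it
    forwarded.add(key);

    // Different servers may assign different frame IDs to the same payload,
    // so the forwarder rewrites the ID on the output frame.
    writeFrame({
      ...frame,
      metadata: { ...frame.metadata, frameId: nextFrameId++ },
    });
    return true;
  };
}
```

The failure-tolerance requirement falls out of this shape: if one input PC dies, frames keep arriving from the others and the output stream is unaffected.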
I think it makes sense for the RTC encoder to be able to provide more data than a WebCodecs encoder would provide.
If using a WebCodecs encoder to encode a frame that is to be sent on an RTCPeerConnection, there has to be a step that supplies the metadata that the WebCodecs encoder does not supply. Doing that as part of the conversion step between a WebCodecs encoded frame and an RTCRtpEncodedFrame seems logical. It's a small amount of data. WRT the PT, I think we're getting to a consensus that frames should be marked with MIME type, and that the processing step that needs a PT should look up the PT from the MIME type - since this lookup is transceiver-dependent, it seems logical to do the lookup when the frame is enqueued on a packetizer, not before this. This means, of course, that the MIME type needs to be writable.
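The conversion step described above can be sketched as a plain function. This is a hypothetical illustration: the RTCRtpEncodedFrame-like object shape and the metadata field names are assumptions, not the current API; the point is only that the RTP-specific metadata (MIME type rather than PT, per the comment above) is supplied at conversion time.

```javascript
// Hypothetical sketch: wrap a WebCodecs-encoded chunk with the RTP metadata
// that WebCodecs does not supply. The PT itself is NOT set here; per the
// discussion, the packetizer would later map mimeType to a payload type
// using the transceiver's codec table.
function toRtpEncodedFrame(chunk, rtpMetadata) {
  return {
    data: chunk.data,
    metadata: {
      // Carried over from the WebCodecs chunk.
      timestamp: chunk.timestamp,
      keyFrame: chunk.type === "key",
      // Supplied by the application at conversion time.
      mimeType: rtpMetadata.mimeType,
      rtpTimestamp: rtpMetadata.rtpTimestamp,
      ssrc: rtpMetadata.ssrc,
    },
  };
}
```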
It is not always necessary to set PT. For the forwarding use case, you start with a received frame that already has full metadata, including PT. You just need to update some of the metadata for the output frame.
One possibility would be to first define how to plug an encoder into the current pipeline. This two-stage process would allow us to:
Here is a rough sketch of what it could be:
Another approach would be to reuse the script transform/transformer approach and not rely on transferable. |
Bikeshed comment:
Note that we don't have to define an encoder transform for the frame-level fanout use case. The source isn't an encoder for that use case. But the topic of this issue seems to have drifted a bit.
That seems to address a separate issue from the frame-level fanout / forwarding case. |
This issue had an associated resolution in WebRTC December 12 2023 meeting – 12 December 2023 (RtpSender Encoded Source):
Several proposals to extend WebRTC encoded transform have been made to support frame-level fanout, aka one-ended sender streams (#207, #201, #202, for instance).
There does not seem to be consensus on whether WebRTC encoded transform is the right tool for this.
The alternative is a separate API (setting aside the packet-level fanout vs. frame-level fanout discussion).
We should try to reach consensus on this question before diving into more concrete proposals.