Long term reference support for AVC/HEVC #743

taste1981 · 2023-11-09T13:38:08Z

Long-term reference (LTR) picture marking and control is very useful for coping with frame loss in video conferencing, because it avoids requesting key frames, which are typically large.

On Windows, LTR marking is controlled through three properties when the HMFT is used:
https://learn.microsoft.com/en-us/windows/win32/medfound/codecapi-avencvideouseltrframe
https://learn.microsoft.com/en-us/windows/win32/medfound/codecapi-avencvideomarkltrframe
https://learn.microsoft.com/en-us/windows/win32/medfound/codecapi-avencvideoltrbuffercontrol

On Mac/iOS:
https://developer.apple.com/documentation/videotoolbox/kvtcompressionpropertykey_enableltr
https://developer.apple.com/documentation/videotoolbox/kvtencodeframeoptionkey_forceltrrefresh
https://developer.apple.com/documentation/videotoolbox/kvtsampleattachmentkey_requireltracknowledgementtoken
https://developer.apple.com/documentation/videotoolbox/kvtencodeframeoptionkey_acknowledgedltrtokens

It is expected that the AVC/HEVC codec registrations define VideoEncoderEncodeOptionsForAvc and VideoEncoderEncodeOptionsForHevc with LTR marking/refresh control added.

Can we add this?

aboba · 2023-11-09T15:12:36Z

Related: #285, ietf-wg-avtcore/draft-ietf-avtcore-hevc-webrtc#17, w3c/webrtc-nv-use-cases#118

sandersdan · 2023-11-09T18:28:03Z

We (Chrome implementers) discussed this recently. It's possible, but there are some design considerations that need to be solved:

- Not all encoders support it (e.g. MediaCodec), so it needs to be feature-detectable.
- Different encoders use different signaling mechanisms, so we probably need to find a way to unify them all.
- Rather than creating serializable IDs, we could use opaque objects and leave mapping/serialization up to apps.
- Putting IDs on encoded chunks is ambiguous since there can be more than one frame in an encoded chunk. In practice this feature probably won't be used together with frame reordering, and if superframes do get involved it's probably fine to interpret a single ID as representing all frames in the chunk.
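The opaque-object idea above could look roughly like this on the application side. This is only a sketch: `LtrWireIdRegistry` and its methods are hypothetical names, and the "handle" stands in for whatever opaque object an encoder API might hand out; nothing here is part of WebCodecs.

```typescript
// Hypothetical app-side registry: maps opaque LTR handles (any object) to small
// integer IDs that the app can serialize into its own feedback messages.
class LtrWireIdRegistry<T extends object> {
  private readonly idsByHandle = new WeakMap<T, number>();
  private readonly handlesById = new Map<number, T>();
  private nextId = 0;

  // Returns the wire ID for a handle, assigning a new one on first use.
  idFor(handle: T): number {
    const existing = this.idsByHandle.get(handle);
    if (existing !== undefined) {
      return existing;
    }
    const id = this.nextId++;
    this.idsByHandle.set(handle, id);
    this.handlesById.set(id, handle);
    return id;
  }

  // Resolves a wire ID received from a remote peer back to the opaque handle.
  handleFor(id: number): T | undefined {
    return this.handlesById.get(id);
  }

  // Forgets a wire ID once the app no longer expects feedback for it.
  release(id: number): void {
    const handle = this.handlesById.get(id);
    if (handle !== undefined) {
      this.idsByHandle.delete(handle);
      this.handlesById.delete(id);
    }
  }
}
```

With this split, the encoder API never needs to define a serialization format; the app decides how IDs travel on the wire and when they expire.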

It's also not clear to me what the exact behavior is when a large number of IDs are outstanding. There is only so much room to store references, eventually the encoder has to drop them. How much latency can there be before these schemes fall apart? How does this vary between encoder implementations?

taste1981 · 2023-11-10T03:09:54Z

> We (Chrome implementers) discussed this recently. It's possible, but there are some design considerations that need to be solved:
>
> Not all encoders support it (e.g. MediaCodec), so it needs to be feature-detectable.

Are you suggesting adding something that can be checked through the MediaCapabilities API?

> Different encoders use different signaling mechanisms, so we probably need to find a way to unify them all.
>
> Rather than creating serializable IDs, we could use opaque objects and leave mapping/serialization up to apps.
>
> Putting IDs on encoded chunks is ambiguous since there can be more than one frame in an encoded chunk. In practice this feature probably won't be used together with frame reordering, and if superframes do get involved it's probably fine to interpret a single ID as representing all frames in the chunk.
>
> It's also not clear to me what the exact behavior is when a large number of IDs are outstanding. There is only so much room to store references, eventually the encoder has to drop them. How much latency can there be before these schemes fall apart? How does this vary between encoder implementations?

I may create a prototype CL in Chromium to examine the behavior, on Windows at least.

aboba · 2023-11-10T07:37:14Z

> It's also not clear to me what the exact behavior is when a large number of IDs are outstanding. There is only so much room to store references, eventually the encoder has to drop them. How much latency can there be before these schemes fall apart? How does this vary between encoder implementations?

[BA] The viability of LTR in conferencing scenarios relies on assumptions about how participants manage long-term references. They're called long-term references because the participants need to keep them long-term or else an encoder can't be assured that it can rely on an LTR to create a delta frame that all the conference participants will be able to decode. And of course, regardless of how long LTRs are kept, you can have a recent joiner who won't have received the LTR. So it is quite possible for an encoder to create a delta frame based on an LTR and then receive a PLI in response. Ooops!

sandersdan · 2023-11-10T17:53:59Z

> Are you suggesting to add something to be checked by MediaCapabilities API?

More likely a flag in VideoEncoderConfig so that VideoEncoder.isConfigSupported() can be used to detect it.
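A minimal sketch of that detection pattern, assuming a hypothetical `longTermReference` config member (no such member exists in WebCodecs today). It relies on `isConfigSupported()` resolving with a copy of the config that contains only the members the user agent recognizes, so an unrecognized flag would simply be missing from the echoed config. The support check is passed in as a function so the sketch does not depend on browser globals.

```typescript
// Hypothetical config shape: `longTermReference` is an assumed flag, not a spec member.
interface LtrEncoderConfig {
  codec: string;
  width: number;
  height: number;
  longTermReference?: boolean;
}

// Mirrors the shape of the result of VideoEncoder.isConfigSupported().
interface LtrEncoderSupport {
  supported: boolean;
  config: LtrEncoderConfig;
}

type SupportCheck = (config: LtrEncoderConfig) => Promise<LtrEncoderSupport>;

// LTR counts as available only if the config is supported AND the flag was
// echoed back, i.e. the user agent actually recognized it.
function echoesLtrFlag(support: LtrEncoderSupport): boolean {
  return support.supported && support.config.longTermReference === true;
}

async function detectLtr(check: SupportCheck, base: LtrEncoderConfig): Promise<boolean> {
  const support = await check({ ...base, longTermReference: true });
  return echoesLtrFlag(support);
}
```

In a browser, `check` would wrap `VideoEncoder.isConfigSupported()`. An app would typically fall back to plain key-frame requests when `detectLtr` resolves to `false`.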

taste1981 · 2023-11-13T02:21:50Z

@sandersdan The maximum number of outstanding long-term reference picture indices (as well as the maximum value of long_term_ref_frame_idx + 1) must not exceed the maximum number of reference pictures, which is 16 for AVC/HEVC.
If the application explicitly specifies a maximum number of reference frames, the limit is further reduced to that value.
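The limit described above can be written down as a small check. This is an illustrative sketch only: the function names are made up, and the constant 16 is taken from the comment above rather than from any encoder API.

```typescript
// Upper bound on reference pictures for AVC/HEVC, as stated in this thread.
const MAX_REFERENCE_PICTURES = 16;

// Effective cap on outstanding long-term references. `appMaxRefFrames` is a
// hypothetical application-specified reference frame limit (optional).
function maxLongTermReferences(appMaxRefFrames?: number): number {
  if (appMaxRefFrames === undefined) {
    return MAX_REFERENCE_PICTURES;
  }
  const clamped = Math.min(MAX_REFERENCE_PICTURES, Math.floor(appMaxRefFrames));
  return Math.max(0, clamped);
}

// An index is usable only if (index + 1) does not exceed the effective cap.
function isValidLongTermIndex(index: number, appMaxRefFrames?: number): boolean {
  return (
    Number.isInteger(index) &&
    index >= 0 &&
    index + 1 <= maxLongTermReferences(appMaxRefFrames)
  );
}
```

For example, with no application limit the valid indices are 0 through 15; with an application limit of 4 they are 0 through 3.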

taste1981 · 2024-05-21T00:51:57Z

@Djuffin / @aboba To simplify agent implementation, we could support a single active long-term reference picture (LTRP) and refresh it periodically.

To make LTR usable, a few settings need to be added to the spec:

- An encoder option to enable encoding with LTR marking, along with the refresh interval.
- An interface to notify the encoder that the most recent LTRP has been received by other agents (acknowledgement feedback on receipt of the LTRP; optional).
- An interface to ask the encoder to recover using an LTRP (using the most recently acknowledged LTRP as the reference).
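The three pieces above can be sketched as a per-frame decision helper. Everything here is hypothetical: the class, method, and decision names are not from any spec. The sketch also assumes that with a single LTR slot, a refresh overwrites the previous LTRP, so recovery falls back to a key frame whenever the current LTRP has not been acknowledged yet.

```typescript
type FrameDecision =
  | { kind: "normal" }
  | { kind: "markLtr"; ltrId: number }
  | { kind: "recoverFromLtr"; ltrId: number }
  | { kind: "keyFrame" };

class SingleLtrController {
  private readonly refreshInterval: number;
  private frameCount = 0;
  private nextLtrId = 0;
  private currentLtrId: number | null = null; // the single LTR slot
  private currentAcked = false;
  private recoveryRequested = false;

  // refreshInterval: mark a new LTRP every N frames (the encoder option).
  constructor(refreshInterval: number) {
    if (!Number.isInteger(refreshInterval) || refreshInterval <= 0) {
      throw new RangeError("refreshInterval must be a positive integer");
    }
    this.refreshInterval = refreshInterval;
  }

  // Feedback: remote agents received the LTRP. Stale IDs are ignored.
  acknowledge(ltrId: number): void {
    if (ltrId === this.currentLtrId) {
      this.currentAcked = true;
    }
  }

  // Feedback: a remote agent lost a frame and needs a recovery point.
  requestRecovery(): void {
    this.recoveryRequested = true;
  }

  // Decides how the next frame should be encoded.
  nextFrame(): FrameDecision {
    const index = this.frameCount++;
    if (this.recoveryRequested) {
      this.recoveryRequested = false;
      if (this.currentLtrId !== null && this.currentAcked) {
        return { kind: "recoverFromLtr", ltrId: this.currentLtrId };
      }
      // No acknowledged LTRP to fall back on: a key frame resets everything.
      this.currentLtrId = null;
      this.currentAcked = false;
      return { kind: "keyFrame" };
    }
    if (index % this.refreshInterval === 0) {
      // Refresh overwrites the single slot; the new LTRP starts unacknowledged.
      this.currentLtrId = this.nextLtrId++;
      this.currentAcked = false;
      return { kind: "markLtr", ltrId: this.currentLtrId };
    }
    return { kind: "normal" };
  }
}
```

The window between a refresh and its acknowledgement is where this single-slot design is weakest, which is one reason an implementation might prefer to keep two slots instead.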

aboba · 2024-05-21T03:32:29Z

@taste1981 There has been some work on low level encoding APIs that could support reference control. We will discuss the status at the W3C TPAC 2024 joint meeting of the MEDIA and WEBRTC WG. A link to the 2023 joint meeting:

2023 TPAC Encoder API Discussion

taste1981 · 2024-05-21T08:54:51Z

Happy to have a low-level API for that. However, LTR is a special case: specifying updateBuffers/referenceBuffers is not sufficient.

For AVC/HEVC, it's not only about updating reference slots; it's also about marking the reference picture type. Of course, you can mimic the behavior of long-term references with short-term references, but you lose interop with applications that rely on true "LTRP" for error resilience.

aboba · 2024-09-13T02:03:04Z

@Djuffin @sandersdan @alvestrand A question about the new encoder API which was recently checked into Chrome (for AV1).

On the receive side, is there a way to ensure that an LTR is available to the decoder? For example, in a conferencing use case, it is not useful for the SFM to forward an LTR recovery request to a sender unless the participants to whom the new frame will be forwarded can decode it (e.g. the participants need to have received the LTR, decoded it and retained it as a potential reference).

It turns out that the RTCP RPSI message does not provide sufficient information for the SFM to make the decision (it's only useful for the 1-1 case), so implementations have created proprietary RTCP messages (e.g. LTN for libwebrtc, a PLI extension for Teams, etc.).
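To illustrate the kind of bookkeeping the SFM would need, here is a rough sketch. It assumes some per-receiver acknowledgement signal exists (for example one of the proprietary messages mentioned above) and does not model any real RTCP format; `LtrAckTracker` and its methods are hypothetical names.

```typescript
// Tracks which LTR IDs each receiver has reported as received, decoded and retained.
class LtrAckTracker {
  private readonly ackedByReceiver = new Map<string, Set<number>>();

  // A new joiner starts with no acknowledged LTRs.
  addReceiver(receiverId: string): void {
    if (!this.ackedByReceiver.has(receiverId)) {
      this.ackedByReceiver.set(receiverId, new Set<number>());
    }
  }

  removeReceiver(receiverId: string): void {
    this.ackedByReceiver.delete(receiverId);
  }

  // Records an acknowledgement; unknown receivers are ignored.
  recordAck(receiverId: string, ltrId: number): void {
    this.ackedByReceiver.get(receiverId)?.add(ltrId);
  }

  // Forwarding an LTR recovery request only helps if every current receiver
  // can decode a frame predicted from that LTR.
  canForwardRecovery(ltrId: number): boolean {
    const ackSets = Array.from(this.ackedByReceiver.values());
    return ackSets.length > 0 && ackSets.every((ids) => ids.has(ltrId));
  }
}
```

When `canForwardRecovery` returns `false`, for instance because a recent joiner never received the LTR, the SFM would have to fall back to requesting a key frame.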

aboba mentioned this issue Nov 9, 2023

Section 3.2.1: Add N48 - bandwidth feedback speed w3c/webrtc-nv-use-cases#118

Open

aboba added the extension Interface changes that extend without breaking. label May 2, 2024

Djuffin self-assigned this May 14, 2024

Djuffin added the RTC Support for advanced RTC scenarios label May 14, 2024
