VP8: do not forward RTP packets which payload contains a higher temporal layer than current one. #1009

jmillan · 2023-02-03T16:15:59Z

Motivation: in #989 we have seen that we are sometimes sending more BW than the BWE allows.

We verified that SimulcastConsumer is properly selecting the target spatial and temporal layers based on the available bitrate. We switch to the target spatial layer as soon as we receive a keyframe from the corresponding stream; that spatial layer then becomes the current layer. From that point onwards, for upcoming packets of the given stream, we basically let the codec implementation decide whether they should be forwarded (based on the current and target temporal layers) by calling RtpPacket::ProcessPayload(), which ends up calling the codec implementation's Process() method.

v3 is forwarding every old packet (one whose pictureId is lower than the last forwarded one), even those with a temporal layer greater than the current one.

Example:

Our current temporal layer is 0 as per BW constraints or app decision.
- We should only send temporal layer 0 packets.
As long as the packets with temporal layer 1 come in order, they will be discarded, as they should.
Every packet with temporal layer 1 that is delayed with respect to a previously forwarded one (that is, one that did not arrive in order) will be forwarded to the endpoint.

This means that, as shown in #989, we are sending more BW than is currently available according to BWE (or the app), and thus:

We are wasting network and cpu resources.
We are potentially congesting an endpoint that cannot afford receiving such temporal layer 1.
We are making BWE misbehave by sending more BW than it announces (this is an observation from @sarumjanuch and makes a lot of sense).

Since we respect the codec keyframes in order to update layers, I'm pretty confident we are not incurring any issue by dropping packets whose temporal layer is higher than the current one.
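
For readers following along, the rule proposed above can be sketched as a small predicate. This is only an illustration under stated assumptions: `PayloadInfo`, `LayerContext` and `ShouldForward` are hypothetical, simplified stand-ins and not the actual classes or methods in worker/src/RTC/Codecs/VP8.cpp.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-in for the VP8 payload descriptor.
struct PayloadInfo
{
	bool hasTlIndex;  // whether the payload carries a temporal layer index
	uint8_t tlIndex;  // temporal layer of this payload
};

// Hypothetical, simplified stand-in for the encoding context.
struct LayerContext
{
	int16_t currentTemporalLayer;
};

// Returns true if the payload should be forwarded.
// The decision depends only on the temporal layer, not on whether the
// packet arrived in order, so a delayed TID 1 packet is dropped just
// like an in-order one while the current temporal layer is 0.
bool ShouldForward(const PayloadInfo& payload, const LayerContext& context)
{
	if (payload.hasTlIndex && payload.tlIndex > context.currentTemporalLayer)
	{
		return false;
	}

	return true;
}
```

With the current temporal layer at 0, `ShouldForward({ true, 1 }, { 0 })` is false regardless of arrival order, while TID 0 payloads keep flowing.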

Can you see any drawbacks here @ibc @vpalmisano @ggarber @jcague ?

…urrent one Fixes versatica#989

ibc

Will take a deeper look on Monday. Question: doesn't this affect H264 or VP9?

jmillan · 2023-02-03T19:24:53Z

Will take a deeper look on Monday. Question: doesn't this affect H264 or VP9?

I presume so, yes. I'm looking for side effects of these changes before applying them elsewhere.

So far it works as expected.

worker/src/RTC/Codecs/VP8.cpp

jmillan · 2023-02-06T10:43:57Z

This is the only delicate scenario that I can think of. I.e., we are forwarding temporal layer 1 and the following payloads arrive:

[PID: 1, TID: 0] -> It's forwarded, temporal layer 0.
[PID: 2, TID: 1] -> It's forwarded, temporal layer 1.
Due to BW restrictions or app decision, target temporal layer is set to 0.
[PID: 4, TID: 0] -> golden frame, arrived before [PID: 3, TID: 1] payload. It's forwarded and current temporal layer set to 0.
[PID: 3, TID: 1] -> discarded, TID is greater than current temporal layer.

Observations:

Discarding PID: 3 will NOT generate any NACK from the receiver since we already handle this in SimulcastConsumer. The RTP packet containing PID: 4 has a sequence number one unit higher than the RTP packet containing PID: 2.
Since we only switch temporal layers with golden frames, the decoder already has all it needs to decode the new frame.

vpalmisano · 2023-02-06T11:34:43Z

[PID: 1, TID: 0] -> It's forwarded, temporal layer 0.

[PID: 2, TID: 1] -> It's forwarded, temporal layer 1.

Due to BW restrictions or app decision, target temporal layer is set to 0.

[PID: 4, TID: 0] -> golden frame, arrived before [PID: 3, TID: 1] payload. It's forwarded and current temporal layer set to 0.

[PID: 3, TID: 1] -> discarded, TID is greater than current temporal layer.

At this point if the app decides to set the temporal layer to 1 again:

[PID: 5, TID: 0] -> the current temporal layer is not updated, so it will be discarded ?
[PID: 6, TID: 1] -> current temporal layer is set to 1 and the packet is forwarded

vpalmisano · 2023-02-06T11:37:01Z

worker/src/RTC/Codecs/VP8.cpp

@@ -325,7 +325,7 @@ namespace RTC
 			// clang-format off
 			if (
 				this->payloadDescriptor->hasTlIndex &&
-				this->payloadDescriptor->tlIndex > context->GetCurrentTemporalLayer()
+				this->payloadDescriptor->tlIndex == context->GetTargetTemporalLayer()


Maybe we need to keep the packets with temporal layer <= target ?

Suggested change

-				this->payloadDescriptor->tlIndex == context->GetTargetTemporalLayer()
+				this->payloadDescriptor->tlIndex <= context->GetTargetTemporalLayer()

We must only upgrade the temporal layer with a packet whose temporal layer is the target one.

ibc · 2023-02-06T14:11:57Z

What if we allow packets with higher layers to be forwarded only if their PID value is between S_PID - 10 and S_PID, where S_PID is the PID of the packet we used to switch current layers?
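
The window check suggested in this comment could look roughly like the sketch below. It assumes the 15-bit form of the VP8 PictureID, so the distance is computed modulo 2^15 to survive wrap-around; `IsWithinSwitchWindow`, `kPidMask` and the `window` parameter are hypothetical names introduced only for illustration and do not exist in the codebase.

```cpp
#include <cstdint>

// Assumption: 15-bit VP8 PictureID, so arithmetic is done modulo 2^15.
constexpr uint16_t kPidMask = 0x7FFF;

// Returns true if `pid` lies in [switchPid - window, switchPid], taking
// PictureID wrap-around into account. `switchPid` is the PID of the packet
// that was used to switch the current layers (S_PID in the comment above).
bool IsWithinSwitchWindow(uint16_t pid, uint16_t switchPid, uint16_t window = 10)
{
	// How many pictures `pid` is behind `switchPid`, modulo 2^15.
	const uint16_t behind = static_cast<uint16_t>(switchPid - pid) & kPidMask;

	return behind <= window;
}
```

For example, with S_PID 100 a late packet with PID 95 would pass the check, while PID 89 (too old) and PID 101 (newer than the switch) would not.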

jmillan · 2023-02-06T15:02:38Z

At this point if the app decides to set the temporal layer to 1 again:
[PID: 5, TID: 0] -> the current temporal layer is not updated, so it will be discarded ?

Packets with lower TID than the current are always forwarded, since current temporal layer depends on them.

[PID: 6, TID: 1] -> current temporal layer is set to 1 and the packet is forwarded

If this packet can be used to change to temporal layer 1 then the current temporal layer is updated and the packet is forwarded.

jmillan · 2023-02-06T15:06:03Z

What if we allow packets with higher layers to be forwarded only if their PID value is between S_PID - 10 and S_PID, where S_PID is the PID of the packet we used to switch current layers?

I'd avoid creating this kind of logic before knowing that the current proposal does not work as expected.

ibc · 2023-02-06T16:02:14Z

[PID: 6, TID: 1] -> current temporal layer is set to 1 and the packet is forwarded
If this packet can be used to change to temporal layer 1 then the current temporal layer is updated and the packet is forwarded.

You mean current temporal layer is changed in the Consumer or in the packet context? Or in both?

jcague · 2023-02-07T11:13:03Z

Wouldn't this solution create additional PLIs and keyframes if the network between the Sender and the SFU is lossy?

Example: Given a steady state: no layer switching, current TL is 0, we filter out packets with TID > 0. With this PR this could happen:

     Sender ---> SN: 0, PID: 0, TID: 0 -----> MS forwards it as SN: 0, PID: 0, TID: 0
 +-- Sender ---> SN: 1, PID: 1, TID: 1 --x (Packet is lost before Mediasoup)
 |   Sender ---> SN: 2, PID: 2, TID: 0 -----> MS forwards it as SN: 2, PID: 2, TID: 0
 +-> Sender ---> SN: 1, PID: 1, TID: 1 -----> MS filters it out as it belongs to TL: 1

In this case, the receiver will never receive the packet with SN: 1, and it will send NACKs with no retransmission (as we dropped the packet), and it will end up sending a PLI. And this would happen to all receivers.

Something like what @ibc mentions above would reduce its impact I think. That said, I think that forwarding old packets with higher temporal layers could also break the video stream at the receiver side, but I haven't been able to demonstrate it so far :).

jmillan · 2023-02-07T11:56:55Z

Yes @jcague, that's true. Indeed, I misinterpreted exactly this when reading @ggarber's article. I'll do some local tests to verify a second concern I have, and comment here later.

jmillan · 2023-02-07T17:07:59Z

Before going forward with this draft I want to find out what's causing the problem in #989. There, we can see that some Consumers are continuously receiving the T1 layer when they should receive T0 only, and that extra T1 traffic cannot be caused by old packets (which is what this branch aims to avoid).

I have created a branch which logs info about each outgoing VP8 payload. We'll try to repro #989 in the next few days and see what is really happening.

ibc · 2023-02-07T19:40:03Z

Idea: let old packets with temporal layer greater than current temporal layer pass to the consuming endpoint UNTIL a keyframe (for the target spatial layer) is sent to it. Once a keyframe is received by the consuming endpoint there is no reason for the endpoint to request any keyframe via PLI.

jmillan · 2023-02-08T09:41:36Z

Idea: let old packets with temporal layer greater than current temporal layer pass to the consuming endpoint UNTIL a keyframe (for the target spatial layer) is sent to it. Once a keyframe is received by the consuming endpoint there is no reason for the endpoint to request any keyframe via PLI.

Yep, that would make it.

EDIT: Indeed, I can't see at the moment how it would do it. Can you lay it out as a flow so I can understand it, i.e. like #1009 (comment)?

ibc · 2023-02-08T12:12:07Z

Yep, that would make it.

EDIT: Indeed, I can't see at the moment how it would do it. Can you lay it out as a flow so I can understand it, i.e. like #1009 (comment)?

It was a conceptual idea. I cannot correlate it with real changes in current code. Also, now I think it doesn't make sense. Imagine we are in spatial layer 1 all the time and suddenly we make temporal layer 0 the preferred one. At this point the issue described in this ticket may happen depending on circumstances (packet loss etc) but there is no keyframe involved at all since we have never switched the spatial layer. So ignore it please.

ggarber · 2023-02-08T12:26:42Z

Sorry I'm late to the party. I don't think this change will help with #989, but any improvement in the layers forwarding that could help mitigate the (infrequent) video decoding artefacts we have seen is welcome.

This change looks good to me. At least the idea described; I haven't reviewed the implementation.

I think in the example shared by @jcague the browser will look at the tl0PictureIndex and given that there are no gaps in that index it will continue decoding the layer 0 without problems even if there is a gap in the pictureID.
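
To make the point above concrete, here is a minimal sketch of a continuity check on the base layer. It assumes the receiver judges base layer continuity through the 8-bit TL0PICIDX field of the VP8 payload descriptor, as suggested in this comment; `FrameInfo` and `BaseLayerIsContiguous` are hypothetical names, and this is not how any particular browser implements it.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical summary of a received frame.
struct FrameInfo
{
	uint8_t tid;        // temporal layer of the frame
	uint8_t tl0PicIdx;  // running index of the base layer frame it belongs to or depends on
};

// Returns true if the TID 0 frames carry consecutive TL0PICIDX values
// (modulo 256). Frames of higher temporal layers are ignored, so dropping
// them creates a PictureID gap but no gap in the base layer.
bool BaseLayerIsContiguous(const std::vector<FrameInfo>& frames)
{
	bool seenBase = false;
	uint8_t last  = 0;

	for (const auto& frame : frames)
	{
		if (frame.tid != 0)
		{
			continue;
		}

		if (seenBase && frame.tl0PicIdx != static_cast<uint8_t>(last + 1))
		{
			return false;
		}

		last     = frame.tl0PicIdx;
		seenBase = true;
	}

	return true;
}
```

In the example shared by @jcague, the receiver only sees the two TID 0 frames, whose base layer indexes are consecutive, so under this assumption the check passes even though PID 1 never arrives.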

ibc · 2023-02-08T12:29:34Z

Example: Given a steady state: no layer switching, current TL is 0, we filter out packets with TID > 0. With this PR this could happen:
     Sender ---> SN: 0, PID: 0, TID: 0 -----> MS forwards it as SN: 0, PID: 0, TID: 0
 +-- Sender ---> SN: 1, PID: 1, TID: 1 --x (Packet is lost before Mediasoup)
 |   Sender ---> SN: 2, PID: 2, TID: 0 -----> MS forwards it as SN: 2, PID: 2, TID: 0
 +-> Sender ---> SN: 1, PID: 1, TID: 1 -----> MS filters it out as it belongs to TL: 1
In this case, the receiver will never receive the packet with SN: 1, and it will send NACKs with no retransmission (as we dropped the packet), and it will end up sending a PLI. And this would happen to all receivers.

I think the answer is no, but... in this scenario can the SimulcastConsumer know in advance (by looking at PID or something else) that the lost packet with SN 1 belongs to TL:1, so it must be discarded and hence the packet with SN:2 is sent to the consuming endpoint with SN:1 instead? Pretty sure this is not possible, so here is another crazy proposal:

In this scenario, when delayed SN:1 is finally received by SimulcastConsumer and the device has sent NACK for SN:1, could SimulcastConsumer send an empty packet with SN:1? I mean, a packet with no payload so it would be silently discarded by the consuming device.
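
As a rough illustration of what such an empty packet would be on the wire, the sketch below builds only the 12-byte fixed RTP header (RFC 3550) with no payload. `BuildEmptyRtpPacket` is a hypothetical helper, not mediasoup API, and whether a consuming device would really discard such a packet silently is the assumption made in the proposal, not something this sketch verifies.

```cpp
#include <array>
#include <cstdint>

// Builds a 12-byte RTP fixed header with no CSRCs, no extension and no payload.
// All multi-byte fields are written in network byte order.
std::array<uint8_t, 12> BuildEmptyRtpPacket(
  uint8_t payloadType, uint16_t seq, uint32_t timestamp, uint32_t ssrc)
{
	std::array<uint8_t, 12> packet{};

	packet[0]  = 0x80;                                // version 2, no padding, no extension, CC 0
	packet[1]  = payloadType & 0x7F;                  // marker bit cleared
	packet[2]  = static_cast<uint8_t>(seq >> 8);      // sequence number (the NACKed one)
	packet[3]  = static_cast<uint8_t>(seq & 0xFF);
	packet[4]  = static_cast<uint8_t>(timestamp >> 24);
	packet[5]  = static_cast<uint8_t>((timestamp >> 16) & 0xFF);
	packet[6]  = static_cast<uint8_t>((timestamp >> 8) & 0xFF);
	packet[7]  = static_cast<uint8_t>(timestamp & 0xFF);
	packet[8]  = static_cast<uint8_t>(ssrc >> 24);
	packet[9]  = static_cast<uint8_t>((ssrc >> 16) & 0xFF);
	packet[10] = static_cast<uint8_t>((ssrc >> 8) & 0xFF);
	packet[11] = static_cast<uint8_t>(ssrc & 0xFF);

	return packet;
}
```

In the scenario above the Consumer would call this with the sequence number the device is NACKing (SN:1) and the stream's own payload type and SSRC.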

ibc · 2023-02-08T12:33:17Z

@ggarber not sure if I follow, there is no nice approach here but just 2 proposals with their own drawbacks:

1. Don't change anything and let old packets with higher TL be forwarded. This produces higher bitrate than expected and this PR aims to fix that. Let's not ignore this issue please.
2. Don't let any old packet with higher TL pass. This will produce NACKs for sequence numbers that the SimulcastConsumer won't be able to satisfy and hence it will trigger PLI from consuming devices.

ggarber · 2023-02-08T14:50:48Z

@ggarber not sure if I follow, there is no nice approach here but just 2 proposals with their own drawbacks:

Don't change anything and let old packets with higher TL be forwarded. This produces higher bitrate than expected and this PR aims to fix that. Let's not ignore this issue please.

0.1% higher bitrate imo.

Don't let any old packet with higher TL pass. This will produce NACKs for sequence numbers that the SimulcastConsumer won't be able to satisfy and hence it will trigger PLI from consuming devices.

I was suggesting to go with Option 2.

The PLIs on the receiver side are not triggered based on packet loss but on not having decodable frames for 2 seconds. In this case there will always be decodable frames from layer 0, so there shouldn't be any extra PLIs/keyframes.

ibc · 2023-02-08T15:46:10Z

I was suggesting to go with Option 2.
The PLIs in the receiver side are not triggered based on packet loss but on not having decodeable frames for 2secs. In this case there will be always decodeable frames from layer 0 so there shouldn't be any extra PLIs/keyframes.

Oh, it makes sense. Then... this PR is good (assuming code does what it's supposed to do), right?

ggarber · 2023-02-08T17:34:32Z

I was suggesting to go with Option 2.
The PLIs in the receiver side are not triggered based on packet loss but on not having decodeable frames for 2secs. In this case there will be always decodeable frames from layer 0 so there shouldn't be any extra PLIs/keyframes.

Oh, it makes sense. Then... this PR is good (assuming code does what it's supposed to do), right?

I didn't check the implementation but the concept of the PR is good imo, yes.

ibc · 2023-02-08T17:45:00Z

I didn't check the implementation but the concept of the PR is good imo, yes.

Requires further testing. Not sure yet whether it addresses the intended issue.

jmillan · 2024-01-17T16:33:58Z

Self note: Retest locally and prepare for merge.

jmillan · 2024-10-15T08:45:50Z

@namello-gather, was this a concerning issue for your environment? If so, did it have a positive impact without drawbacks?

Merge mediasoup issue versatica#1009, don't forward VP8 layers if not requested.

namello-gather · 2024-11-18T17:55:14Z

Hi @jmillan sorry I'm just getting to this ping. We've been running this in production for about a month now. We have observed better performance when targeting lower temporal layers with slightly better bandwidth estimation numbers. No drawbacks have been observed, if anything, we've increased our ratings in our lower end network users pool.

jmillan · 2024-11-22T08:16:13Z

Thanks for replying @namello-gather ,

if anything, we've increased our ratings in our lower end network users pool

What does this mean?

ibc · 2024-11-22T11:20:25Z

Should this PR fix issue #989? If so let's link it please.

namello-gather · 2024-11-22T18:15:33Z

Thanks for replying @namello-gather ,

if anything, we've increased our ratings in our lower end network users pool

What does this mean?

We bucket the ratings for our AV service by users' general network capabilities, so in the lower-end bucket some of the low ratings have turned into higher ratings after releasing this to our users.

jmillan · 2024-11-22T18:41:34Z

Nice, thanks 👍

jmillan added 2 commits January 31, 2023 17:32

Worker: VP8, do not send frames with temporal layer higher than the c…

969f333

…urrent one Fixes versatica#989

Add tests

c1db768

ibc reviewed Feb 3, 2023


vpalmisano reviewed Feb 4, 2023


vpalmisano reviewed Feb 6, 2023


namello-gather mentioned this pull request Oct 10, 2024

VP8: Don't forward higher temporal layer packets. (Upstream Pending PR) gathertown/mediasoup#2

Merged

namello-gather added a commit to gathertown/mediasoup that referenced this pull request Oct 22, 2024

Merge pull request #3 from jmillan/issue_989

e349ce3

Merge mediasoup issue versatica#1009, don't forward VP8 layers if not requested.

Merge branch 'v3' into issue_989

d08035f

ibc approved these changes Nov 22, 2024

