Rate adaptation seemingly unstable #44

Open
samhurst opened this issue Apr 8, 2022 · 53 comments

@samhurst

samhurst commented Apr 8, 2022

Hello,

I've been working with SCReAM a bit more lately, and I've encountered an issue where the rate adaptation seems unstable when using the GStreamer elements over a constrained network link. I'm trying to simulate what happens when, during a stream, the network capacity drops below the value configured as the maximum bit rate for the SCReAM sender. The sender is configured with a maximum bit rate of 10Mbit/s, using the settings -initrate 2500 -minrate 500 -maxrate 10000 -nosummary. The full GStreamer sending pipeline is below:

export SENDPIPELINE="videotestsrc is-live=true pattern=\"smpte\" horizontal-speed=10 ! video/x-raw,format=I420,width=1920,height=1080,framerate=25/1 ! x264enc name=video threads=4 speed-preset=ultrafast tune=fastdecode+zerolatency ! queue ! rtph264pay ssrc=1 ! queue max-size-buffers=2 max-size-bytes=0 max-size-time=0 ! screamtx name=screamtx params=\" -initrate 2500 -minrate 500 -maxrate 10000 -nosummary\" ! udpsink host=10.0.0.168 port=5000 sync=true rtpbin name=r udpsrc port=6001 address=10.0.0.194 ! queue ! screamtx.rtcp_sink screamtx.rtcp_src ! r.recv_rtcp_sink_0 "

I'm using the netem tool to add 40ms of latency at each end of the link (i.e. 80ms RTT) and to limit the sending rate of both machines to 8Mbit/s:

sudo tc qdisc add dev enp0s31f6 root netem delay 40ms rate 8Mbit limit 40

This is a graph of what the actual transmission rate (green) and the target encoder bit rate (blue) look like with the network restriction applied for a full five minutes:

scream-gst-x264enc-bw8Mbit-40ms-20220408-3

I think it's safe to say that the target bit rate selected is quite erratic, and it doesn't seem to match the graphs shown in README.md, where the line does wobble a bit but stays tightly bound around one point. I've also run the scream_bw_test_tx/rx application and I get results like this, which still show an unstable target encoder bitrate, but one that is much more closely grouped.

scream-scream_bw_test_tx-bw8Mbit-40ms-20220408-3

Using iperf3 in UDP mode, I see that the actual performance of the network is fairly stable: sending 10Mbit/s of traffic results in a pretty uniform 7.77Mbit/s of actual throughput.
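For reference, that check was run roughly like this (the receiver address is a placeholder):

iperf3 -s                            # on the receiving host
iperf3 -c <receiver> -u -b 10M -t 60 # on the sending host: 10Mbit/s of UDP for 60 seconds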

I suppose my real question is - is this expected behaviour? The huge swings in target bit rate cause a lot of decoding artifacts in the video stream, and I see a lot of packet loss as it keeps bouncing off the limit. If this is not expected behaviour, can you tell me how best to optimise my sending pipeline to suit?

@IngJohEricsson
Contributor

Hi
I have actually had issues when applying rate limitations with netem. I suspect that this is because it implements rate policing and the integration time is rather long.
You may want to try this instead:
sudo tc qdisc del dev p4p1 root
sudo tc qdisc add dev p4p1 root handle 1: htb default 1
sudo tc class add dev p4p1 parent 1: classid 1:1 htb rate 10000kbit
Unfortunately this does not work together with netem, so you need to apply the netem delay on the reverse path.
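The reverse-path delay can then go on the receiving host's interface instead, roughly like this (interface name is a placeholder; 40ms matches the delay used above):
sudo tc qdisc add dev <iface> root netem delay 40ms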

@IngJohEricsson
Contributor

BTW... replace p4p1 with the applicable interface name :-)

@samhurst
Author

Hello,

Thanks for getting back to me. I had done some testing with the tbf qdisc instead of just straight netem, but that didn't perform that well either. I've tried using the htb qdisc as you suggested, but I'm still seeing a lot of variability (including SCReAM's target bitrate shooting way beyond the 8Mbit/s limit that I'm trying to set):

scream-gst-x264enc-htb

I'm wondering if this reaction is due to some bursting behaviour in the htb qdisc that I'm not familiar enough with to control. I've tried adding a hard ceiling of 8Mbit/s with a burst of 6000 and a cburst of 1500, but that doesn't seem to help much:

scream-gst-x264enc-htb-8m-ciel-burst
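Roughly, the kind of change I mean is the following (interface name omitted, values as described above):
sudo tc class change dev <iface> parent 1: classid 1:1 htb rate 8000kbit ceil 8000kbit burst 6000 cburst 1500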

@IngJohEricsson
Contributor

IngJohEricsson commented Apr 11, 2022 via email

@samhurst
Author

Hi Ingemar,

The i7-1185G7 that I'm running these tests on doesn't seem to struggle with 10Mbit/s 1080p (~24% CPU usage, or 3% across all 8 threads of the host CPU). I tried a test with a resolution of 640x480, but the encoder seems to top out at around 6.7Mbit/s there, so it didn't hit the network bit rate limit. Instead, I changed the test so that the -maxrate option to screamtx was 7000, and limited the network bitrate to 5Mbit/s. I still observe the same behaviour:
scream-gst-x264enc-htb-5m-640x480

Many thanks,
-Sam

@IngJohEricsson
Contributor

This is really strange. I probably need to try this out myself, but I don't believe I have time to do that until next week at the earliest. Is it possible for you to try with the SCReAM BW test application on the same bottleneck?

@samhurst
Author

I've just run that and here's the result, with the -maxrate parameter on scream_bw_test_tx also set to 7000:

scream-scream_bw_test_tx-bw5Mbit

In case it helps, here's the CSV from my screamtx run above:
gst-x264enc-htb-5m-640x480.csv

@IngJohEricsson
Contributor

OK, this looks more reasonable. I have to look into what causes the problems with the plugin

@IngJohEricsson
Contributor

Can you by any chance (with the plugin example) also log the RTP bitrate, i.e. the bitrate that comes from the video encoder?
Or alternatively post the entire log?

@samhurst
Author

By "entire log", do you mean the whole GStreamer log? And I can probably get the actual bit rate using GstShark, I'm not sure if the x264enc element directly reports the bit rate output. If that's what you have in mind, I'll give that a go.

@IngJohEricsson
Contributor

IngJohEricsson commented Apr 11, 2022 via email

@samhurst
Author

Here's the GStreamer log with GstShark bitrate logging as well as GST_DEBUG=":2,scream:9". I didn't seem to get much output from GStreamer by itself, so I turned up the element debugging. Hopefully this gets you what you're looking for.

gst-x264enc-htb-5m-640x480-gstshark-dbg9.csv
gst-x264enc-htb-5m-640x480-gstshark-dbg9-trimmed.log.gz

@IngJohEricsson
Contributor

Thanks for the log. I plotted a graph. It looks like the video encoder's internal rate control is quite sluggish; it lags more than 2 seconds behind the target bitrate. I have not seen this behavior with x264enc before, perhaps it is related to some parameter setting?

image

@samhurst
Author

samhurst commented Apr 12, 2022

That is possible. Before raising this ticket, I had also performed testing with vaapih264enc, but that had even worse performance (the results are combined with those from the first graph on this issue, although the colours are different):

scream-gst-vaapih264enc-bw8Mbit-40ms-20220408-3gst-x264enc-bw8Mbit-40ms-20220408-3

I'm using the version of libx264 that came with my distribution, version 160. The parameters which I'm passing into x264enc are threads=4 speed-preset=ultrafast tune=fastdecode+zerolatency, which is the same as is used in the sender.sh example in the gstscream/scripts/ directory.

If I run another test without any tuning, then the graph looks like this:

scream-gst-x264enc-notune-htb-5m-640x480-gstshark-dbg9

However, I'm almost certain this is because the encoder is hitting its internal encoding bit rate limit for a 640x480 picture; if I up the frame size to 1024x576 then we're back to the yo-yo:

scream-gst-x264enc-notune-htb-5m-1024x576-dbg
gst-x264enc-notune-htb-5m-1024x576-dbg.csv
gst-x264enc-notune-htb-5m-1024x576-dbg-gst-trimmed.log.gz

@IngJohEricsson
Contributor

OK, thanks. In the vaapi example it looks like an I-frame is generated every 4 seconds or so. You can perhaps try to set keyframe-period=100; the spikes should then occur less often.
It is quite obvious that x264enc does not act optimally for this kind of rate-adaptive use. I tried to look through the parameter set but I don't find anything that can be tuned. Perhaps one can try an increased qp-step (the default is 4) and set vbv-buf-capacity to a lower value than the default 600.
Is there any chance you can try with nvenc or the h264 encoders that come with the NVIDIA Jetson Nano?
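Concretely, those suggestions would look something like this on the encoder elements (the values here are only starting points to experiment with):
vaapih264enc keyframe-period=100
x264enc threads=4 speed-preset=ultrafast tune=fastdecode+zerolatency qp-step=8 vbv-buf-capacity=300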

@samhurst
Author

I've tried playing with the qp-step and vbv-buf-capacity parameters like you suggested, but cranking up the qp-step to 32 and the vbv-buf-capacity down to 100 milliseconds doesn't make much difference.

Sadly, I don't have any immediate access to a Jetson nano, nor any other NVidia encoding/decoding hardware.

@IngJohEricsson
Contributor

OK,
You can perhaps try adding these parameters
-rateincrease 1000 -ratescale 0.2
after -maxrate 10000 in the sender.sh script.
This will make the rate increase slower and thus reduce the risk of overshoot when the video encoder rate control loop is slow/sluggish (I had this issue with omxh264enc on the Raspberry Pi).
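In the screamtx element that would look roughly like this (the other options follow the pipeline at the top of this issue):
screamtx name=screamtx params=" -initrate 2500 -minrate 500 -maxrate 10000 -rateincrease 1000 -ratescale 0.2 -nosummary"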

@samhurst
Author

I tried with -rateincrease 1000 -ratescale 0.2, and I could count fewer peaks, but still the same large drops. I've also tried dropping them to -rateincrease 500 -ratescale 0.1, but that just spaces the peaks out further and doesn't seem to stop the overshoot and the large drop in response.

scream-gst-x264enc-htb-5m-1024x576-dbg-rateinc500-ratescale0 1

@IngJohEricsson
Contributor

OK. Yes, one should expect sparser peaks. I was hoping that it would also reduce the drops, but that does not work. Not sure what more can be done; it is difficult to handle such cases when the video coder rate control loops are this slow. Somehow I believe it must be some setting. Perhaps it is the ultrafast preset that makes things bad? But I am just speculating here.

@jacobt21
Collaborator

It's better to have tc set on a dedicated device. When this is not possible:
When using Linux tc to constrain the rate of traffic flows, an application can receive backpressure forcing it to slow down the rate at which it sends traffic.
For example, when backpressure is applied, an iperf3 flow configured to send UDP traffic at a certain rate is not able to sustain that rate, but will adapt to the rate configured in tc.
On the other hand, without backpressure the iperf3 flow will continue sending at the configured rate and tc will drop packets when the tc queue surpasses the configured limit.
Whether you get application backpressure or tc packet drops can be controlled through the socket send buffer size and the qdisc limit.
For example, in the case of iperf3:

  1. using a socket send buffer of 212992 and a tc qdisc limit down to around 100 packets (varying depending on the rate of the flows and the size of packets), application backpressure is experienced without tc packet drops.
  2. using a socket send buffer of 720896 and a tc qdisc limit of at least 256, there is no application backpressure, even with flows well above the configured tc rate limit.

The default and max socket send buffer are set using commands of this form (shown here for 212992, 720896 and 2097152 bytes):
sysctl -w net.core.wmem_max=212992
sysctl -w net.core.wmem_default=212992
sysctl -w net.core.wmem_max=720896
sysctl -w net.core.wmem_default=720896
sysctl -w net.core.wmem_max=2097152
sysctl -w net.core.wmem_default=2097152
By changing these, all send buffers will get the larger default.
The settings can be read using:
cat /proc/sys/net/core/wmem_default
cat /proc/sys/net/core/wmem_max

To make the change permanent, add the following lines to the /etc/sysctl.conf file, which is used during the boot process:
net.core.wmem_default=720896
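Putting the two knobs together, a setup aimed at application backpressure rather than tc drops (point 1 above) might look like this; the interface name and exact limit are placeholders:
sysctl -w net.core.wmem_max=212992
sysctl -w net.core.wmem_default=212992
sudo tc qdisc add dev <iface> root netem rate 8Mbit limit 100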

@samhurst
Author

samhurst commented Jun 8, 2022

Apologies for being quiet here for a while; a mixture of other projects taking time and some personal leave last month means it has taken longer to get this response out than I'd hoped.

I ended up going away and trying to engage with both the GStreamer and x264 developers to see if there was any way of reducing the latency of the rate controller within the encoder, but this exercise did not bear much fruit. However, as a part of this effort I did end up writing a simple test that takes the SCReAM algorithm out of the loop and just allows me to see how the encoder reacts to standalone bit rate changes. I note that, especially during times of observed congestion or rate limiting, the SCReAM algorithm can update the bitrate property several times a second, making it difficult to actually observe the reaction to any one change. Here's an example of x264enc changing from 10Mbit/s to 8Mbit/s, when running with a GOP length of 10:

x264-buffer-bitrate-test-idr-10-10mb-to-8mb

The data shown in the graph above is taken from gst-shark's buffer and bitrate tracers: each blue cross is the size of an encoded frame (against the left y-axis), the golden line is the bitrate set on the encoder (against the right y-axis), and the red line is a per-second average bitrate of the buffers flowing from the encoder. It takes at least a second for the x264 encoder to even begin ramping down its encoded bit rate, and over two seconds before the average has reached the requested bit rate. Interestingly, it doesn't even seem to track the GOP length, as I'd expect the encoder to use that as a natural point to switch to its new target bit rate, but x264enc doesn't seem to do this.

As you suggested (#44 (comment)) I managed to get some testing done with an NVidia GPU (RTX 3080) using nvenc. Using the same test as earlier, I can see that nvenc reacts quite differently to x264enc, with a reaction to the bit rate change occurring almost immediately:

nvh264-firebrand-test-strict-gop15-10to8mb

However, one thing I did notice that differs from the behaviour of x264enc is that whenever the encoder is reconfigured with a new bit rate, it abandons the current GOP and creates a new IDR. The first few buffers of this are then fairly large, and no matter how much I try to tune nvenc, I can't seem to tame that behaviour. The encoder certainly does its best to keep the average bitrate down after this, and the average paced out over a second is well below the requested bit rate.

I then moved on to running screamtx with nvenc, and this is where the issue of every bit rate reconfiguration producing a new IDR starts to cause serious problems. I restricted the bandwidth to 8Mbit/s overall again, with a maximum allowed bit rate of 10Mbit/s (with buffer sizes set as Jacob suggests).

scream-firebrand-nvh264enc-scream-sender-8Mbit-nozerolatency

firebrand-nvh264enc-scream-sender-8Mbit-nozerolatency

(Top graph is plotted from the SCReAM CSV file, bottom graph is plotted using output from gst-shark similarly to the ones above)

It looks like the SCReAM rate controller tries to set its initial bandwidth, the encoder massively overshoots, and then the rate controller tries to turn the rate down, which causes the encoder to keep overshooting. This happens so often that the rate controller seems to just keep trending along the bottom of the allowed bit rate range.

Is there any way of backing off the SCReAM congestion controller so that it doesn't do quite so many updates? I feel that this might solve this particular problem.

@jacobt21
Collaborator

jacobt21 commented Jun 8, 2022 via email

@IngJohEricsson
Contributor

Hi
Yes, it should be possible to add a hysteresis function that inhibits video rate updates until the target bitrate increases or decreases by more than e.g. +/- 5%. This change however requires some testing, as there is a certain risk that the rate control can deadlock. I cannot promise any update to this in the near future (i.e. before the summer).

@samhurst
Author

samhurst commented Jun 9, 2022

Thanks to Jacob's pointer, I've modified the nvbaseenc code to allow me to set those values to FALSE, and the NVidia encoder no longer generates a new IDR every time there's a reconfiguration. This fixes the issue I was seeing with the rate never getting off the bottom of the graph, as it now behaves fairly normally:

scream-firebrand-nvh264enc-scream-sender-no-idr-on-reconfigure-8Mbit

However, I'm still seeing the oscillation that I was seeing with x264enc, even though this encoder is much better at reacting to bit rate changes.

@jacobt21
Collaborator

jacobt21 commented Jun 9, 2022

I see the same pattern, but I attribute it to the encoder, not SCReAM. In this case SCReAM keeps the rate constant:
targetBitrate_rateRtp

@samhurst
Author

Hi Jacob,

The green line on my graph indicates the targetBitrate as set by the SCReAM congestion controller. I should have specified that my testing was performed with a network-level limitation of 8Mbit/s, and I'm trying to understand how the SCReAM congestion controller behaves when faced with a network that has less bandwidth than the controller was originally configured to use. For example, a mobile user streaming video who moves to a new mast with higher levels of congestion and/or a lower peak throughput available for that user.

From my previous discussions with Ingemar, it seemed like the expected behaviour would be that the congestion controller would trend towards the network limit, and not keep going over and then dropping ~25% of the bit rate in reaction to network congestion. Currently, the only way I get a flat line for the target bitrate is if the configured SCReAM maximum bit rate is lower than the bandwidth available (i.e. network has 9Mbit/s of bandwidth, SCReAM configured with a maximum of 8Mbit/s).

-Sam

@IngJohEricsson
Contributor

Hi Sam. As SCReAM adapts based on the detection of increased queue delay, you'll indeed get the behavior shown in your figure. The reason is that once the queue starts to grow, you are essentially one round trip behind with the rate reduction, thus you'll get an overshoot. There are ways to reduce the magnitude of this oscillation. Try for instance with these extra options:
-rateincrease 5000 -ratescale 0.2
That slows down the rate increase but it also reduces the overshoot.
/Ingemar

@IngJohEricsson
Contributor

Hi

I have now added a -hysteresis option that should reduce the number of small rate changes quite considerably. For instance, with -hysteresis 0.1 the bitrate must increase by more than 10% or decrease by more than 2.5% (1/4th of the value) for a new rate value to be presented to the encoder. If that condition is not met, then the previous rate value is returned by the getTargetBitrate(..) function.
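Once the wrapper_lib mentioned below is updated, the option would be passed like the other screamtx parameters, for example:
screamtx name=screamtx params=" -initrate 2500 -minrate 500 -maxrate 10000 -hysteresis 0.1 -nosummary"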
The wrapper_lib that is used by the gstreamer plugin still needs to be updated.
/Ingemar

@samhurst
Author

Hi Ingemar,

With what you say about the increased queueing delay, would a potential fix be to make the queue larger so that it covers multiple round trips? I could also try making the round trip itself longer using tc, and experiment with that. At the moment, I'm running on two hosts connected directly to one another so the round trip time is a couple of milliseconds at worst.

I've tried adding the options you described, but all it appears to do is decrease the frequency of overshoots that I see, not the amplitude of the reaction from the congestion controller.

Many thanks for all your help to date by the way, this has all been quite helpful. I look forward to testing the hysteresis option and seeing if that helps matters.

-Sam

@IngJohEricsson
Contributor

IngJohEricsson commented Jun 10, 2022

Hi
Actually the proposed settings should reduce the frequency of the overshoots. Try setting it really low (-rateincrease 5000 -ratescale 0.2).
It is actually quite difficult to target a specific queue delay over a longer time span. In essence it requires that the following conditions are fulfilled:

  1. The video coder delivers video frames with an exact size
  2. The bottleneck has a constant throughput

1) above is rarely true, and 2) is essentially never true with cellular access technology like 4G and 5G.
This actually leads us to L4S technology, which allows quite fast rate adaptation and very little overshoot. So instead of trying to engineer around the problem with overshoot, we strive to add a congestion signal from the network. That will allow for good performance in the presence of issues 1) and 2) above.

Yes, a very short RTT will itself increase the rate adaptation speed in SCReAM; we have mostly tried links with an RTT of 10ms or more.

/Ingemar

@samhurst
Author

  1. Outside of the I-frames, the NVidia encoder seems to be fairly consistent with the size of each frame that it produces. This is probably helped by the use of videotestsrc, which is presumably fairly easy to encode.
  2. In my testing environment, the bottleneck should have a very constant throughput. I've been very careful to make sure that the only traffic flowing over the link is that for the tests: it is not a default route for anything else, and it's on a pair of dedicated gigabit ethernet devices. I've run several bitrate tests with iperf3 (in both TCP and UDP mode) and the results are very consistent with the values that I set.
  3. With those things in mind, are you saying that trying to benchmark SCReAM in the way that I am (in a completely closed and controlled lab environment) will not generate any relevant data? I have looked into trying to set the ECN flags on packets, but so far I've not had any luck making it work, and I feel like I'm going to have to add an ECN-aware router in the middle of the test network to make that work.

I've since tried again with adding additional delay into my test network to simulate longer round trip times, including setting the qdisc limits as described in the netem documentation. The graph below shows the target bitrate for a test run with the following screamtx settings:

-initrate 6000 -minrate 4000 -maxrate 10000 -rateincrease 5000 -ratescale 0.2

With round trip times of ~1ms (blue line), 20ms (green line) and 100ms (red line):

scream-unstable-8Mbit-limited-bitrate-20ms-100ms-RTT-target-only

Running with a more realistic RTT seems to reduce the amplitude of the bit rate back-offs a bit, but they're still pretty extreme. Here are the average requested bitrates over the three tests:

Metric              1ms        20ms       100ms
Mean                6611 kbps  6923 kbps  6949 kbps
Median              6461 kbps  7072 kbps  7098 kbps
Standard Deviation  817 kbps   787 kbps   747 kbps

So is this just what is to be expected with the setup that I've got at the moment?

@IngJohEricsson
Contributor

Hi
I would still argue that it varies a lot more than what we have experienced. The question is still whether you have some negative interaction between netem and SCReAM. Can you plot the estimated queue delay and RTT for the 20ms alternative, or alternatively post the log file?

@samhurst
Author

Hi Ingemar,

Here's the SCReAM CSV which should have the values in it from my 20ms test run:

firebrand-nvh264enc-scream-sender-no-idr-on-reconfigure-ratescale-8Mbit-20msRTT-limit40.csv

@IngJohEricsson
Contributor

Thanks. I plotted some extra graphs, see below. It seems like packet loss occurs at regular intervals, and that explains the large reductions in bitrate. What is also noticeable is that the queue delay increases rapidly on occasion; perhaps that can be attributed to how the netem rate policing/shaping is implemented?

image

@samhurst
Author

samhurst commented Jul 7, 2022

Hi Ingemar,

I've been doing some more investigation with different queuing disciplines to perform the rate policing, but all of the ones I have tried (netem, htb, tbf, cake) result in basically the same plotted graph with the large downward swings when the target bit rate exceeds the rate set by the traffic shaping, even when introducing additional round trip time as described above.

As an aside, you mentioned in your previous message that you thought that the specific pattern of packet loss might be the cause of the swings. I know that SCReAM is designed to be used with ECN as a forewarning of packet loss, but GStreamer currently doesn't support ECN in udpsink and udpsrc. So I spent a bit of time adding ECN support to GStreamer so I could test SCReAM with that instead of having packet losses. In case you're interested, I am actually contributing my patch back to the GStreamer project, and you can find the merge request here. You should also find attached below a basic patch adding support for reading the GstNetEcnMeta from the buffers in screamrx, as well as a patch adding it to the gstreamer-rs rust bindings (renamed to .txt so GitHub will let me attach them).

0001-Add-ECN-support-to-screamrx.patch.txt
0001-Add-GstNetEcnMeta-binding.patch.txt

I performed some testing with these ECN changes and the ECN-aware CAKE queuing discipline, and it doesn't seem to have made any difference to the actual results:

scream-firebrand-ecn-comparison-target-only

In the above graph, the blue line is running with no ECN, the green line is running with ECN and the red line is running with ECN and a ~100ms RTT.
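For reference, the ECN-capable shaping here is a CAKE qdisc set up roughly like this (interface name and bandwidth are placeholders):
sudo tc qdisc add dev <iface> root cake bandwidth 9Mbit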

@jacobt21
Collaborator

jacobt21 commented Jul 8, 2022

Hi Sam,

  1. Can you check rateCe and packetsCe, and add rateCe to the graph? If rateCe and packetsCe are zero, ECN marking doesn't work.
  2. Do you set tc once or change it dynamically during your experiment?
  • Jacob

@samhurst
Author

samhurst commented Jul 8, 2022

Hi Jacob,

packetsCe does increase over the duration of the test, so that implies that ECN is working. The final value (144) doesn't quite correlate to the wireshark capture that I took alongside the test (which only counts 133 packets as being marked with CE), but the frequency and timing of the increases as shown by rateCe does seem to correlate with what I see in wireshark.

scream-firebrand-nvh264enc-scream-sender-max10Mbit-cake9Mbit-ecn-with-bigbufs

In case it helps, here is the CSV file from the above test run:

firebrand-nvh264enc-scream-sender-max10Mbit-cake9Mbit-ecn-with-bigbufs.csv

And I currently only set TC once before the beginning of the test, so the bit rate ceiling is constant throughout in an effort to understand this specific oscillating behaviour when exceeding the bit rate ceiling.

-Sam

@jacobt21
Collaborator

jacobt21 commented Jul 8, 2022

Hi Sam.
"when exceeding the bit rate ceiling" What do you do to exceed the bit rate ceiling ? Something has to change, tc config, extra traffic ? something else ?

  • Jacob

@samhurst
Author

samhurst commented Jul 8, 2022

By which I mean the maximum bit rate configured in tc. I'm trying to understand SCReAM's behaviour when the available network bandwidth drops below that configured as the maximum in screamtx. At the moment, in order to reduce the number of variables in play, I'm keeping the maximum bit rate of the test network static throughout the test.

@jacobt21
Collaborator

jacobt21 commented Jul 8, 2022

Hi Sam,
But what do you change?
BTW, can you try setting key-int-max to a large value (to effectively disable it)?
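For the x264enc element (where this property exists) that would be something like the line below; the value is arbitrary, just large enough that no periodic keyframes occur during a test:
x264enc threads=4 speed-preset=ultrafast tune=fastdecode+zerolatency key-int-max=10000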

  • Jacob

@samhurst
Author

samhurst commented Aug 4, 2022

Hello,

Apologies for taking a while to get back to this, I have been pulled away onto other things.

I went looking to understand why the packet loss was happening in the groups that Ingemar observed. As a first step, I moved away from my local test network using tc and managed to set up a test that runs over the public internet, so as to have a more representative test. The VDSL2 connection doing the upload has a reliable maximum throughput of about 17Mbit/s, so that would be my target. However, I discovered that, even after increasing the send and receive buffers at both ends of the test and even running at only 6Mbit/s, I was still seeing those same bursts of packet loss.

After analysing the traffic in wireshark, it is clear that the RTP packets for each frame are being bursted out together. All of the analysed packet loss occurred on packets at the end of these bursts. The following wireshark I/O graph shows the number of packets transmitted every 10ms; with the video running at 25fps, that is a high enough resolution to see the peaks every 40ms.

wireshark-scream-packet-pacing

Using the RTP analysis tools, you can see that the delta time between packets is very small within a given frame, but between frames the delta is large:

wireshark-scream-packet-gaps

I note that the SCReAM library (in code/ScreamTx.cpp) has a packet pacing algorithm in it, but I'm not sure it's working effectively in this instance. I started analysing the flow of buffers back out of the SCReAM library into GStreamer land, and ended up graphing the latency of buffers being passed into the screamtx callback function using the GST_TRACE message there. The following graph shows the time that buffers pass back to screamtx on the x axis, with the time since the last buffer was received on the y axis. The graph is zoomed in to about 3 seconds of a stream, well after the 5-second backoff period inside the pacing algorithm has elapsed:

screamtx-debugging-long-openstack-time

This shows that there are clusters of buffers with large differences between them, often ~40ms (one frame time). Some are even larger than that, but I'm not sure why that's happening at the moment.

I think this isn't related to the problem at hand, but I note that the screamtx element does not change the pts or dts timestamps on any of the GstBuffers that it sends onwards. This means that GStreamer may well end up holding all of those packets before releasing them to udpsink. All of your examples set sync=false on udpsink, which I have also been doing in my testing, so that shouldn't be a problem here, but it might be worth thinking about how to make GStreamer do some of that heavy lifting for you in this instance.

@jacobt21
Collaborator

jacobt21 commented Aug 5, 2022

Hi Sam,
Very good information. If you could summarize it and create a doc, this would be useful for all users.
I'm wondering if it would be better to use Linux tc for pacing. How would this affect end-to-end latency?

@samhurst
Author

samhurst commented Aug 5, 2022

I started doing some more poking about in ScreamTx.cpp, adding debug prints for various values so I could try to understand what the maths around pacing is doing, and figure out why it wasn't working. During some of my testing, I noticed a few periods where it was actually pacing the packets out as I'd expect, so I went looking. The following graph is an expanded form of the one I showed in my last comment, with the blue line again showing the time that buffers pass back into screamtx against the time since the last buffer was received.

screamtx-debugging-long-rd646-ntpdbg-paceheadroom-0 1-no-1 5-until-loss

I've also added two new lines which track values inside ScreamTx.cpp, red for nextTransmitT_rtp within isOkToTransmit at line 524, before it performs this if statement; and green for paceInterval_ntp at line 606 before it is used to update nextTransmitT_ntp.

The blue and red lines should be more tightly synchronised, but it's basically impossible to synchronise the timestamps between GStreamer's logging (for the blue line) and the timestamps in the SCReAM code (for the red and green lines), so it's safest to imagine the blue and red spikes overlaid upon one another.

What I notice from this is that paceInterval_ntp is always 0 whenever it's not pacing correctly, but when it is non-zero then the pacing happens correctly. I think this is because paceInterval is reset to kMinPaceInterval here, which in my case always seems to be 0, and then the following if statement never evaluates to true because queueDelayFractionAvg is < 0.02. I haven't been able to get my head around why this is the case yet, but I'll have another look next week if I can. In the meantime, any insight to help understand this a bit better would be welcome.

@jacobt21
Collaborator

jacobt21 commented Aug 6, 2022

Hi
A short reply, still on vacation with only a cellphone at hand.
Packet pacing is actually turned off when the link capacity is considerably higher than the transmitted bitrate. The estimated queue delay is then very low. This is to avoid delaying video frames unnecessarily when there is no congestion.
To be honest, I believe that the 10-15ms gained in e2e latency by turning off the packet pacing is more of academic importance.
Thanks anyway for digging into the code like this. I definitely believe there is potential for improvement here.
/Ingemar

@samhurst
Author

Hello,

Thanks for the response, even when on vacation.

Turning off the packet pacing when the link capacity is estimated to be considerably higher than the transmitted bit rate is a very interesting decision. I can't find anything in RFC 8298 that describes such behaviour, so I'm assuming this is just a feature of this particular implementation?

I'm not sure that it's worth it for that 10-15ms gain, because of the problems that it can cause on any network. The application at each end doesn't really have any visibility into the actual levels of congestion on the network, outside of packet losses and ECN flags. If the buffers in the switches and routers on the network path are already fairly full, then the bursts of traffic could easily overwhelm them and cause those packet loss events. I'd guess this is why RFC 8085 specifies that you should always pace UDP traffic.

I've tried fiddling with the code to try and force packet pacing to be on all the time, but I haven't been successful thus far.

-Sam

@IngJohEricsson
Contributor

Hi
And thanks for taking the time to dig into the code, and for the very useful input.

It should be possible to force packet pacing on by changing line
https://github.com/EricssonResearch/scream/blob/master/code/ScreamTx.cpp#L1221
To
if (true && kEnablePacketPacing) {

I cannot guarantee that it will work right out of the box; there is for instance a risk that the rate ramp-up will slow down, at least initially. It will be a few days before I can try it out myself.

/Ingemar

@samhurst
Author

Hi Ingemar,

I tried your suggestion and, while it means that paceInterval_ntp is no longer always 0, it didn't seem to make much difference, as the value is still quite low and I see poor pacing on the output, with large gaps between the end of one frame's worth of RTP packets and the start of the next frame. It feels like a calculation is going wrong somewhere in how the pacing value is computed, but I can't see where.

-Sam

@IngJohEricsson
Contributor

It is expected that you'll get a gap between the last RTP packet of the previous frame and the first RTP packet of the following frame. This is because the pacing bitrate is a bit higher than the nominal (video coder) bitrate; the reason is that video frame sizes fluctuate +/- around the average frame size, and it is undesirable to let RTP packets stay in the RTP queue just because a video frame was slightly larger than normal.
Normally, if the frame interval is 20ms, you should expect to see a ~5ms gap between transmissions, but as the frame sizes vary, this gap will also vary.

@samhurst
Author

Hello Ingemar, Jacob;

Thanks again for all your comments and help so far. I've been wondering whether you've managed to look into this yourselves? I was also wondering if it might be worth setting up a meeting where we can discuss things in real time rather than via GitHub. If you think that might be a good idea, do you perhaps have some availability next week?

Best regards,
-Sam

@IngJohEricsson
Contributor

IngJohEricsson commented Oct 11, 2022 via email

@samhurst
Author

Hi Ingemar,

It looks like your previous message is a copy of a previous message sent to this issue - was this intentional? Or did your mail client eat the message you wanted to reply with?

-Sam

@IngJohEricsson
Contributor

The message 9 days ago was a follow-up on the message posted August 16.

@jacobt21
Collaborator

jacobt21 commented Nov 20, 2022 via email
