Enormous CPU usage when overflowing UDP socket with backend inet #7573
Comments
Let's see if I got the situation right... You have one UDP socket you call the "listener socket" that is the central socket to which clients (750 of them) send requests as UDP datagrams. There is one process, the "listener process", in a tight loop over `gen_udp:recv`, routing each datagram to an "allocation process". Each "allocation process" does a UDP request + gets a reply towards a "peer" over a UDP socket that is handled only by that "allocation process". Each "allocation process" then sends a reply with `gen_udp:send` on the "listener socket". So we have a large set of "allocation processes" that hammer the "listener socket" with `gen_udp:send`. If so; is this a duplicate of GH-6455? Then it should be fixed in OTP 26.0?
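For reference, a minimal Elixir sketch of the listener loop described above — names such as `Listener`, `route/4`, and the `{:client_data, data}` message are illustrative assumptions, not the project's actual code:

```elixir
defmodule Listener do
  # Sketch of the "listener process": a tight passive-mode recv loop that
  # routes each incoming datagram to the allocation process owning that client.
  def loop(socket, allocations) do
    case :gen_udp.recv(socket, 0) do
      {:ok, {ip, port, data}} ->
        route(allocations, ip, port, data)
        loop(socket, allocations)

      {:error, reason} ->
        {:stop, reason}
    end
  end

  defp route(allocations, ip, port, data) do
    case Map.fetch(allocations, {ip, port}) do
      {:ok, pid} -> send(pid, {:client_data, data})
      :error -> :ok
    end
  end
end
```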
@RaimoNiskanen exactly! We open the socket like this:

```elixir
{:ok, socket} =
  :gen_udp.open(
    port,
    [
      {:ifaddr, ip},
      {:active, false},
      {:recbuf, 1024 * 1024},
      :binary
    ]
  )
```

receive data this way:

```elixir
:gen_udp.recv(socket, 0)
```

and send data with the following arguments:

```elixir
:ok = :gen_udp.send(state.turn_socket, c_ip, c_port, channel_data)
```

The issue you linked sounds promising, but we are already on OTP 26, Elixir 1.15.
Here is the code if you wish to take a look: I linked a specific commit, as we have since moved to the `socket` backend. Also, we don't notice such behaviour when using the `socket` backend. I am willing to dive into this more; in particular, I could check what functions are called under the hood, but I couldn't find anything with the tools I tried so far.
Have a look at https://www.erlang.org/doc/apps/erts/beamasm#linux-perf-support, in particular the notes there on enabling `perf` support for the JIT.
Thinking again, the optimization in OTP 26.0 would not have any effect here. That was for a process that has a large message queue when calling the `gen_udp` functions. One investigation step that can be done, mostly to rule out simple problems, is to produce an Erlang Crash Dump, e.g. by calling `erlang:halt/1` with a string argument, and inspect it. But if there are no such simple problems, the most likely culprit here may be lock contention on the (driver) port lock of the listener socket (driver) port. I have heard a war story about a customer that got around a similar problem by cloning the port (opening another socket port for the same file descriptor) and then using one port for input and the other for output, to avoid lock contention between the reader and the writers. Lock contention can be investigated by compiling an instrumented VM; see lcnt - The Lock Profiler. Unfortunately it is not as easy as adding a start option to the normal VM... Since NIFs leave it to the user to implement locking, the `socket` backend (a NIF) is not tied to the driver port lock in the same way.
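A hedged sketch of both investigation steps in Elixir — the crash-dump call and the `lcnt` session are independent of each other, the `lcnt` part assumes an emulator built with lock counting (started with `erl -emu_type lcnt`), and `:port_lock` is an illustrative lock-class name:

```elixir
# Step 1: produce an Erlang crash dump to rule out simple problems.
# Halting with a string slogan writes erl_crash.dump; note that this stops the node.
# :erlang.halt(~c"diagnostic dump for GH-7573")

# Step 2: on a lock-counting emulator, gather lock statistics under load.
:lcnt.start()
:lcnt.clear()
# ... let the benchmark run for a while ...
:lcnt.collect()
:lcnt.conflicts()          # most contended locks
:lcnt.inspect(:port_lock)  # drill into a specific lock class
```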
@RaimoNiskanen thanks, I will look into that. Keep in mind that the problem starts to occur only after 5-20 seconds, so at the beginning everything works correctly.
By "port" you mean something different than the socket we get back from `:gen_udp.open`?
By "(driver) port" here I mean an Erlang And I guess they used the |
I finally found some free time to dive into this problem a little more. Starting with:

[output omitted]

It looks like there are a lot of calls to:

[function names omitted]
Well, yes, the VM guys would like some…
Describe the bug
When opening a UDP socket with the `inet` backend, we observe enormous CPU usage (~90% of every thread on a 32-core / 64-thread machine) when the process that opened the socket doesn't read data fast enough.

We are implementing a TURN server where many clients can send data to the same address. As long as we have 650 different clients (each sending 50 kbps of traffic in 100-byte datagrams), everything works correctly and the average CPU usage stays around 10%. When we try to increase the number of clients to 750, after about 30 seconds the CPU usage increases to 90% on every thread.
This doesn't happen when using the `socket` backend. In case of the `socket` backend, packets are silently dropped (as they are not read fast enough) and the CPU usage stays around 25%, which is higher than in case of the `inet` backend, but stable.

To Reproduce
This might be pretty hard. We can provide you with our server, which is an open source project, and a benchmarking script if you wish.
Expected behavior
If a process is not reading data fast enough, data should be dropped without any impact on the CPU usage.
Affected versions
To our knowledge, at least OTP 24 and 26. We didn't test on 25.
Additional context
The architecture of the TURN server is as follows:
We have one listener process, which has one UDP socket, and N allocation processes. Every allocation process has its own allocation socket and a "reference" to the listener socket.
Clients (750 of them) send data to the listener socket. The listener process reads data from the socket with `gen_udp.recv` and routes it to the appropriate allocation process. The allocation process forwards the data to its peer using its allocation socket. The peer echoes the data back to the allocation socket, where it is read by the allocation process and sent to the corresponding client using the reference to the listener socket.

My assumption is that after some time of not reading data fast enough from the listener socket, something breaks and impacts the sending processes, i.e. none of the allocation processes can successfully send data on the listener socket. Because we have 750 processes that can write to the same socket, we observe high CPU usage on every thread.
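To make the data path concrete, here is a minimal Elixir sketch of one allocation process as described above — the module name, state fields, and timeout value are hypothetical, not the project's actual code:

```elixir
defmodule Allocation do
  # One allocation process: it forwards client data to its peer over its own
  # allocation socket and relays the peer's reply back to the client over the
  # shared listener socket (the "reference" held in the state).
  def handle_client_data(state, data) do
    # Forward the client's datagram to the peer.
    :ok = :gen_udp.send(state.alloc_socket, state.peer_ip, state.peer_port, data)

    # Wait for the peer to echo it back on the allocation socket (passive mode).
    {:ok, {_ip, _port, reply}} = :gen_udp.recv(state.alloc_socket, 0, 5_000)

    # Reply to the client on the *shared* listener socket. With 750 allocation
    # processes all writing here, this is the call that contends for that socket.
    :ok = :gen_udp.send(state.listener_socket, state.client_ip, state.client_port, reply)
  end
end
```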
The output from `perf top -g --sort=dso` at some arbitrary moment after the crash (i.e. the CPU usage increase) happens:

[perf output omitted]