-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Race condition with OTP ktls implementation and inet_driver #8561
Comments
@miriampena This is a race condition that can not be solved in any other way than doing: {active, true} is default and the options are inherited from listen to accept. So if a message arrives before ssl process has become the controlling process it will fail. This is documented in the User Guide for upgrading TCP sockets to ssl (TLS) ones. |
@miriampena I heard that your use case is actually the erlang distribution, and then this could be a bug and only the reproduction here is not valid. It is possibly ktls or inet-driver bug. We have a problem in some specific test, will come back to you on that. |
Hi Ingela, I was about to reach out to say that upon further inspection I was afraid that the reproduction I pasted was not triggering the right issue. It is indeed in happening in erlang distribution when we connect one node to a different node, then we call disconnect_node/1 and ping/1. If you do that ping/disconnect on a loop +50 times you get that race condition. It will show up as a timeout on a gen_tcp:recv/1 in the beginning of the handshake phase (recv_name_failed) and when we manually move the socket to {packet,0} we can actually see ssl data. Thanks for looking into it. |
We have a test that fails on some machines in a similar way. But we have not been able to find the cause. But it seems they are new Ubuntu versions. Alas the successful test cases that we have seen seems to not been run in a while and ktls is not supported everywhere so it not easy to keep track of. Looking at the code where I find recv_name_failed, it seems the timout is infinity! Could you share what is the value of Other? I guess it would be there that you should be getting timeout? |
Normally the connection will just time-out in the recv_name phase as, as you say, the timeout is infinity. What I do, just for the purposes of identifying this is a bug happening, is to setup a custom f_recv in HSDATA so if a recv times-out after 2 seconds I move the Socket from {packet,4} to {packet,0} to capture what is actually in the Socket.
This causes the Recv function in recv_name return something invalid rather than timing-out. The invalid format of the response triggers a receive_name_fail and then call the shutdown2/2 function. Then we can capture in the shutdown2/2 function a ssl message
|
That is not necessarily wrong, the bytes on the socket after the TLS handshake will always be "TLS packets". The ssl application will emulate the packet option that shall now apply to the decrypted data, so ssl:setopts might give you another result as long as the socket actually has received a whole "TLS packet". |
Oh yes sorry, forgot I guess it should be offloaded now to ktls that should handle the "TLS packet" and decrypt it. |
You could try splitting option setting in inet_tls_dist.erl to see if there could be something problematic in the inet-driver option handling. In code
Create a split
|
The bug is because even with {active, 1} inet_driver still buffers two ssl packets and translated them into two ssl_tls packets, sent one to the controlling process and wait next active to send another (so ktls option will not take effect because the data is already in inet_driver buffer instead of socket buffer) |
I created a PR that introduces a boolean flag |
Hi @RaimoNiskanen . I tested the PR with our code and it removes the race condition. Thanks for the fix! |
…' into maint * raimo/inet_drv-active-1-no-read-ahead/GH-8561/OTP-19175: Fix NULL pointer to memmove in tcp_deliver Use read_ahead in SSL for KTLS Fix NULL pointer to memmove in tcp_deliver Use read_ahead in SSL for KTLS Implement option for inet_drv read_ahead Fix possible bug for packet type change Avoid read-ahead for packet mode ssl_tls
…' into maint-26 * raimo/inet_drv-active-1-no-read-ahead/GH-8561/OTP-19175: Fix NULL pointer to memmove in tcp_deliver Use read_ahead in SSL for KTLS Fix NULL pointer to memmove in tcp_deliver Use read_ahead in SSL for KTLS Implement option for inet_drv read_ahead Fix possible bug for packet type change Avoid read-ahead for packet mode ssl_tls # Conflicts: # erts/preloaded/ebin/prim_inet.beam
Released in OTP-26.2.5.3. There appears to be other race condigions, not as obvious as this one. If you manage to pinpoint it, please open a new Issue. |
The story continues in PR#8785 |
…maint * raimo/net/implement-read_ahead/GH-8561/OTP-19199: Test read_ahead true|false performance Document option `read_ahead` Test read_ahead = false Don't try to recv peek the header Try to recv peek the header - failed (Linux bug?) Honour `{read_ahead,true}` and make it default again Implement read_ahead option for gen_tcp_socket Re-implement gen_tcp_socket to not read ahead
Describe the bug
When we test OTP ktls by performing +100 consecutive node disconnections and net_adm:ping(Node) we run into a situation in which we receive ssl raw packages in the acceptor after we do the OTP ktls handover.
What happens internally is that the inet_driver buffer receives messages from the client before we set the last ktls options on the socket, so it is unable to do the decoding of the data that is already on the buffer.
To Reproduce
Open two terminals, terminal one as server, terminal 2 as client:
<copy on terminal 1>
<copy terminal 2>
<copy on terminal 1>
The first gen_tcp:recv will timeout, the second gen_tcp:recv will receive a ssl packet instead of the tcp packet. Example:
Expected behavior
We receive a tcp package.
Affected versions
Erlang/OTP 26 [erts-14.2.1]
Additional context
The text was updated successfully, but these errors were encountered: