Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update deps and fix tests for new Ruby versions #59

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

JonathanTron
Copy link

An update to make tests pass on newer ruby versions (>= 3.1).

I switched CI to Github Actions, and added tests on 2.7, 3.0, 3.1, 3.2, and 3.3.
Ruby 2.7 is EOL since 2023-03-31, but as there is no issue with dependencies I included it in the CI.

…d update minimum ruby version supported to >= 2.7
In older rubies, using an `SSLSocket` before a session was started
worked as a normal (plain text) `Socket` (displaying a warning).
After ruby 3.1 it raised an error, the workaround is to use the
underlying `Socket` (via `to_io`) until an SSL session is started.
@wconrad
Copy link
Owner

wconrad commented Dec 31, 2023

This is an exciting PR, and looks like quality work. Thank you!

I'm getting a failure for four of the features when run on my box (details below). I assume this is just on my box, and the tests run cleanly for you. I'll start looking at this.

wayne@mercury:~/lab/attic/ftpd/gem$ rake test:features
/home/wayne/.rvm/rubies/ruby-3.2.2/bin/ruby -S bundle exec cucumber --format progress
Using the default profile...
................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................F-.....F-...........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................F-.....F-...............................................................................................................................................................................................................................................................................................................................

(::) failed steps (::)

550 Connection reset by peer
 (Net::FTPPermError)
./features/support/test_client.rb:216:in `with_binary'
./features/support/test_client.rb:75:in `put'
./features/step_definitions/put.rb:5:in `/^the client successfully puts (text|binary) "(.*?)"$/'
features/ftp_server/implicit_tls.feature:15:in `the client successfully puts text "ascii_unix"'

550 Connection reset by peer
 (Net::FTPPermError)
./features/support/test_client.rb:216:in `with_binary'
./features/support/test_client.rb:75:in `put'
./features/step_definitions/put.rb:5:in `/^the client successfully puts (text|binary) "(.*?)"$/'
features/ftp_server/implicit_tls.feature:22:in `the client successfully puts text "ascii_unix"'

550 Connection reset by peer
 (Net::FTPPermError)
./features/support/test_client.rb:216:in `with_binary'
./features/support/test_client.rb:75:in `put'
./features/step_definitions/put.rb:5:in `/^the client successfully puts (text|binary) "(.*?)"$/'
features/ftp_server/put_tls.feature:15:in `the client successfully puts text "ascii_unix"'

550 Connection reset by peer
 (Net::FTPPermError)
./features/support/test_client.rb:216:in `with_binary'
./features/support/test_client.rb:75:in `put'
./features/step_definitions/put.rb:5:in `/^the client successfully puts (text|binary) "(.*?)"$/'
features/ftp_server/put_tls.feature:22:in `the client successfully puts text "ascii_unix"'

Failing Scenarios:
cucumber features/ftp_server/implicit_tls.feature:11 # Scenario: Active
cucumber features/ftp_server/implicit_tls.feature:18 # Scenario: Passive
cucumber features/ftp_server/put_tls.feature:11 # Scenario: Active
cucumber features/ftp_server/put_tls.feature:18 # Scenario: Passive

283 scenarios (4 failed, 279 passed)
1484 steps (4 failed, 4 skipped, 1476 passed)
0m8.322s

@JonathanTron
Copy link
Author

JonathanTron commented Dec 31, 2023

Thanks for the kind words, I would have liked for the Github Action CI to run in order to check on linux, but who knows what combination of source/forked and branches would trigger a run properly 😀.
I will try to run it on another machine to see if I can reproduce the issues you have.

@wconrad
Copy link
Owner

wconrad commented Dec 31, 2023

Thanks for the kind words, I would have liked for the Github Action CI to run in order to check on linux, but who knows what combination of source/forked and branches would trigger a run properly 😀

I'll elevate your perms for this project if you want to try to figure that out (see also: Helping Me Whitewash The Fence)

I will try to run it on another machine to see if I can reproduce the issues you have.

Thanks!

Here's what I know so far. The server thread just dies... goes away, no exception, no nothing, right here in command_loop.rb:

    def gets_with_timeout(socket)
      socket = socket.to_io if @session.tls_enabled? && !socket.encrypted?    # Runs fine
      ready = IO.select([socket], nil, nil, config.session_timeout)                        # Does not return, nor raise an exception
      timeout if ready.nil?
      ready[0].first.gets
    end

This smells like one of those bugs that is of type "not very fun" and might not even be of your or my making.

@JonathanTron
Copy link
Author

Yeah I got a lot of those while trying to find the issue in ruby >= 3.1, but I tested only on MacOS, and the differences are always subtle between OSes when dealing with Sockets.
I'll get it to work as I need it for our test suite in a production app I will have to upgrade to 3.3 in 2024.

@wconrad
Copy link
Owner

wconrad commented Jan 1, 2024

Yeah I got a lot of those while trying to find the issue in ruby >= 3.1, but I tested only on MacOS, and the differences are always subtle between OSes when dealing with Sockets. I'll get it to work as I need it for our test suite in a production app I will have to upgrade to 3.3 in 2024.

I really appreciate you helping to bring this abandoned lib back to life.

I added two new commits in this branch; I think you should get them if you do a git pull. One commit adds your name to the contributor list. The other commit allegedly sets up github to run the (Linux) tests on every commit. I hope that helps.

@JonathanTron
Copy link
Author

Just a quick report of what I found so far.

I can reproduce the issues in a linux container.

After a lot of debug, I can now make it fail/pass based on a single sleep 0.1 right before calling conn.close in Net::FTP#storlines (similar code is used in other methods, but I'm focusing on this one for now).

It seems the data transfer socket (opened by Net::FTP to send the file) is closed before the server can even start reading from its receiving socket (resulting in a Errno::ECONNRESET: Connection reset by peer exception on the server when calling SSLSocket#read).

I'm not sure why it works on MacOS but fail on Linux, I suppose there's something different in the implementation of OpenSSL::SSLSocket#write_nonblocking between both OSes... I drilled down into the ruby code and could not see anything specific... So I'm wondering if I missed something more obvious.

@JonathanTron
Copy link
Author

After messing with debug logs pretty much everywhere in ruby source code, I reduced the issue to a script which pass all the time on MacOS and fail "sometimes" on Linux (docker image ruby-3.3.0-slim)... the feature on the other end fail everytime.

I'm attaching the script here, in case someone is fluent in IO/Socket/SSL on Linux.

For the server:

  1. create an OpenSSL::SSL::SSLServer (in a thread)
  2. handle new connections by spawning a thread from the accepted socket
  3. sleeping 1 second, in order to give time to the connected client to send all its data and close its connection.
  4. then do an IO.select (waiting for read)
  5. calling gets when the socket is ready,
  6. repeat until the message sent contains STOP.

Then I start a new client (in a thread, the behaviour is the same without the thread, or when running the client in a different process) which:

  1. connects to the server
  2. send 10 DEMO messages
  3. send a STOP message
  4. call shutdown on the client socket (doing nothing, as it's a Net::FTP::BufferedSSLSocket which overrides the Socket#shutdown method anyway), call read, then call close.
  5. ending the client thread
require 'io/wait'
require "openssl"
require "net/ftp"

$stdout.sync = true

def log_server(msg)
  $stdout.puts "\e[31m[SERVER]\e[0m #{msg}"
end

def log_server_session(msg)
  log_server "\e[35m[SessionThread]\e[0m #{msg}"
end

def log_client(msg)
  $stdout.puts "\e[34m[CLIENT]\e[0m #{msg}"
end

server = TCPServer.new("127.0.0.1", 0)
log_server "Server started on #{server.addr[2]}:#{server.addr[1]}"

ssl_context = OpenSSL::SSL::SSLContext.new
File.open(File.expand_path("../../insecure-test-cert.pem", __FILE__)) do |certfile|
  ssl_context.cert = OpenSSL::X509::Certificate.new(certfile)
  certfile.rewind
  ssl_context.key = OpenSSL::PKey::RSA.new(certfile)
end
server_ssl_socket = OpenSSL::SSL::SSLServer.new(server, ssl_context)

def start_session(session_socket)
  log_server "Starting new session #{session_socket}"
  Thread.new do
    sleep 1
    loop do
      begin
        log_server_session "waiting on readable session_socket"
        ready = IO.select([session_socket], nil, nil, nil)
        if ready.nil?
          log_server_session "timeout while waiting read on socket"
          break
        end
        data = ready[0].first.gets
        log_server_session "read: #{data.inspect}"
        if data =~ /STOP/
          break
        end
      rescue => e
        log_server_session "Error: #{e.inspect}"
        break
      end
    end
    log_server_session "ending"
    exit 1
  end
end

server_thread = Thread.new do
  # Thread.abort_on_exception = true
  loop do
    begin
      session_socket = server_ssl_socket.accept
      start_session(session_socket)
    rescue Errno::EAGAIN, Errno::ECONNABORTED, Errno::EPROTO, Errno::EINVAL
      IO.select([server_ssl_socket])
      sleep(0.2)
      retry
    end
  end
end

client_thread = Thread.new do
  begin
    log_client "start"
    client = TCPSocket.new("127.0.0.1", server.addr[1].to_i)
    ssl_client = OpenSSL::SSL::SSLSocket.new(client)
    ssl_client.connect
    ssl_client.sync_close = true

    # buffered_ssl_client = ssl_client
    buffered_ssl_client = Net::FTP::BufferedSSLSocket.new(ssl_client)

    12.times do
      buffered_ssl_client.write "DEMO\n"
      log_client "write 'DEMO'"
    end
    buffered_ssl_client.write "STOP\n"
    log_client "write 'STOP'"

    # To stay inline with `Net::FTP#storlines`
    buffered_ssl_client.shutdown(Socket::SHUT_RD) # this is overriden in Net::FTP::BufferedSSLSocket to not call `Socket#shutdown`
    buffered_ssl_client.read_timeout = 1
    buffered_ssl_client.read

    buffered_ssl_client.close
    log_client "socket closed"
  rescue => e
    log_client "Error: #{e.inspect}"
  end
  log_client "thread finished"
end

client_thread.join
server_thread.join

Example outputs:

  1. Failing Linux:
[SERVER] Server started on 127.0.0.1:33809
[CLIENT] start
[CLIENT] write 'DEMO'
[CLIENT] write 'DEMO'
[CLIENT] write 'DEMO'
[CLIENT] write 'DEMO'
[CLIENT] write 'DEMO'
[CLIENT] write 'DEMO'
[CLIENT] write 'DEMO'
[CLIENT] write 'DEMO'
[CLIENT] write 'DEMO'
[CLIENT] write 'DEMO'
[CLIENT] write 'STOP'
[SERVER] Starting new session #<OpenSSL::SSL::SSLSocket:0x0000ffff8381fa00>
[CLIENT] socket closed
[CLIENT] thread finished
[SERVER] [SessionThread] waiting on readable session_socket
[SERVER] [SessionThread] read: "DEMO\n"
[SERVER] [SessionThread] waiting on readable session_socket
[SERVER] [SessionThread] read: "DEMO\n"
[SERVER] [SessionThread] waiting on readable session_socket
[SERVER] [SessionThread] read: "DEMO\n"
[SERVER] [SessionThread] waiting on readable session_socket
[SERVER] [SessionThread] read: "DEMO\n"
[SERVER] [SessionThread] waiting on readable session_socket
[SERVER] [SessionThread] read: "DEMO\n"
[SERVER] [SessionThread] waiting on readable session_socket
[SERVER] [SessionThread] read: "DEMO\n"
[SERVER] [SessionThread] waiting on readable session_socket
[SERVER] [SessionThread] read: "DEMO\n"
[SERVER] [SessionThread] waiting on readable session_socket
[SERVER] [SessionThread] read: "DEMO\n"
[SERVER] [SessionThread] waiting on readable session_socket
[SERVER] [SessionThread] read: "DEMO\n"
[SERVER] [SessionThread] waiting on readable session_socket
[SERVER] [SessionThread] Error: #<Errno::ECONNRESET: Connection reset by peer>
[SERVER] [SessionThread] ending
  1. Linux working:
[SERVER] Server started on 127.0.0.1:42311
[CLIENT] start
[CLIENT] write 'DEMO'
[CLIENT] write 'DEMO'
[CLIENT] write 'DEMO'
[CLIENT] write 'DEMO'
[CLIENT] write 'DEMO'
[CLIENT] write 'DEMO'
[CLIENT] write 'DEMO'
[CLIENT] write 'DEMO'
[CLIENT] write 'DEMO'
[CLIENT] write 'DEMO'
[CLIENT] write 'STOP'
[CLIENT] socket closed
[CLIENT] thread finished
[SERVER] Starting new session #<OpenSSL::SSL::SSLSocket:0x0000ffff7f23f8a0>
[SERVER] [SessionThread] waiting on readable session_socket
[SERVER] [SessionThread] read: "DEMO\n"
[SERVER] [SessionThread] waiting on readable session_socket
[SERVER] [SessionThread] read: "DEMO\n"
[SERVER] [SessionThread] waiting on readable session_socket
[SERVER] [SessionThread] read: "DEMO\n"
[SERVER] [SessionThread] waiting on readable session_socket
[SERVER] [SessionThread] read: "DEMO\n"
[SERVER] [SessionThread] waiting on readable session_socket
[SERVER] [SessionThread] read: "DEMO\n"
[SERVER] [SessionThread] waiting on readable session_socket
[SERVER] [SessionThread] read: "DEMO\n"
[SERVER] [SessionThread] waiting on readable session_socket
[SERVER] [SessionThread] read: "DEMO\n"
[SERVER] [SessionThread] waiting on readable session_socket
[SERVER] [SessionThread] read: "DEMO\n"
[SERVER] [SessionThread] waiting on readable session_socket
[SERVER] [SessionThread] read: "DEMO\n"
[SERVER] [SessionThread] waiting on readable session_socket
[SERVER] [SessionThread] read: "DEMO\n"
[SERVER] [SessionThread] waiting on readable session_socket
[SERVER] [SessionThread] read: "STOP\n"
[SERVER] [SessionThread] ending
  1. MacOS working:
[SERVER] Server started on 127.0.0.1:56239
[CLIENT] start
[CLIENT] write 'DEMO'
[CLIENT] write 'DEMO'
[CLIENT] write 'DEMO'
[CLIENT] write 'DEMO'
[CLIENT] write 'DEMO'
[CLIENT] write 'DEMO'
[CLIENT] write 'DEMO'
[CLIENT] write 'DEMO'
[CLIENT] write 'DEMO'
[CLIENT] write 'DEMO'
[CLIENT] write 'STOP'
[SERVER] Starting new session #<OpenSSL::SSL::SSLSocket:0x000000010102ff60>
[CLIENT] socket closed
[CLIENT] thread finished
[SERVER] [SessionThread] waiting on readable session_socket
[SERVER] [SessionThread] read: "DEMO\n"
[SERVER] [SessionThread] waiting on readable session_socket
[SERVER] [SessionThread] read: "DEMO\n"
[SERVER] [SessionThread] waiting on readable session_socket
[SERVER] [SessionThread] read: "DEMO\n"
[SERVER] [SessionThread] waiting on readable session_socket
[SERVER] [SessionThread] read: "DEMO\n"
[SERVER] [SessionThread] waiting on readable session_socket
[SERVER] [SessionThread] read: "DEMO\n"
[SERVER] [SessionThread] waiting on readable session_socket
[SERVER] [SessionThread] read: "DEMO\n"
[SERVER] [SessionThread] waiting on readable session_socket
[SERVER] [SessionThread] read: "DEMO\n"
[SERVER] [SessionThread] waiting on readable session_socket
[SERVER] [SessionThread] read: "DEMO\n"
[SERVER] [SessionThread] waiting on readable session_socket
[SERVER] [SessionThread] read: "DEMO\n"
[SERVER] [SessionThread] waiting on readable session_socket
[SERVER] [SessionThread] read: "DEMO\n"
[SERVER] [SessionThread] waiting on readable session_socket
[SERVER] [SessionThread] read: "STOP\n"
[SERVER] [SessionThread] ending

The idea is to have the server read from the socket, after the client has been disconnected.

I have no idea what is supposed to be the expected behaviour 😃

@wconrad
Copy link
Owner

wconrad commented Jan 2, 2024

The idea is to have the server read from the socket, after the client has been disconnected.

I have no idea what is supposed to be the expected behaviour

I haven't had time to dig into it like you have, and my life is full of things at the moment. But in a quick search, I get a not quite conclusive answer from the internet on whether the server is guaranteed to be able to read all the data after a disconnect. It might be that a normal disconnect (FIN) does guarantee that the server gets to read the data, but an abnormal disconnect (RST) allows for data to be discarded. Maybe more research is needed to find out if that's true per spec, and also respected by all the implementations we are working with. And maybe a look-see with tcpdump to see if an RST is involved at all in this pathology.

@JonathanTron
Copy link
Author

I haven't had time to dig into it like you have, and my life is full of things at the moment.

Yeah no worries, I'll continue to dig and see what needs some update.

But in a quick search, I get a not quite conclusive answer from the internet on whether the server is guaranteed to be able to read all the data after a disconnect. It might be that a normal disconnect (FIN) does guarantee that the server gets to read the data, but an abnormal disconnect (RST) allows for data to be discarded. Maybe more research is needed to find out if that's true per spec, and also respected by all the implementations we are working with. And maybe a look-see with tcpdump to see if an RST is involved at all in this pathology.

Indeed, I'll get a tcpdump and see.

@JonathanTron
Copy link
Author

JonathanTron commented Jan 2, 2024

I can confirm in the failure case, the last packet is:

13	0.002225	127.0.0.1	127.0.0.1	TCP	66	57446 → 36831 [RST, ACK] Seq=652 Ack=2044 Win=65536 Len=0 TSval=763828862 TSecr=763828862

So RST, ACK and no FIN in sight.

linux-fail.pcap.zip

And for the record a capture with a success on linux:

linux-success.pcap.zip

@JonathanTron
Copy link
Author

JonathanTron commented Jan 2, 2024

It's the same on MacOS but there it works...

But from what I can see every message on MacOS has an ACK:

39	12.799892	127.0.0.1	127.0.0.1	TLSv1.3	83	Application Data
40	12.799919	127.0.0.1	127.0.0.1	TCP	56	65351 → 65352 [ACK] Seq=2044 Ack=895 Win=407360 Len=0 TSval=3994093233 TSecr=2843217177

macos-success.pcap.zip

After looking more closely to the Linux capture, it also aknowledges each TLS message with an ACK, but labeled as Application Data in Wireshark.

@JonathanTron JonathanTron force-pushed the update-for-new-rubies branch 2 times, most recently from 26c9021 to 3858830 Compare January 6, 2024 13:07
@wconrad
Copy link
Owner

wconrad commented Jan 8, 2024

Hi @JonathanTron . I see the force push from a few days ago, to what I assume is your own branch in your fork, but it doesn't reflect in this pull request. Is that something you have to do to update the PR?

@JonathanTron
Copy link
Author

JonathanTron commented Jan 8, 2024

Hi @wconrad, I made a commit which made the test pass on a linux container locally, but ultimately didn't change a thing on CI and failed afterwards also on my container. Also, I pushed with the wrong email. So I reverted and force push as it the commit didn't matter.

@wconrad
Copy link
Owner

wconrad commented Jan 8, 2024 via email

@JonathanTron
Copy link
Author

@wconrad The issue seems to be in the net-ftp gem, so I fixed the features by monkey-patching the problematic bit from Net::FTP::BufferedSSLSocket... I opened ruby/net-ftp#31 for them to decide if the fix is acceptable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants