Linux can't connect to ports for 60s after VARA is closed/re-opened (issue with 'TIME_WAIT state' in Linux & VARA's port code implementation) #52

Open
WheezyE opened this issue Oct 11, 2022 · 10 comments
Labels: bug (Something isn't working), enhancement (New feature or request)

Comments

@WheezyE
Owner

WheezyE commented Oct 11, 2022

This problem affects Linux/Wine, but does not occur on Windows. Fixing this for Linux would make VARA much more usable for users who do not have Windows.

@WheezyE WheezyE added the bug label Oct 11, 2022
@WheezyE
Owner Author

WheezyE commented Oct 14, 2022

Thank you to KM4ACK & WH6AZ (of iOS RadioMail) for finding the root cause of this issue!

Just consolidating some notes & tests here on this github ticket so we can work in the open.

The VARA TCP re-connection problem

[Image: VARA-Wine-TIME_WAIT]

  1. On Linux, a TCP connection enters the TIME_WAIT state after it is terminated, on the side that closed it first. This mainly keeps delayed packets from the old connection from being mistaken for part of a new connection on the same ports, and lets a lost final ACK be resent (references: 1, 2, 3).
  2. The state of each TCP connection can be viewed with netstat | grep tcp. VARA's localhost:8300/localhost:8301 ports enter the ESTABLISHED state when an RMS Express VARA HF P2P session is first opened. As soon as VARA HF is closed, the 8300/8301 ports enter the TIME_WAIT state on Linux. I timed how long the ports stay in TIME_WAIT at about 60 seconds. After that, the ports disappear from netstat (they are closed and can be reused).
  3. While ports 8300/8301 are in TIME_WAIT, VARA HF currently fails to open them again. We believe this is the cause of our issue. Said another way: if VARA is run, then closed, then run again within this 60-second window, VARA will not re-establish a connection with RMS Express (or any other controller program over TCP).
Linux port states while VARA HF is connected, directly after closing VARA HF, and 60 s after that:
pi@raspberrypi4:/etc/init.d $ netstat | grep tcp && date
tcp        0      0 localhost:51220         localhost:36759         ESTABLISHED
tcp        0      0 raspberrypi4:47374      165.254.191.199:https   ESTABLISHED
tcp        0      0 localhost:36759         localhost:51220         ESTABLISHED
tcp        0      0 localhost:53971         localhost:8301          ESTABLISHED
tcp        0      0 raspberrypi4:45066      ec2-3-208-217-166:https ESTABLISHED
tcp        0      0 raspberrypi4:60810      165.254.191.199:https   ESTABLISHED
tcp        0      0 localhost:56953         localhost:59036         ESTABLISHED
tcp        0      0 localhost:59036         localhost:56953         ESTABLISHED
tcp        0      0 localhost:8300          localhost:46719         ESTABLISHED
tcp        0      0 localhost:8301          localhost:53971         ESTABLISHED
tcp        0      0 localhost:46719         localhost:8300          ESTABLISHED
Thu 13 Oct 14:34:41 MDT 2022
pi@raspberrypi4:/etc/init.d $ netstat | grep tcp && date
tcp        0      0 localhost:51220         localhost:36759         ESTABLISHED
tcp        0      0 raspberrypi4:47374      165.254.191.199:https   ESTABLISHED
tcp        0      0 localhost:36759         localhost:51220         ESTABLISHED
tcp        0      0 raspberrypi4:45066      ec2-3-208-217-166:https ESTABLISHED
tcp        0      0 raspberrypi4:60810      165.254.191.199:https   ESTABLISHED
tcp        0      0 localhost:56953         localhost:59036         ESTABLISHED
tcp        0      0 localhost:59036         localhost:56953         ESTABLISHED
tcp        0      0 localhost:8300          localhost:46719         TIME_WAIT  
tcp        0      0 localhost:8301          localhost:53971         TIME_WAIT  
Thu 13 Oct 14:34:47 MDT 2022
...
pi@raspberrypi4:/etc/init.d $ netstat | grep tcp && date
tcp        0      0 localhost:51220         localhost:36759         ESTABLISHED
tcp        0      0 raspberrypi4:47374      165.254.191.199:https   ESTABLISHED
tcp        0      0 localhost:36759         localhost:51220         ESTABLISHED
tcp        0      0 raspberrypi4:45066      ec2-3-208-217-166:https ESTABLISHED
tcp        0      0 raspberrypi4:60810      165.254.191.199:https   ESTABLISHED
tcp        0      0 localhost:56953         localhost:59036         ESTABLISHED
tcp        0      0 localhost:59036         localhost:56953         ESTABLISHED
tcp        0      0 localhost:8300          localhost:46719         TIME_WAIT  
tcp        0      0 localhost:8301          localhost:53971         TIME_WAIT  
Thu 13 Oct 14:35:44 MDT 2022
pi@raspberrypi4:/etc/init.d $ netstat | grep tcp && date
tcp        0      0 localhost:51220         localhost:36759         ESTABLISHED
tcp        0      0 raspberrypi4:47374      165.254.191.199:https   ESTABLISHED
tcp        0      0 localhost:36759         localhost:51220         ESTABLISHED
tcp        0      0 raspberrypi4:60810      165.254.191.199:https   ESTABLISHED
tcp        0      0 localhost:56953         localhost:59036         ESTABLISHED
tcp        0      0 localhost:59036         localhost:56953         ESTABLISHED
Thu 13 Oct 14:35:48 MDT 2022
pi@raspberrypi4:/etc/init.d $
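To repeat the timing test above without re-running netstat by hand, here is a minimal Bash sketch. It assumes VARA HF is using its default ports 8300/8301 and that netstat (net-tools) is installed, as in the output above. The function names tw_count and time_tw are hypothetical helpers made up for this example, not part of VARA or any existing script.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of VARA): count and time TIME_WAIT
# sockets on VARA HF's ports. Assumes the default ports 8300/8301.

# Read netstat-style lines on stdin and print how many are in TIME_WAIT
# with 8300 or 8301 as the *local* port (column 4), i.e. VARA's side.
# Works with both `netstat | grep tcp` (names) and `netstat -tn` (numeric).
tw_count() {
  awk '$1 ~ /^tcp/ && $4 ~ /:(8300|8301)$/ && $6 == "TIME_WAIT" { n++ }
       END { print n + 0 }'
}

# Run right after closing VARA HF: poll once per second until no
# TIME_WAIT socket remains on 8300/8301, then report the elapsed time.
# Note: if netstat is missing, the count is 0 and this returns at once.
time_tw() {
  local start=$SECONDS
  while [ "$(netstat -tn 2>/dev/null | tw_count)" -gt 0 ]; do
    sleep 1
  done
  echo "TIME_WAIT on 8300/8301 cleared after $((SECONDS - start))s"
}
```

For example, feeding the second netstat snapshot above into `tw_count` prints 2 (the two TIME_WAIT lines on localhost:8300 and localhost:8301), while the first and last snapshots print 0. Only the local-address column is checked, so the client side of the connection (e.g. localhost:46719 → localhost:8300) is ignored.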

Things we've tried

  1. The Linux kernel has some network variables that we could (in theory) change to help us fix/diagnose this issue (1). Note: Some are integer vars, some are boolean vars.
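  2. Setting net.ipv4.tcp_keepalive_time, net.ipv4.tcp_fin_timeout, & sunrpc.tcp_fin_timeout to int 1 doesn't seem to change anything, with or without a network reset (TIME_WAIT still lasts 60 s). That fits how these variables work: tcp_fin_timeout sets the FIN_WAIT_2 timeout and tcp_keepalive_time sets how long a connection must be idle before keepalive probes start, while the 60-second TIME_WAIT length is hard-coded in the Linux kernel (TCP_TIMEWAIT_LEN). Doing this also wrecks the internet on the Pi.
  3. Setting net.ipv4.tcp_tw_reuse to int 1 (global enable) doesn't change any behavior either. This is also expected, since tcp_tw_reuse only lets new outgoing connections reuse a TIME_WAIT socket; it does not let a program re-bind a port that is still in TIME_WAIT.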
  4. Some forums suggest forcibly cutting the connection on the TCP port with a program like Killcx. However, this does not address our TIME_WAIT problem, which is a port "busy" state that arises AFTER the port's connection is cut.
Viewing/editing/updating(?) Linux kernel variables in the terminal:
# view kernel variables
sysctl -A -r tcp # show tcp variables
sysctl -A -r tw # show some time_wait variables

# view current state of individual variables
sysctl net.ipv4.tcp_keepalive_time
sysctl net.ipv4.tcp_fin_timeout
sysctl net.ipv4.tcp_tw_reuse
#sysctl net.ipv4.tcp_tw_recycle # not available in RPiOS: this variable was removed in Linux 4.12

# edit individual variables - these changes do not persist after reboot
sudo sysctl -w net.ipv4.tcp_keepalive_time=1
sudo sysctl -w net.ipv4.tcp_fin_timeout=1
sudo sysctl -w sunrpc.tcp_fin_timeout=1
sudo sysctl -w net.ipv4.tcp_tw_reuse=1

# can also view/edit state of individual variables like this
sudo nano /proc/sys/net/ipv4/tcp_fin_timeout

# reload network (probably not needed: values set with sysctl -w take effect immediately)
sudo service networking restart #could also run: /etc/init.d/networking restart

# reload variables from the sysctl config files (/etc/sysctl.conf, /etc/sysctl.d/*.conf, etc.)
sudo sysctl --system
Killcx RPiOS installation (FYI: doesn't help):
#Killcx only detects VARA connection while it's active, does not disable TIME_WAIT state on ports
cd ~/Downloads
wget https://cfhcable.dl.sourceforge.net/project/killcx/killcx/1.0.3/killcx-1.0.3.tgz
tar -xzf killcx-1.0.3.tgz # 7z x would only unpack the .tgz to a .tar, not to the killcx-1.0.3 folder
sudo apt-get install libnet-rawip-perl libnet-pcap-perl libnetpacket-perl
sudo chmod +x killcx-1.0.3/killcx
cat /etc/hosts # to confirm that localhost is 127.0.0.1
sudo killcx-1.0.3/./killcx 127.0.0.1:8300 tcp

Possible solutions

  1. Kindly ask VARA's dev, EA5HVK, if he would be able to make VARA's TCP/port/socket connection routine ignore a TIME_WAIT state on a port and connect anyway (similar to setting the SO_REUSEADDR socket option with setsockopt() in C) (1,2)
  2. Create some sort of wrapper that runs instead of VARA, sets an SO_REUSEADDR-type socket option, configures VARA's port to be different for each run, and passes traffic to/from VARA apps? (This would take a lot of work and might be buggy. I wouldn't even know where to begin making something like this, although I think it's theoretically possible.)
  3. Make a script that monitors port states on Linux and warns users that VARA cannot be run during a 60s countdown window if ports 8300/8301 are found to be in the TIME_WAIT state. (KM4ACK's idea - he also has a prototype script written to do this).
  4. Make a daemon script that monitors for VARA in the background at all times: When VARA is run, log PID and wait for VARA to close. Upon VARA closing, reset the network with sudo service networking restart. (This is not a favorable option since it could cause users to lose internet connection / data unexpectedly).
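Possible Solution 3 could look roughly like the Bash sketch below. This is not KM4ACK's prototype (which isn't shown here); it is a hypothetical illustration that assumes VARA HF's default ports 8300/8301 and an installed netstat. The function names vara_tw_count and wait_for_vara_ports are made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical pre-launch check for Possible Solution 3 (not KM4ACK's
# actual script): wait until VARA's ports leave TIME_WAIT before
# starting VARA, showing a countdown. Linux keeps TIME_WAIT for 60 s.

# Print the number of sockets in TIME_WAIT whose local port is 8300 or
# 8301 (assumed default VARA HF ports).
vara_tw_count() {
  netstat -tn 2>/dev/null |
    awk '$4 ~ /:(8300|8301)$/ && $6 == "TIME_WAIT" { n++ } END { print n + 0 }'
}

# Wait up to $1 seconds (default 60) for the ports to clear.
# Returns 0 once they are free, 1 if they are still busy at the limit.
wait_for_vara_ports() {
  local limit=${1:-60} waited=0
  while [ "$(vara_tw_count)" -gt 0 ]; do
    if [ "$waited" -ge "$limit" ]; then
      printf '\nVARA ports still in TIME_WAIT after %ds\n' "$limit" >&2
      return 1
    fi
    printf '\rVARA ports busy (TIME_WAIT), retrying for up to %2ds more ' \
      "$((limit - waited))"
    sleep 1
    waited=$((waited + 1))
  done
  echo "VARA ports 8300/8301 are free; VARA can be started."
}
```

A launcher could then run `wait_for_vara_ports && <your usual VARA launch command>`. Because the function returns non-zero on timeout, `&&` keeps VARA from starting while its ports are still in TIME_WAIT, instead of letting it start without a working TCP connection.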

@WheezyE WheezyE changed the title VARA's ports close after first connection VARA cannot connect to ports for 60s after VARA is closed/re-opened Oct 14, 2022
@WheezyE
Owner Author

WheezyE commented Oct 16, 2022

I'm going to try Possible Solution 2 (above): VARA-bridge-Linux for TCP connections, which EA5HVK also suggested recently when I contacted him.

I'll start trying to write a bridge app in VB6 to see if I can circumvent the TIME_WAIT condition. If that succeeds, I'll see if sending source code to EA5HVK might help implement it in VARA. If that's not an option, then I'll see if I can complete the bridge app.

  • VARA appears to be written in VB6 (open the VARA executable in Notepad and msvbvm60.dll is listed).
  • Planning to follow this guide to install VB6.
  • I found an example of a VB6 program that ignores TIME_WAIT along with its source code here. They say "It also eliminates the TIME_WAIT period by setting the options "SO_LINGER" & "SO_REUSEADDR". For reasons unknown, I had to set both these options to achieve the desired result."
  • I also found a YouTube tutorial on making a simple TCP connection app in VB6 (1, 2).

@WheezyE WheezyE changed the title VARA cannot connect to ports for 60s after VARA is closed/re-opened Linux can't connect to ports for 60s after VARA is closed/re-opened (issue with 'TIME_WAIT state' in Linux & VARA's port code implementation) Jan 27, 2023
@WheezyE
Owner Author

WheezyE commented Mar 5, 2023

Updating this thread:

  • Jose said in the past that he doesn't know how to edit his VB6 code to make it ignore TIME_WAIT in Linux and can't do more to fix it, so this will probably remain a problem indefinitely.
  • Red (pe1rrr) made two workarounds for this issue to try out [1], [2]

@SpudGunMan
Contributor

Random ideas, random words; fine to ignore, as I haven't done a lot of data gathering to really give any, let alone pe1rrr-level data!

I can connect and disconnect a lot with no issues it seems TCP connect projects like CHAT (with vara) -like a lot I cant replicate this error per-say but I dont use winlink much.

Is this only Winlink-related? I saw that possibly a KISS-connected phone app is affected as well (ouch, I just paid for it to debug this more myself; it will be an extra handy platform to use since it focuses only on VARA in Wine over TCP KISS, really keeping it simple for this thread).

Is this a function of a Winlink-specific clog? Does layer 6-7 need to be looked at, with Winlink and VARA in tcpdump, to find any strange collisions? I was going to try sniffing to see why my dev box is not impacted (I am still on 5.10). Anyway, all this rambling is to hopefully help and to ask: is this a winlink only issue? or any TCP applications? VarAC issues? We need more eyes on the problem and more data to make this go away. I have not looked at the provided links for solutions in detail yet, so I may be fully foolish in saying any of this, but I did see network issues once and they did go away for me. I will get more data as time allows. I'd love to hear more and am going to dig into the pe1rrr links as soon as possible. :) 73, how's the General license? ;)

@WheezyE
Owner Author

WheezyE commented Mar 10, 2023

Your ideas are always welcome! 😃 And thank you for being so interested and wanting to do so much testing.

I can connect and disconnect a lot with no issues it seems TCP connect projects like CHAT (with vara) -like a lot I cant replicate this error per-say but I dont use winlink much.

Over-the-air/radio-signal VARA connections/disconnections should work fine. However, since Linux TCP ports enter a temporary "TIME_WAIT" state after a program closes one of them, this usually causes an issue for VARA when a program closes VARA and then re-opens it (like RMS Express), or when an external program tries to re-connect to VARA's TCP/IP ports over a local/Wi-Fi connection (like RadioMail for iPhone).

is this a winlink only issue? or any TCP applications? VarAC issues?
It's an issue with VARA - specifically, how VARA has been programmed to deal with TCP port reconnections and TIME_WAIT stuff. I believe that there is a way to work around this in VB6 (the language VARA is programmed in), but I'm not a programmer and also don't have access to VARA's source code to test anything.

To be honest, I'm more interested in the wine/box86/emulation side of things and don't really use or test VARA much otherwise. Last I knew, these issues weren't fixed, but it's possible maybe Jose ended up patching this in. I haven't tested it in a while, but I think pe1rrr would know more since he's tested it more recently.

@WheezyE
Owner Author

WheezyE commented Mar 10, 2023

This is all as far as I know... again, pe1rrr has more first-hand experience with the problem and the ways it impacts users. (@pe1rrr, feel free to correct any info I got wrong here)

@pe1rrr

pe1rrr commented Mar 10, 2023

This is all as far as I know... again, pe1rrr has more first-hand experience with the problem and the ways it impacts users. (@pe1rrr, feel free to correct any info I got wrong here)

👍 So far so good.

@georges

georges commented Jun 9, 2023

Great summary of an otherwise unfortunate issue. For what it's worth, this problem also occurs with CrossOver on macOS.

@georges

georges commented Dec 21, 2023

FYI, for anybody looking for a workaround for this, I've created varanny, a launcher for VARA. Among other things, it helps start/stop VARA instances remotely and can manage VARA.ini files to allow multiple configurations to co-exist. It also takes care of service discovery by advertising VARA via DNS-SD. RadioMail has supported this since v1.3.

https://github.com/islandmagic/varanny

@WheezyE
Owner Author

WheezyE commented Jul 25, 2024

@georges This is long overdue, but I am very grateful for your incredible work on this. I look forward to implementing it in the future.

I've been working on moving overseas (to Ireland) this year and it's been keeping me pretty busy. I'm looking forward to being settled in October to hopefully have more time to work on projects again.

Anyways, thank you again for this.

@WheezyE WheezyE added the enhancement label Jul 25, 2024

4 participants