Telnet console death issue #2344
I'm really starting to question whether this is the reason the new console breaks. I will say that once we are in this loop, nothing ever deletes the connection that is in a failed state.
OK, I think I have this dialed in now. So many more debugs.. This shows a bad connection isn't being dealt with correctly:

2024-02-02 23:09:13 DEBUG telnet_server.py:284 Start async _get_reader

As you can see, self._reader_process has a StreamReader that had an exception thrown with a dead socket, but it hasn't been removed. I'm going to add a wait_for on a drain call (sketched below). If it fails after, say, 30 seconds, then we haven't been able to get data out to a low-water mark on the socket and it's most likely dead. Then yank it from the connection dictionary and, in theory, this problem goes away. Crossing fingers...
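A minimal sketch of that idea, assuming a hypothetical connections dict mapping each connection key to its StreamWriter (the real telnet_server.py structures may differ):

```python
import asyncio

DRAIN_TIMEOUT = 30  # seconds; the value floated in the comment above

async def send_to_client(connections: dict, key, data: bytes) -> None:
    """Write to one telnet client; drop the connection if it cannot drain."""
    writer = connections[key]
    writer.write(data)
    try:
        # On a dead peer, drain() can block until the kernel gives up;
        # bound it so we notice within DRAIN_TIMEOUT instead.
        await asyncio.wait_for(writer.drain(), timeout=DRAIN_TIMEOUT)
    except (asyncio.TimeoutError, ConnectionError):
        # The socket never drained to its low-water mark: treat it as dead,
        # close it, and yank it from the connection dictionary so the
        # failed StreamReader stops being handed out.
        writer.close()
        del connections[key]
```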
Fix located here.
Thanks, I am going to check everything soon (need to catch up after holidays) 👍
Awesome, thanks for the fix, I hope we have finally squashed that annoying bug 👍
It's.. uh.. not. @grossmj Can you re-open this while I yell into the void in an overly dramatic manner?
Here is the exception. I don't know how long it was hung on this call, but you can see it does time out after a while. My guess is 2 hours. Oh, I also feel like maybe the keepalives aren't always being set for some reason, but I have no evidence. Knee-jerk reaction is asyncio.wait_for() again; there is a keepalive sketch after the traceback. I'll put some time into it this weekend.

2024-03-01 02:36:54 ERROR base_events.py:1758 Task exception was never retrieved
future: <Task finished name='Task-1745' coro=<AsyncioTelnetServer.run() done, defined at /usr/share/gns3/gns3-server/lib/python3.10/site-packages/gns3server/utils/asyncio/telnet_server.py:188> exception=TimeoutError(110, 'Connection timed out') created at /usr/lib/python3.10/asyncio/streams.py:242>
source_traceback: Object created at (most recent call last):
File "/usr/bin/gns3server", line 8, in <module>
sys.exit(main())
File "/usr/share/gns3/gns3-server/lib/python3.10/site-packages/gns3server/main.py", line 87, in main
run()
File "/usr/share/gns3/gns3-server/lib/python3.10/site-packages/gns3server/run.py", line 267, in run
server.run()
File "/usr/share/gns3/gns3-server/lib/python3.10/site-packages/gns3server/web/web_server.py", line 352, in run
self._loop.run_forever()
File "/usr/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
self._run_once()
File "/usr/lib/python3.10/asyncio/base_events.py", line 1901, in _run_once
handle._run()
File "/usr/lib/python3.10/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "/usr/lib/python3.10/asyncio/streams.py", line 242, in connection_made
self._task = self._loop.create_task(res)
Traceback (most recent call last):
File "/usr/share/gns3/gns3-server/lib/python3.10/site-packages/gns3server/utils/asyncio/telnet_server.py", line 207, in run
await self._process(network_reader, network_writer, connection)
File "/usr/share/gns3/gns3-server/lib/python3.10/site-packages/gns3server/utils/asyncio/telnet_server.py", line 269, in _process
data = coro.result()
File "/usr/lib/python3.10/asyncio/streams.py", line 669, in read
await self._wait_for_data('read')
File "/usr/lib/python3.10/asyncio/streams.py", line 501, in _wait_for_data
await self._waiter
File "/usr/lib/python3.10/asyncio/selector_events.py", line 862, in _read_ready__data_received
data = self._sock.recv(self.max_size)
TimeoutError: [Errno 110] Connection timed out
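For reference, the roughly 2-hour figure lines up with the Linux default TCP keepalive idle time (net.ipv4.tcp_keepalive_time = 7200 seconds). Here is a minimal, hedged sketch of forcing keepalives on per connection so a silently dead peer is detected in about a minute instead of hours; the function name and timing values are mine, not from the server code, and the tuning constants are Linux-specific:

```python
import asyncio
import socket

def enable_keepalive(writer: asyncio.StreamWriter,
                     idle: int = 60, interval: int = 10, count: int = 5) -> None:
    """Turn on TCP keepalives so the kernel notices a silently dead peer
    after roughly idle + interval * count seconds instead of hours."""
    sock = writer.get_extra_info("socket")
    if sock is None:
        return
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning knobs below are Linux-specific; guard them for portability.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
```

A bare asyncio.wait_for() around the read is trickier here, because an idle console legitimately sends nothing for long stretches; keepalives probe the transport without needing any application data.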
@grossmj
Since there isn't really an open issue to track the original death of the telnet console, I'm opening this. I've narrowed down where the issue is; I haven't figured out what to do about it just yet.
Here is a link to the telnet_server.py with debugs showing the issue.
telnet_server.py
Here are the debug logs from when the death loop happens.
So.. the interesting thing here is that _get_reader is always returning None. This then causes an infinite loop in _process.
As you can see from the debugs above, _get_reader is making it all the way to that final if statement, which doesn't match and then returns None. I've looked at this a few times and I don't understand the intention of that compare.
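A simplified reconstruction of the loop shape being described; this is illustrative only, not the actual GNS3 code:

```python
async def process_shape(get_reader, network_reader):
    """Illustrative only: if get_reader() keeps returning None for a failed
    connection and nothing ever removes that connection, this loop never
    awaits anything for it and just spins, wedging the console."""
    while True:
        reader = get_reader(network_reader)  # always None for the dead connection
        if reader is None:
            continue  # the failed StreamReader is never purged -> infinite loop
        data = await reader.read(1024)
        if not data:
            break
```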
BTW, ignore the "__process reader_read is still 2nd None" log line. I didn't have that debug in the right place and didn't feel like redoing everything.
Some other interesting things to point out: self._reader_process has a 0-byte buffer, but network_reader has a buffer of 65550 bytes. I have to poke around in pdb to see what I can find, but if I recall from looking at this, the socket behind self._reader_process is technically closed, so it's impossible to write data to it. I think this is why the reader's buffer isn't getting drained. I'm not sure if scheduling a future on a dead socket is what is causing this. One way to detect that state is sketched below.
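For what it's worth, the standard asyncio.StreamReader API can detect that state before another read is scheduled; the helper name and the purge policy are my assumption, not the server's code:

```python
import asyncio

def reader_is_dead(reader: asyncio.StreamReader) -> bool:
    """True if this StreamReader can never produce more data.

    exception() is set once the transport died with an error (such as the
    ETIMEDOUT above); at_eof() means the buffer is drained and EOF was fed,
    so no further read() will ever return data.
    """
    return reader.exception() is not None or reader.at_eof()
```

A connection whose reader satisfies this check should be purged from the connection dictionary instead of being read from again.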
The way I'm personally triggering this is to set up a firewall rule and then delete a telnet console session while I'm running find / on 2 telnet consoles. The telnet consoles will wedge within 10 seconds or so.
I still plan on finishing up the proxy replacement, but it's not ready. I got to the point where I can start and stop a QEMU VM and the console works, but if I delete the devices I get exceptions. It also doesn't support all the options the original telnet_server supports (such as passing in a reader/writer).