Eternal loop in EthernetClass::socketSendUDP #78
Comments
I added detection of the eternal loop in socket.cpp using a loop counter, and made it return just as it does when it gets a TIMEOUT. This works, and the sketch survives the event. BUT after this has happened a few times (about 4 times on the W5100, I think), the Arduino keeps running but no traffic goes into or out of the Ethernet card. So it seems some resource (a socket?) is not freed within the W5100, and this fix alone was not sufficient. The modified EthernetClass::socketSendUDP, located at the end of socket.cpp, fixes the eternal loop / freeze but not the underlying problem in the W5x00.
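The original modified function is not reproduced in this thread. The loop-counter idea can be sketched roughly as follows. This is a standalone model, not the library code: `readStatus` stands in for the register read inside `socketSendUDP`, and the helper name, the `SendResult` enum and the `maxPolls` limit are hypothetical. The flag values follow the W5x00 socket interrupt register (Sn_IR) layout.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Flag bits modelled on the W5x00 socket interrupt register (Sn_IR).
constexpr uint8_t kSendOk  = 0x10;
constexpr uint8_t kTimeout = 0x08;

enum class SendResult { Ok, Timeout, LoopLimit };

// Hypothetical helper: poll until SEND_OK or TIMEOUT is set, but stop after
// maxPolls reads instead of spinning forever. readStatus stands in for the
// interrupt-register read done inside socketSendUDP.
SendResult waitForSendComplete(const std::function<uint8_t()>& readStatus,
                               uint32_t maxPolls) {
    assert(maxPolls > 0);
    for (uint32_t i = 0; i < maxPolls; ++i) {
        const uint8_t ir = readStatus();
        if (ir & kSendOk) return SendResult::Ok;
        if (ir & kTimeout) return SendResult::Timeout;
    }
    // Neither flag ever appeared (e.g. only RECV was set): give up and let
    // the caller treat it like a timeout.
    return SendResult::LoopLimit;
}
```

On its own, a bounded loop like this only removes the freeze. As described above, the chip still stops passing traffic after a few such events.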
I added some code to EthernetUDP::endPacket in EthernetUdp.cpp to catch the first occurrence of the problem and try to correct it. I ended up changing EthernetUdp.cpp so that it catches the error, closes the socket, resets the W5100 without forgetting its settings, and then starts again on the same port.
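The actual EthernetUdp.cpp change is not shown here. The recovery flow it describes can be sketched with a fake UDP object. `FakeUdp`, `sendRaw`, `softResetChip` and `failNextSend` are hypothetical stand-ins: the failed send simulates the stuck case, and the chip reset is modelled as a simple counter. Returning 0 for the lost packet mirrors how `endPacket()` reports a failed send as a false result.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the UDP object; not the library's EthernetUDP.
struct FakeUdp {
    uint16_t port = 0;
    bool open = false;
    int chipResets = 0;
    bool failNextSend = false;  // simulates the stuck-send condition

    bool begin(uint16_t p) { port = p; open = true; return true; }
    void stop() { open = false; }

    // Returns false like a send that timed out or hit the loop limit.
    bool sendRaw() {
        if (failNextSend) { failNextSend = false; return false; }
        return open;
    }

    // Stands in for a chip reset that keeps the network settings.
    void softResetChip() { ++chipResets; }

    // endPacket-style recovery: on failure, close the socket, reset the
    // chip, then reopen the same local port so the sketch keeps working.
    int endPacket() {
        if (sendRaw()) return 1;
        const uint16_t localPort = port;
        stop();
        softResetChip();
        begin(localPort);
        return 0;  // this packet is reported as lost
    }
};
```

Reopening the same port means user sketches need no changes, which matches the goal stated in the thread.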
I also added a function to the W5100 code that forces a reset but keeps the essential settings.
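The function itself is not included in this thread. Its save/reset/restore idea can be modelled as below. The set of preserved settings (MAC, IP, gateway, subnet) is an assumption, and `FakeChip`, `hardReset` and `resetKeepingSettings` are hypothetical names, not the library's W5100 API.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of the settings that must survive a reset.
struct NetSettings {
    std::array<uint8_t, 6> mac{};
    std::array<uint8_t, 4> ip{}, gateway{}, subnet{};
    bool operator==(const NetSettings& o) const {
        return mac == o.mac && ip == o.ip && gateway == o.gateway &&
               subnet == o.subnet;
    }
};

struct FakeChip {
    NetSettings regs;

    // A plain reset clears every setting, like a hardware reset would.
    void hardReset() { regs = NetSettings{}; }

    // Save the settings, reset, then write them back.
    void resetKeepingSettings() {
        const NetSettings saved = regs;
        hardReset();
        regs = saved;
    }
};
```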
I can confirm that all the changes above together detect and repair the issue. The test program now runs fine, showing a growing number of failures with no effect other than a temporary dip in transfer speed when one happens. No changes to user sketches are needed. I could send a pull request, but I am not sure you will like the approach of adding two global variables to avoid changing the parameters of existing functions. The alternatives are for EthernetClass::socketSendUDP to take a second parameter, udp_send_error, that it can modify, or for the flag to be added as a member of the Ethernet class. Please advise: should I send a pull request as-is, or do you prefer the flags as members of EthernetClass or as function parameters? If you know of a way to get the W5x00 out of the error state other than a full reset and re-init, that would be great. I did not find one, but my experience with the W5x00 is not that deep.
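The two non-global options raised in the comment could look roughly like this. `EthernetModel` and `Outcome` are hypothetical: the member version keeps the flag in the object the caller already uses, and the out-parameter version keeps no shared state at all. In both, a plain timeout still returns false but leaves `udp_send_error` clear, so the caller can tell the two failures apart.

```cpp
#include <cassert>

// What the simulated send loop ran into (hypothetical).
enum class Outcome { Ok, Timeout, Stuck };

// Option 1: the error flag is a member instead of a global variable.
class EthernetModel {
public:
    bool udp_send_error = false;  // set only when the send loop was abandoned

    bool socketSendUDP(Outcome o) {
        udp_send_error = (o == Outcome::Stuck);
        return o == Outcome::Ok;
    }
};

// Option 2: a second out-parameter, as suggested in the comment.
bool socketSendUDP(Outcome o, bool& udp_send_error) {
    udp_send_error = (o == Outcome::Stuck);
    return o == Outcome::Ok;
}
```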
Ciao @fredilarsen, I confirm I was able to replicate this issue. I see it as a potential attack vector, especially for devices reachable from a public network. Anyone with bad intentions can forcibly disconnect the WIZnet chip, or render it useless until a manual or software reset, just by flooding it and waiting a few minutes until it freezes. It is probably best to fix this as soon as possible.
The example above that reproduces this uses UDP broadcast.
My automation system has now been running continuously with the fix for 72 days, averaging about 300 packets per minute. Before the fix it would lock up within hours to a few days. The fix can be found here: https://github.com/fredilarsen/Ethernet
I am using a W5500 lite on an Arduino Due to retrieve the NTP Unix time. If the module is not attached, Udp.endPacket() runs forever. I hoped your fix would also solve this, but it does not. Does this make sense? Thanks.
It may be stuck somewhere else if the card is not attached; I have not investigated this. Is this a problem that could occur outside of test setups (why would the network card be detached)? To locate the other eternal loop, I added Serial.print + flush statements to the library code to trace how far it got and what return values it received.
I can confirm that this happens with both UDP and TCP, and the changes by @fredilarsen (applied analogously on the TCP side) fix it. I was having frequent lockups that triggered a watchdog, and this solved them.
Sorry if I partly repeat what has already been stated above. I can also confirm that this occurs with UDP; I am not sure about TCP. I can confirm that the impressive work of @fredilarsen fixes the issue (I have tested only UDP). This is quite critical for any practical use of the library and the shield, since the device can crash arbitrarily after some time. This issue has been open since November 2018. I hope someone fixes it, because anyone aware of it can use it as an attack vector to remotely crash a target device with plain flooding.
It might be worthwhile to post this over at https://github.com/PaulStoffregen/Ethernet, since @PaulStoffregen mentioned that he no longer has write access to this repo and that issues should be flagged on his "fork" (this repo was originally his fork). There are a few other places that get stuck, like flush() in EthernetClient, if there is a network but no connection to the server. I also fixed flush().
I have submitted this fix as a PR to Paul's repo. I can also confirm that this gets stuck in an infinite loop too.
Dealing with this is going to be application specific, however, because what do you return? Make changes as appropriate.
@bleckers Thanks for letting me know that Paul no longer has write access to this repo. I created a new ticket in the new repo, pointing back here for details.
This change was originally made by @fredilarsen here: fredilarsen@9956e9e. See the discussion at arduino-libraries#78 for more details.
Description
I have a protocol that uses UDP broadcast to let devices communicate. This works fine, except that from time to time a random device locks up and stops responding permanently. It may take days or several weeks to happen, but it is nevertheless catastrophic.
After searching for the cause with printouts and patience, I found that it is always stuck in an eternal loop in socket.cpp, in the EthernetClass::socketSendUDP function. The loop waits for SEND_OK and also exits if it gets TIMEOUT, but in my case it gets a RECEIVE instead.
This sounds similar to the TCP issue where the W5x00 receives while trying to send:
#17
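To see why a RECEIVE keeps the loop spinning, it helps to look at the interrupt flags the loop tests. The sketch below assumes the Sn_IR bit values from the W5100/W5500 datasheets. `loopWouldExit` encodes the exit condition as described in this report, and `describeIR` is a hypothetical helper for printing the flags while tracing.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Sn_IR bits (bit 4..0): SEND_OK, TIMEOUT, RECV, DISCON, CON.
constexpr uint8_t SEND_OK = 0x10, TIMEOUT = 0x08, RECV = 0x04,
                  DISCON = 0x02, CON = 0x01;

// The wait loop only ends when SEND_OK or TIMEOUT is set.
bool loopWouldExit(uint8_t ir) { return (ir & (SEND_OK | TIMEOUT)) != 0; }

// Readable dump of the flags, e.g. for Serial.print tracing.
std::string describeIR(uint8_t ir) {
    std::string s;
    if (ir & SEND_OK) s += "SEND_OK ";
    if (ir & TIMEOUT) s += "TIMEOUT ";
    if (ir & RECV)    s += "RECV ";
    if (ir & DISCON)  s += "DISCON ";
    if (ir & CON)     s += "CON ";
    return s.empty() ? "none" : s.substr(0, s.size() - 1);
}
```

With only RECV set, neither exit condition matches, so an unbounded version of the loop never terminates.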
Reproducing
I tried to speed up triggering a freeze, and the attached program UDPCrash_nolibmod.ino does it within 1-30 minutes. Connect two Arduino Unos, each with a W5x00-based Ethernet shield, to the PC via USB. Start two instances of the Arduino IDE, each connected to one Arduino, both with the serial monitor open at 115200 bps. Flash the sketch to both Arduinos, changing the MAC address for one of them before flashing.
Every 10 seconds each Arduino prints a line saying how many packets it has sent and received. After some minutes one of the devices stops printing, and the other prints 0 packets received from then on.
The program tries to send a packet every millisecond, which is not a realistic scenario. It is only meant to quickly trigger an error that would otherwise occur after a much longer time in normal use; in my case, with data exchanged between multiple devices every 10 seconds.
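UDPCrash_nolibmod.ino is not reproduced here. The 10-second reporting it describes can be modelled as below. `TrafficReport` and `tick` are hypothetical, and `nowMs` stands in for `millis()`. At one packet per millisecond, a healthy device reports up to about 10,000 sent packets per window, and a window showing 0 received packets is the symptom of the other device having frozen.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical model of the test sketch's per-window packet report.
struct TrafficReport {
    uint32_t windowStartMs = 0;
    uint32_t sent = 0, received = 0;
    static constexpr uint32_t kWindowMs = 10000;  // report every 10 s

    // Call from loop(); returns a report line once a window has elapsed,
    // otherwise an empty string. Counters restart for the next window.
    std::string tick(uint32_t nowMs) {
        if (nowMs - windowStartMs < kWindowMs) return "";
        std::string line = "sent " + std::to_string(sent) +
                           ", received " + std::to_string(received);
        sent = received = 0;
        windowStartMs = nowMs;
        return line;
    }
};
```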
Software environment
Arduino IDE 1.8.9 running on Windows 10
Hardware environment
Arduino Uno with the good version of the W5100 on-top network shield, or an Uno or Nano with the small red W5100-based network card or the small blue W5500-based network card.
Code to trigger the failure:
UDPCrash_nolibmod.ino