Losing connection while downloading large file may result in unrecoverable state #315
Comments
Might be a new thing? How large is the file? During my tests to improve the throughput I think I used files >2M. Also, in the very beginning I had an MQTT temperature logger running for more than one week without error recovery. (But with the non-async API... There was no async back then.)
I'm downloading and discarding a firmware update, around 1MB, as part of a throughput test. The key is losing connection for more than a couple of seconds. The indication that this happens is that the download speed drops to 0 and stays at 0. 1-2 seconds of this is survivable, but in this state, as soon as the TX queue fills up with frames, the driver can't seem to recover.

I'll test this more; I've done 2 days of hair pulling due to this. TX/RX buffer configuration matters a lot in terms of what happens if the issue hits (if the wifi stack exhausts its heap, the app can completely die), but other than that I didn't make much progress.

Update on 28th: This seems to be a lot messier than just losing connection. I can't seem to reproduce this issue today, but I'm building on a different computer, which introduces more variables. This time I'm seeing the TX callback receiving

I've managed a repro. This test run was made with bugadani@a447c5a, which is slightly modified, but the problem is not specific to my fork. Logs are here, though they are only indicative, not very informative. You can see

The test I'm running timed out in line 5692, earlier than the last TX token consume. The test timeout drops the HTTP client, the HTTP connection, and the TCP socket. I am able to restart the test, around line 5827, but it just eats up the remaining TX tokens (which I think smoltcp would do anyway), and then nothing happens except for timers arming, disarming and sometimes firing.

What I've been able to determine so far:
What I want to check:
My app has been running for 3.5 hours in a loop without the issue, so I assume #318 has fixed it.
The "while downloading large file" may not be necessary but it's how I'm able to reproduce this issue the most reliably. It looks like the TX queue fills up, and never gets cleared afterwards.
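
To make the suspected failure mode concrete, here is a minimal sketch of a bounded TX queue that only drains when a "TX done" notification arrives, plus a small watchdog that flags the "full and never cleared" condition. Everything here is hypothetical and for illustration only: `TxQueue`, `StallDetector`, `on_tx_done` and the poll threshold are made-up names, not the driver's actual types or API.

```rust
use std::collections::VecDeque;

/// Hypothetical fixed-capacity TX queue (an assumption for illustration,
/// not the driver's real data structure).
pub struct TxQueue {
    frames: VecDeque<Vec<u8>>,
    capacity: usize,
}

impl TxQueue {
    pub fn new(capacity: usize) -> Self {
        Self { frames: VecDeque::with_capacity(capacity), capacity }
    }

    /// Try to enqueue a frame. When the queue is full the frame is handed
    /// back to the caller instead of being dropped silently.
    pub fn push(&mut self, frame: Vec<u8>) -> Result<(), Vec<u8>> {
        if self.is_full() {
            return Err(frame);
        }
        self.frames.push_back(frame);
        Ok(())
    }

    /// Models the "TX done" notification: the oldest frame leaves the queue.
    /// If this is never called again, the queue stays full forever.
    pub fn on_tx_done(&mut self) -> Option<Vec<u8>> {
        self.frames.pop_front()
    }

    pub fn is_full(&self) -> bool {
        self.frames.len() >= self.capacity
    }
}

/// Hypothetical watchdog: counts consecutive polls in which the queue was
/// full and no frame completed since the previous poll.
pub struct StallDetector {
    stalled_polls: u32,
    threshold: u32,
}

impl StallDetector {
    pub fn new(threshold: u32) -> Self {
        Self { stalled_polls: 0, threshold }
    }

    /// Returns true once the queue has looked stuck for `threshold` polls.
    pub fn poll(&mut self, queue_full: bool, completed_since_last_poll: bool) -> bool {
        if queue_full && !completed_since_last_poll {
            self.stalled_polls += 1;
        } else {
            self.stalled_polls = 0;
        }
        self.stalled_polls >= self.threshold
    }
}
```

In this model, a stall shows up as `push` failing repeatedly while `on_tx_done` is never invoked. A check like this could at least turn the silent hang into a visible error while the root cause is investigated.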