Stream POST request in order to handle large files #161
base: master
Conversation
The Multipart encoder lets requests upload large files without reading the entire file into memory.
Did you try compressing the file? I feel like streaming is okay, but I think the real issue is that disk images are really huge.
Even when compressing assets, it is really common to upload very large files. It could be an .ova or an Android disk image, for example. Storing a large file in memory just to send an HTTP request is not OK. It's a known limitation of the requests module, and it's a shame we need to use another module to achieve what should be normal behavior. (Sorry if I made grammar mistakes, it's 2am in France)
My point is more that instead of switching out the behavior in ctfcli, it probably would have been better to just compress your file. While I am roughly okay with the PR and using streaming, I am not sure if ctfcli should support behavior like this. No one really wants to download a 16 GB file.
I understand, but sometimes we simply have no choice 😅 For example, this is a CTF we organize each year. In the last edition, we got files up to 2.5 GB. During deployments this caused big spikes in RAM usage while sending the file, and sometimes it caused OOM errors. Forensic challenges can be really big. 16 GB was the largest archive I've ever seen (it was a disk dump from a Windows server, if I remember correctly).
This would help me too; I often have compressed forensics artefacts in the 1-5 GB range. I upload to CTFd Cloud using beefy CI runners which don't crash, but synchronously reading the whole file into memory makes the upload pretty slow. My other workaround was manually uploading to external blob storage, then linking from CTFd.
The bug
I ran into a problem last year: when I tried to create or synchronize a challenge containing a large file (e.g. a forensics challenge with a 15 GB disk image), the entire file was loaded into memory before the request even started.
This caused crashes, since I only have 16 GB of RAM on my computer.
The cause
Although the `requests` module supports body streaming when you pass a file pointer to the `data` parameter, it is not capable of streaming form-data. When the `requests` module prepares the headers, it tries to calculate the `Content-Length`. As a result, the entire body is stored in memory.
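
For illustration, a minimal sketch of the difference (the URL and form fields below are placeholders, not the actual ctfcli/CTFd endpoints):

```python
import requests

# Streaming a raw body works: requests reads the file object in chunks
# and never loads the whole file into memory.
with open("disk.img", "rb") as f:
    requests.post("https://ctfd.example.com/files", data=f)

# Multipart form-data does not stream: requests encodes the entire body,
# including the file contents, in memory to compute Content-Length.
with open("disk.img", "rb") as f:
    requests.post(
        "https://ctfd.example.com/files",
        files={"file": ("disk.img", f)},
        data={"challenge_id": "1"},
    )
```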
The fix
One solution would be to switch to another HTTP client that is capable of streaming form-data.
I chose to modify as little code as possible and to delegate the body encoding to the `MultipartEncoder` from the `requests-toolbelt` module. This requires a few modifications to the `API` class, since the `MultipartEncoder` takes parameters differently from `requests`. As a result, files must be sent with a filename hint:
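
As a rough sketch of what an upload through `MultipartEncoder` looks like (the endpoint and field names are placeholders, not the exact calls the modified `API` class makes):

```python
import requests
from requests_toolbelt import MultipartEncoder

with open("disk.img", "rb") as f:
    encoder = MultipartEncoder(
        fields={
            # The (filename, file object, content type) tuple is the
            # filename hint the encoder needs to build the file part.
            "file": ("disk.img", f, "application/octet-stream"),
            "challenge_id": "1",
        }
    )
    requests.post(
        "https://ctfd.example.com/files",
        data=encoder,  # read in chunks, so the file is never fully in RAM
        headers={"Content-Type": encoder.content_type},
    )
```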