Stream POST request in order to handle large files #161

Open
wants to merge 4 commits into master

Conversation

@adam-lebon

adam-lebon commented Nov 10, 2024

The bug

I ran into a problem last year: when I tried to create or synchronize a challenge containing a large file (e.g. a forensics challenge with a 15 GB disk image), the entire file was loaded into memory before the request even started.

This causes crashes, since I only have 16 GB of RAM in my computer.

The cause

Although the requests module supports body streaming when you pass a file object to the data parameter, it is not capable of streaming form-data.

When requests prepares the headers, it encodes the whole multipart body up front in order to calculate the Content-Length. As a result, the entire body ends up stored in memory.
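
To illustrate the difference, here is a minimal sketch (the URL and file names are placeholders, not part of this PR):

import requests

# Streams: a bare file object passed as `data` is read in chunks,
# so the file is never held in memory all at once.
requests.post("https://ctfd.example/upload", data=open("./disk.img", "rb"))

# Does NOT stream: to build the multipart body and compute its
# Content-Length, requests first encodes the whole file in memory.
requests.post("https://ctfd.example/upload", files={"file": open("./disk.img", "rb")})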

The fix

One solution would be to switch to another HTTP client that is capable of streaming form-data.

Instead, I chose to modify as little code as possible and delegate the body encoding to the MultipartEncoder from the requests-toolbelt module. This requires a few modifications to the API class, since MultipartEncoder takes its parameters differently from requests.
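
For context, this is roughly how MultipartEncoder is used (a sketch with a placeholder URL and fields, not the actual ctfcli diff):

import requests
from requests_toolbelt import MultipartEncoder

# MultipartEncoder exposes the form-data body through a file-like
# interface, so requests streams it chunk by chunk. It also knows the
# total body length up front, so Content-Length can be set without
# ever materializing the body in memory.
encoder = MultipartEncoder(fields={
    "file": ("file.ova", open("./file.ova", "rb"), "application/octet-stream"),
})

requests.post(
    "https://ctfd.example/api/v1/files",
    data=encoder,  # passed as `data` (streamed), not `files` (buffered)
    headers={"Content-Type": encoder.content_type},  # includes the multipart boundary
)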

As a result, files must be sent with a filename hint:

# Before
api.post("/api/v1/files", files=[
    ("file", open("./file.ova", "rb"))
], data={ ... })

# After: each file is a (filename, file object) tuple
api.post("/api/v1/files", files={
    "file": ("file.ova", open("./file.ova", "rb"))
}, data={ ... })

# To send multiple files under the key "file", use a list of tuples instead of a dict
api.post("/api/v1/files", files=[
    ("file", ("file.ova", open("./file.ova", "rb"))),
    ("file", ("description.txt", open("./description.txt", "rb"))),
], data={ ... })

The MultipartEncoder helps requests upload large files without needing to read the entire file into memory
@ColdHeat
Member

Did you try compressing the file? I feel like streaming is okay but I think really the issue is that disk images are really huge.

@adam-lebon
Author

adam-lebon commented Nov 26, 2024

Even when compressing assets, it is really common to upload very large files, for example an .ova file or an Android disk image.

Storing a large file in memory just to send an HTTP request is not OK. It's a known limitation of the requests module, and it's a shame we need another module to achieve what should be normal behavior.

(Sorry if I made grammar mistakes, it's 2am in France)

@ColdHeat
Member

My point is more that instead of switching out the behavior in ctfcli, it probably would have been better to just compress your file.

While I am roughly okay with the PR and using streaming, I am not sure if ctfcli should support behavior like this. No one really wants to download a 16 GB file.

@adam-lebon
Author

adam-lebon commented Nov 26, 2024

I understand, but sometimes we simply have no choice 😅

For example, this is a CTF we organize each year. In the last edition, we had files up to 2.5 GB. During deployments, sending these files caused big spikes in RAM usage, and sometimes OOM errors.
https://github.com/BreizhCTF/breizhctf-2024/blob/main/Forensic/Tampered/dist/bzhctf.ova

Forensic challenges can be really big. 16 GB was the largest archive I've ever seen (it was a disk dump from a Windows server, if I remember correctly).

@pl4nty
Contributor

pl4nty commented Nov 30, 2024

This would help me too, as I often have compressed forensics artefacts in the 1-5 GB range. I upload to CTFd Cloud using beefy CI runners, which don't crash, but synchronously reading the file into memory makes the upload pretty slow. My other workaround was manually uploading to external blob storage, then linking from CTFd.
