Stream POST request in order to handle large files #161

Open
wants to merge 4 commits into master

Conversation

@adam-lebon

adam-lebon commented Nov 10, 2024

The bug

I ran into a problem last year: when I tried to create or synchronize a challenge containing a large file (e.g. a forensics challenge with a 15 GB disk image), the entire file was loaded into memory before the request even started.

This causes crashes, since I only have 16 GB of RAM in my computer.

The cause

Although the requests module supports body streaming when you pass a file object to the data parameter, it is not capable of streaming form-data.

When requests prepares the headers, it encodes the whole multipart body up front in order to calculate the Content-Length. As a result, the entire body ends up stored in memory.
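
To illustrate the difference, here is a minimal sketch (the URL and file names are placeholders, not part of this PR):

import requests

# Streams: a bare file object passed as `data` is read in chunks,
# so the file is never held in memory all at once.
requests.post("https://ctfd.example/upload", data=open("./disk.img", "rb"))

# Does NOT stream: to build the multipart body and compute its
# Content-Length, requests first encodes the whole file in memory.
requests.post("https://ctfd.example/upload", files={"file": open("./disk.img", "rb")})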

The fix

One solution would be to switch to another HTTP client that is capable of streaming form-data.

Instead, I chose to modify as little code as possible and delegate the body encoding to the MultipartEncoder from the requests-toolbelt module. This requires a few modifications to the API class, since MultipartEncoder takes its parameters differently from requests.
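
For context, this is roughly how MultipartEncoder is used (a sketch with a placeholder URL and fields, not the actual ctfcli diff):

import requests
from requests_toolbelt import MultipartEncoder

# MultipartEncoder exposes the form-data body through a file-like
# interface, so requests streams it chunk by chunk. It also knows the
# total body length up front, so Content-Length can be set without
# ever materializing the body in memory.
encoder = MultipartEncoder(fields={
    "file": ("file.ova", open("./file.ova", "rb"), "application/octet-stream"),
})

requests.post(
    "https://ctfd.example/api/v1/files",
    data=encoder,  # passed as `data` (streamed), not `files` (buffered)
    headers={"Content-Type": encoder.content_type},  # includes the multipart boundary
)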

As a result, files must be sent with a filename hint:

# Before
api.post("/api/v1/files", files=[
    ("file", open("./file.ova", "rb"))
], data={ ... })

# After: each file is a (filename, file object) tuple
api.post("/api/v1/files", files={
    "file": ("file.ova", open("./file.ova", "rb"))
}, data={ ... })

# To send multiple files under the key "file", use a list of tuples instead of a dict
api.post("/api/v1/files", files=[
    ("file", ("file.ova", open("./file.ova", "rb"))),
    ("file", ("description.txt", open("./description.txt", "rb"))),
], data={ ... })

The MultipartEncoder helps requests upload large files without needing to read the entire file into memory
@ColdHeat
Member

Did you try compressing the file? I feel like streaming is okay but I think really the issue is that disk images are really huge.

@adam-lebon
Author

adam-lebon commented Nov 26, 2024

Even when compressing assets, it is really common to upload very large files, for example an .ova file or an Android disk image.

Storing a large file in memory just to send an HTTP request is not OK. It's a known limitation of the requests module, and it's a shame we need another module to achieve what should be normal behavior.

(Sorry if I made grammar mistakes, it's 2am in France)

@ColdHeat
Member

My point is more that instead of switching out the behavior in ctfcli, it probably would have been better to just compress your file.

While I am roughly okay with the PR and using streaming, I am not sure if ctfcli should support behavior like this. No one really wants to download a 16 GB file.

@adam-lebon
Author

adam-lebon commented Nov 26, 2024

I understand, but sometimes we simply have no choice 😅

For example, this is a CTF we organize each year. In the last edition, we had files up to 2.5 GB. During deployments, sending these files caused big spikes in RAM usage, and sometimes OOM errors.
https://github.com/BreizhCTF/breizhctf-2024/blob/main/Forensic/Tampered/dist/bzhctf.ova

Forensic challenges can be really big. 16 GB was the largest archive I've ever seen (it was a disk dump from a Windows server, if I remember correctly).

@pl4nty
Contributor

pl4nty commented Nov 30, 2024

This would help me too, as I often have compressed forensics artefacts in the 1-5 GB range. I upload to CTFd Cloud using beefy CI runners, which don't crash, but synchronously reading the file into memory makes the upload pretty slow. My other workaround was manually uploading to external blob storage, then linking from CTFd.
