@s3/store: add backpressure + download incomplete part first #561
Conversation
- proxy.destroy(err)
+ // we end the stream gracefully here so that we can upload the remaining bytes to the store
+ // as an incompletePart
+ proxy.end()
A very small but nice change here.
When an upload request is canceled, instead of calling `proxy.destroy()` we can call `proxy.end()` to gracefully close the passthrough stream, so that we can flush the bytes of the current chunk as an incomplete part and upload it to S3.
This means the next request resumes exactly where we left off, as opposed to starting from the last successfully uploaded part.
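As a hedged sketch of the pattern being discussed (the `proxy` passthrough and the abort wiring below are illustrative, not the store's exact internals), ending the stream instead of destroying it lets the buffered bytes reach the part handler:

```ts
import {PassThrough, Readable} from 'node:stream'

// Illustrative only: `proxy` stands in for the passthrough the store pipes the
// request body into before splitting it into S3 parts.
function wireAbort(source: Readable, proxy: PassThrough, signal: AbortSignal) {
  source.pipe(proxy)

  signal.addEventListener('abort', () => {
    source.unpipe(proxy)
    // proxy.destroy() would drop whatever is still buffered in the passthrough.
    // proxy.end() stops accepting new data but flushes the buffered bytes
    // downstream, so they can be uploaded as an incomplete part.
    proxy.end()
  })
}
```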
Nice!
A great addition!
I might need to go over it one more time, but here is some initial feedback.
@Murderlon I've updated the Readme and added detailed documentation on the new `maxConcurrentPartUploads` option. I've also increased the default value to 60 instead of 30 (~480MB), as this aligns better with the tusd default. Let me know if there is anything else; this change will be awesome to have.
LGTM 👌
I think I would like a quick review from @Acconut on this before merging.
Amazing work, @fenos!
@@ -91,6 +91,22 @@ you need to provide a cache implementation that is shared between all instances

See the exported [KV stores][kvstores] from `@tus/server` for more information.

#### `options.maxConcurrentPartUploads`

This setting determines the maximum number of simultaneous part uploads to an S3 storage service.
It might be helpful to state in this document what a part actually is. Maybe a brief explanation saying that this module uses S3 multipart uploads, which splits a file into multiple equally-sized parts (independent of the PATCH requests). With this context, it's easier to understand what concurrent part uploads actually means.
packages/s3-store/README.md (Outdated)

#### `options.maxConcurrentPartUploads`

This setting determines the maximum number of simultaneous part uploads to an S3 storage service.
The default value is 60. This default is chosen in conjunction with the typical partSize of 8MiB, aiming for an effective transfer rate of approximately 480MiB/s.
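For illustration, a hedged configuration sketch: `maxConcurrentPartUploads` is the option documented above, the remaining options follow the existing `@tus/s3-store` README, and the bucket/region values are placeholders.

```ts
import {S3Store} from '@tus/s3-store'

const store = new S3Store({
  partSize: 8 * 1024 * 1024, // typical 8 MiB multipart part size
  maxConcurrentPartUploads: 60, // at most 60 parts in flight to S3 at once
  s3ClientConfig: {
    bucket: 'my-tus-bucket', // placeholder
    region: 'eu-west-1', // placeholder
  },
})
```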
The calculation seems to assume that one part is uploaded in 1s. Might be worth mentioning. Also, transfer speeds are commonly given in Mbit/s, not MiB/s. That would be 3.84 Gbit/s, which seems very high. I am not sure if the transfer speeds to S3 reach that level.
Traffic between Amazon EC2 and Amazon S3 uses up to 100 Gbps of bandwidth to Amazon Virtual Private Cloud (Amazon VPC) endpoints and public IP addresses in the same AWS Region.
According to Google, that is 12.5 GB/s.
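To make the arithmetic behind these figures explicit (assuming, as noted above, that each of the 60 slots uploads one 8 MiB part per second):

$$60 \times 8\,\mathrm{MiB/s} = 480\,\mathrm{MiB/s} \approx 4.03\,\mathrm{Gbit/s}\ (\approx 3.84\,\mathrm{Gbit/s}\ \text{if MiB is read as MB}),$$

which is still well below the 12.5 GB/s (100 Gbps) ceiling quoted above.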
- proxy.destroy(err)
+ // we end the stream gracefully here so that we can upload the remaining bytes to the store
+ // as an incompletePart
+ proxy.end()
Nice!
Regarding replacing internal code with dependencies, I would wait for a comment from @Murderlon. Maybe his position is different than mine :)

I'd personally avoid adding a dependency if it's not much code, we likely rarely/never have to change it, and we don't foresee complex side-effects. So I don't think we need a package to concat streams or create unique tmp files. For the semaphore I don't have a strong opinion; in this case it's a well-tested copy-paste from Shopify, so from a trust perspective I'd say it's equal (or higher) to a package. But if there is a similarly well-tested package with a small footprint, I would be okay with that too. I'm okay with either.

Alright, if you prefer fewer dependencies, we can go with that route as well. That's probably a topic where personal preferences differ.
I just saw that …
Hello @Murderlon @Acconut, sorry for the delay on this, it has been a very busy week.
Let me know if there is anything else.
Changes look good 👍 One Q
@Murderlon all ready.
Thanks!
This PR improves the handling of incomplete parts and adds backpressure by limiting the number of part files that can be created on disk and uploaded concurrently.
Incomplete Parts
The incomplete part logic did not always work as intended.
An issue I came across with the old implementation is that when I upload a file from the client in chunks, and a chunk is bigger than the preferred part size on the server, at times the returned offset wasn't calculated properly, causing the subsequent request to fail with 409 (Offset-Conflict). This is very much related to #501.
This PR fixes the issue by always downloading the incomplete part first and prepending it to the read stream.
With this approach the offset is now calculated correctly, which also fixes #501 (see test).
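A hedged sketch of that idea (the helper names here are hypothetical, not the store's actual functions): the previously stored incomplete part is read first and the new request body follows, so the splitter sees one continuous byte stream and the offset stays consistent.

```ts
import {Readable} from 'node:stream'

// Hypothetical helper for illustration only.
async function* prependIncompletePart(
  incompletePart: Readable | null, // bytes left over from the previous request, if any
  body: Readable // bytes of the current PATCH request
) {
  if (incompletePart) {
    yield* incompletePart
  }
  yield* body
}

// Usage sketch: pipe Readable.from(prependIncompletePart(part, req)) into the
// stream splitter instead of piping `req` directly.
```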
Backpressure
Added support for backpressure. It works by placing a semaphore on the StreamSplitter.
Only a limited number of chunks can exist on disk at any one time (default 30), and the semaphore is released when a part upload finishes, meaning that backpressure is driven by how fast S3 can upload parts.
When the limit of concurrent chunks has been reached, Node's default backpressure mechanism kicks in: once the high-water mark on the read stream's buffer is full, the stream is paused until data flows again and the semaphore has capacity.
This aligns with issue #505.
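As a hedged illustration of the semaphore idea (a minimal counting-semaphore sketch, not the Shopify implementation mentioned in the review):

```ts
// At most `limit` part files/uploads may be in flight at once.
class Semaphore {
  private queue: Array<() => void> = []
  private active = 0

  constructor(private readonly limit: number) {}

  async acquire(): Promise<void> {
    if (this.active < this.limit) {
      this.active++
      return
    }
    // Wait until a running part upload releases its slot.
    await new Promise<void>((resolve) => this.queue.push(resolve))
    // The slot was handed over directly in release(), so no increment here.
  }

  release(): void {
    const next = this.queue.shift()
    if (next) {
      next() // hand the slot straight to the next waiting chunk
    } else {
      this.active--
    }
  }
}

// Sketch of use inside the splitter: while acquire() is pending the splitter
// stops reading, so Node's stream backpressure pauses the incoming request.
async function uploadPartWithLimit(sem: Semaphore, doUpload: () => Promise<void>) {
  await sem.acquire()
  try {
    await doUpload()
  } finally {
    sem.release()
  }
}
```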