
@s3/store: add backpressure + download incomplete part first #561

Merged — 11 commits, merged into tus:main on Feb 5, 2024

Conversation

@fenos (Collaborator) commented on Jan 19, 2024:

This PR improves the handling of incomplete parts and adds backpressure by limiting the number of concurrent uploads/files on the disk that are being created.

Incomplete Parts

The incomplete-part logic did not always work as intended.

An issue I came across with the old implementation: when uploading a file in chunks from the client, and a chunk is bigger than the server's preferred part size, the returned offset was at times calculated incorrectly, causing the subsequent request to fail with 409 (Offset-Conflict). This is closely related to #501.

This PR fixes the issue by always downloading the incomplete part first and prepending it to the read stream.
With this approach the offset is calculated correctly, which also fixes #501 (see test).

Backpressure

Added support for backpressure. It works by using a semaphore on the StreamSplitter.
Only a limited number of chunks can exist on disk at any one time (default 30), and the semaphore is released when a part upload finishes, meaning that backpressure is governed by how fast S3 can upload parts.

When the limit of concurrent chunks is reached, Node's default backpressure mechanism kicks in: once the read stream's highWaterMark buffer is full, the stream is paused until data flow resumes and the semaphore has capacity again.

This addresses issue #505.
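The mechanism can be sketched with a minimal counting semaphore (the PR ultimately adopts @shopify/semaphore; the class below is illustrative only):

```typescript
// Minimal counting semaphore sketch. acquire() resolves immediately
// while permits remain; otherwise the caller is parked until release().
class Semaphore {
  private waiting: Array<() => void> = []
  constructor(private permits: number) {}

  async acquire(): Promise<void> {
    if (this.permits > 0) {
      this.permits--
      return
    }
    // No capacity: park the caller until a permit is released.
    await new Promise<void>((resolve) => this.waiting.push(resolve))
  }

  release(): void {
    const next = this.waiting.shift()
    if (next) {
      next() // hand the permit directly to a waiter
    } else {
      this.permits++
    }
  }
}
```

In the splitter, each new chunk file would be preceded by `await semaphore.acquire()` and each finished part upload followed by `semaphore.release()`; while `acquire()` is pending, the paused read stream propagates backpressure back to the client.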

@fenos (Collaborator, Author) commented on Jan 20, 2024, on this diff:

```diff
-proxy.destroy(err)
+// we end the stream gracefully here so that we can upload the remaining bytes to the store
+// as an incompletePart
+proxy.end()
```

A very small but nice change here.
When an upload request is cancelled, instead of calling proxy.destroy() we can call proxy.end() to gracefully close the passthrough stream, so that we can flush the bytes of the chunk as an incomplete part and upload it to S3.

This means the next attempt starts exactly where we left off, as opposed to starting from the last successfully uploaded part.
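The difference can be demonstrated with a standalone PassThrough (illustrative only, not the actual tus code):

```typescript
import {PassThrough} from 'node:stream'

// end() lets bytes already written to the PassThrough flush through to
// the reader, whereas destroy() would discard them. This is what allows
// the tail of a cancelled upload to be stored as an incomplete part.
async function drainAfterEnd(): Promise<string> {
  const proxy = new PassThrough()
  proxy.write('partial chunk')
  proxy.end() // graceful close: buffered data stays readable
  const chunks: Buffer[] = []
  for await (const c of proxy) chunks.push(c as Buffer)
  return Buffer.concat(chunks).toString()
}
```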

A reviewer (Member) replied:
Nice!

@Murderlon (Member) left a comment:
A great addition!

Might need to go over it one more time but some initial feedback here.

@fenos (Collaborator, Author) commented on Jan 24, 2024:

@Murderlon I've updated the README and added detailed documentation for the maxConcurrentPartUploads flag.

I've also increased the value to 60 instead of 30 (~480MB), as this aligns better with tusd's defaults:
tusd uses a 50MB partSize and 10 maxConcurrentPartUploads, totaling 500MB.
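The arithmetic behind those numbers, spelled out (values taken from this thread):

```typescript
// With the new library default of 60 concurrent part uploads and a
// typical 8 MiB part size, the in-flight footprint is:
const partSizeMiB = 8
const maxConcurrentPartUploads = 60
const maxInFlightMiB = partSizeMiB * maxConcurrentPartUploads // 480 MiB

// tusd, for comparison, defaults to 50 MiB parts with 10 concurrent
// part uploads:
const tusdInFlightMiB = 50 * 10 // 500 MiB
```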

Let me know if there is anything else; this change will be awesome to have in!

@Murderlon (Member) left a comment:

LGTM 👌

I think I would like a quick review from @Acconut on this before merging.

@Acconut (Member) left a comment:

Amazing work, @fenos!

On the README diff:

```diff
@@ -91,6 +91,22 @@ you need to provide a cache implementation that is shared between all instances

 See the exported [KV stores][kvstores] from `@tus/server` for more information.

+#### `options.maxConcurrentPartUploads`
+
+This setting determines the maximum number of simultaneous part uploads to an S3 storage service.
```
A reviewer (Member) commented:

It might be helpful to state in this document what a part actually is. Maybe a brief explanation saying that this module uses S3 multipart uploads, which splits a file into multiple equally-sized parts (independent of the PATCH requests). With this context, it's easier to understand what concurrent part uploads actually means.
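For background, a part-size calculation under S3's multipart constraints (at least 5 MiB per part except the last, at most 10,000 parts per upload) might look like this; the function below is an illustrative sketch, not the actual @tus/s3-store implementation:

```typescript
// S3 multipart upload limits (documented AWS constraints).
const MIN_PART_SIZE = 5 * 1024 * 1024 // 5 MiB minimum part size
const MAX_PART_COUNT = 10_000 // at most 10,000 parts per upload

function calcPartSize(uploadLength: number, preferredPartSize: number): number {
  // Grow the part size whenever the preferred size would need more
  // than 10,000 parts to cover the whole upload.
  const sizeForPartLimit = Math.ceil(uploadLength / MAX_PART_COUNT)
  return Math.max(preferredPartSize, sizeForPartLimit, MIN_PART_SIZE)
}
```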

On the README diff:

```markdown
#### `options.maxConcurrentPartUploads`

This setting determines the maximum number of simultaneous part uploads to an S3 storage service.
The default value is 60. This default is chosen in conjunction with the typical partSize of 8MiB, aiming for an effective transfer rate of approximately 480MiB/s.
```
A reviewer (Member) commented:

The calculation seems to assume that one part is uploaded per second; that might be worth mentioning. Also, transfer speeds are commonly given in Mbit/s, not MiB/s. That would be 3.84 Gbit/s, which seems very high; I am not sure transfer speeds to S3 reach that level.

@fenos (Collaborator, Author) replied on Feb 5, 2024:

> Traffic between Amazon EC2 and Amazon S3 uses up to 100 Gbps of bandwidth to Amazon Virtual Private Cloud (Amazon VPC) endpoints and public IP addresses in the same AWS Region.

According to Google, that is 12.5 GB/s.

@Acconut (Member) commented on Jan 26, 2024:

Regarding replacing internal code with dependencies, I would wait for a comment from @Murderlon. Maybe his position is different than mine :)

@Murderlon (Member) commented on Jan 26, 2024:

> Regarding replacing internal code with dependencies, I would wait for a comment from @Murderlon. Maybe his position is different than mine :)

I'd personally avoid adding a dependency if it's not much code, we likely rarely/never have to change it, and we don't foresee complex side effects. So I don't think we need a package to concat streams or create unique tmp files. For the semaphore I don't have a strong opinion; in this case it's a well-tested copy-paste from Shopify, so from a trust perspective I'd say it's equal (or superior) to a package. But if there is a similarly well-tested package with a small footprint, I would be okay with that too. I'm okay with either.

@Acconut (Member) commented on Jan 29, 2024:

Alright, if you prefer fewer dependencies, we can go that route as well. That's probably a topic where personal preferences differ.

@Murderlon (Member) commented:

I just saw that @shopify/semaphore is available as a package. In that case I agree with @Acconut that a dependency is better.

socket-security bot commented on Feb 5, 2024:

New dependencies detected.

| Package | New capabilities | Transitives | Size | Publisher |
| --- | --- | --- | --- | --- |
| npm/@shopify/[email protected] | None | 0 | 8.77 kB | shopify-dep |
| npm/@types/[email protected] | None | +1 | 764 kB | types |
| npm/[email protected] | Transitive: environment | +7 | 196 kB | feross |

@fenos (Collaborator, Author) commented on Feb 5, 2024:

Hello @Murderlon @Acconut

Sorry for the delay on this; it has been a very busy week.
I've made all the changes we discussed on this PR. Here is the breakdown of my latest commit:

  • Installed @shopify/semaphore as a dependency.
  • Installed the multistream dependency to handle stream concatenation. The reasoning is that my own implementation might have missed edge cases I'm not aware of; this library has been battle-tested for years, so it seems good to rely on it for concatenating streams.
  • Fixed the calculation in the documentation.

Let me know if there is anything else

@Murderlon (Member) left a comment:

Changes look good 👍 One Q

@fenos (Collaborator, Author) commented on Feb 5, 2024:

@Murderlon all ready

@Murderlon (Member) left a comment:

Thanks!

@Murderlon Murderlon merged commit a5a4cd3 into tus:main Feb 5, 2024
4 checks passed
@fenos fenos deleted the @s3-store/stream-handling-improvements branch February 5, 2024 14:55
Successfully merging this pull request may close these issues.

S3Store: Deferred length uploads fail when only 1 incomplete part is uploaded
3 participants