feat: optimize keys for S3 performance #105

mikeal · 2020-06-29T20:35:37Z

When I was doing the Dumbo Drop project I hit most of the performance bottlenecks you can find in S3 and Dynamo. One thing I stumbled upon was a much better pattern for storing IPLD blocks in S3.

From the AWS documentation:

"your application can achieve at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix in a bucket"

This statement isn’t 100% honest. Most of the time you will not truly see this performance against every prefix, but it’s a good window into how S3 is architected and what the performance constraints are.

We’re in a very lucky situation: we can really optimize for this because every block already has a randomized key you can use as a prefix. I’ve recently built two block storage backends for IPLD, and in both cases I used the CID as a prefix rather than as the final key, so keys look something like {cid.toString()}/data. The performance I was able to get was tremendous.
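To make the key layout concrete, here is a minimal sketch of the CID-as-prefix scheme described above. The function names `block_key` and `cid_from_key` and the example CID string are hypothetical, chosen only for illustration; the only part taken from this issue is the `{cid}/data` shape.

```python
def block_key(cid: str, suffix: str = "data") -> str:
    """Build an S3 object key that uses the CID as the key prefix.

    The CID is treated as an opaque string; the "data" suffix mirrors the
    {cid.toString()}/data layout mentioned in the issue.
    """
    if not cid or "/" in cid:
        raise ValueError("cid must be a non-empty string without '/'")
    return f"{cid}/{suffix}"


def cid_from_key(key: str) -> str:
    """Recover the CID string from a key produced by block_key."""
    cid, sep, _rest = key.partition("/")
    if not sep or not cid:
        raise ValueError(f"not a CID-prefixed key: {key!r}")
    return cid


# Hypothetical CID string, used only as an opaque example value.
example_key = block_key("bafyexamplecid")
```

Because each block gets its own prefix, the keys are spread across the keyspace by the randomness already present in the CID, rather than all sharing one common prefix.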

If you really hammer a bucket with writes this way, you’ll see moments in which it’s re-balancing in order to get more throughput. Once I had a few billion blocks in a single bucket, I aimed 2000+ Lambda functions at it, each writing 1MB blocks. Lambda started having issues before I could saturate the bucket, which was reliably sustaining about 40GB/s of write throughput.
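As a rough back-of-the-envelope check on those numbers, the sketch below estimates how many prefixes the writes would need to be spread over if each prefix were limited to the documented 3,500 PUT requests per second. It assumes decimal units (1MB = 10^6 bytes, 40GB/s = 40 × 10^9 bytes/s) and treats the documented figure as a hard per-prefix cap, which is a simplification; `min_prefixes` is a hypothetical helper name.

```python
import math

# Documented S3 figure quoted in this issue: PUT requests/second per prefix.
PUTS_PER_PREFIX_PER_SECOND = 3500


def puts_per_second(throughput_bytes_per_s: int, block_bytes: int) -> float:
    """Number of PUT requests per second implied by a write throughput."""
    return throughput_bytes_per_s / block_bytes


def min_prefixes(throughput_bytes_per_s: int, block_bytes: int) -> int:
    """Lower bound on prefixes needed if each is capped at the documented rate."""
    rate = puts_per_second(throughput_bytes_per_s, block_bytes)
    return math.ceil(rate / PUTS_PER_PREFIX_PER_SECOND)


# Assumed decimal units for the figures reported above.
rate = puts_per_second(40_000_000_000, 1_000_000)   # 40,000 PUT/s
needed = min_prefixes(40_000_000_000, 1_000_000)    # 12 prefixes
```

Under these assumptions, 40GB/s of 1MB blocks is about 40,000 PUT requests per second, more than ten times what a single prefix is documented to sustain, which is consistent with the point that a shared prefix would become the bottleneck long before the bucket does.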

This library, and any other IPFS/IPLD storage backends for S3, should probably take the same approach.


welcome · 2020-06-29T20:35:38Z

Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review.
In the meantime, please double-check that you have provided all the necessary information to make this process easy! Any information that can help save additional round trips is useful! We currently aim to give initial feedback within two business days. If this does not happen, feel free to leave a comment.
Please keep an eye on how this issue will be labeled, as labels give an overview of priorities, assignments and additional actions requested by the maintainers:

"Priority" labels will show how urgent this is for the team.
"Status" labels will show if this is ready to be worked on, blocked, or in progress.
"Need" labels will indicate if additional input or analysis is required.

Finally, remember to use https://discuss.ipfs.io if you just need general support.

mikeal added the need/triage (Needs initial labeling and prioritization) label Jun 29, 2020

jacobheun added the status/ready (Ready to be worked) label and removed the need/triage (Needs initial labeling and prioritization) label Jul 17, 2020

obo20 mentioned this issue Aug 17, 2021

Update go-ds-s3 plugin to utilize prefixes for massive rate limit improvements #195

Closed

Stebalien mentioned this issue Nov 15, 2021

Add key transform functions for massive rate limit improvements #205

Open

koxon mentioned this issue Dec 4, 2022

can one IPFS instance uses multiple s3 buckets #240

Closed
