feat: optimize keys for S3 performance #105

Open
mikeal opened this issue Jun 29, 2020 · 1 comment
Labels
status/ready Ready to be worked

Comments

mikeal commented Jun 29, 2020

When I was doing the Dumbo Drop project I hit most of the performance bottlenecks you can find in S3 and Dynamo. One thing I stumbled upon was a much better pattern for storing IPLD blocks in S3.

From the AWS documentation:

"your application can achieve at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix in a bucket"

This statement isn’t 100% honest. Most of the time you will not truly see this performance against every prefix, but it’s a good window into how S3 is architected and what the performance constraints are.

We’re in a very lucky situation: every block already has a randomized key that can be used as a prefix, so we can really optimize for this. I’ve recently built two block storage backends for IPLD, and in both cases I used the CID as a prefix rather than as the final key, so something like {cid.toString()}/data. The performance I was able to get was tremendous.
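
To make the key layout concrete, here is a minimal sketch of the `{cid.toString()}/data` pattern. The helper names `blockKey` and `cidFromKey` are hypothetical and are not part of this library's API. The only assumption about `cid` is that `toString()` returns its string form.

```javascript
// Hypothetical helpers illustrating the `{cid.toString()}/data` layout.
// The names are made up for this sketch; they are not an existing API.
const SUFFIX = '/data'

// Build the object key for a block. `cid` is anything whose toString()
// returns the CID's string form (a plain string also works).
function blockKey (cid) {
  return `${cid.toString()}${SUFFIX}`
}

// Recover the CID string from a key, or null if the key does not
// follow the `<cid>/data` layout.
function cidFromKey (key) {
  if (!key.endsWith(SUFFIX) || key.length === SUFFIX.length) return null
  return key.slice(0, -SUFFIX.length)
}
```

The string returned by `blockKey` would be used as the object key when writing the block, so every block lands under its own effectively random prefix.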

If you really hammer a bucket with writes this way, you’ll see moments in which it’s re-balancing in order to get more throughput. Once I had a few billion blocks in a single bucket, I aimed 2000+ Lambda functions at it, all writing 1MB blocks, and Lambda started having issues before I could saturate the bucket, which was reliably doing about 40GB/s of write throughput.
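
As a rough back-of-the-envelope check on those numbers, the sketch below converts the reported write throughput into PUT requests per second and compares it with the documented per-prefix figure. It assumes decimal units (1MB = 1e6 bytes, 1GB = 1e9 bytes) and takes the 3,500 PUT/s figure at face value, which, as noted above, is optimistic.

```javascript
// Assumption: decimal units (1MB = 1e6 bytes, 1GB = 1e9 bytes).
const writeThroughputBytesPerSec = 40e9 // ~40GB/s, as reported above
const blockSizeBytes = 1e6 // 1MB blocks
const putLimitPerPrefix = 3500 // documented PUT rate per prefix

// PUT requests per second implied by the reported throughput.
const putsPerSecond = writeThroughputBytesPerSec / blockSizeBytes

// Minimum number of prefixes needed if each one served its documented limit.
const minPrefixes = Math.ceil(putsPerSecond / putLimitPerPrefix)

console.log(putsPerSecond, minPrefixes) // 40000 12
```

Under these assumptions the workload is about 40,000 PUTs per second, more than a single prefix is documented to sustain, which is why spreading blocks across many prefixes matters here.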

This library, and any other IPFS/IPLD storage backends for S3, should probably take the same approach.

mikeal added the need/triage (Needs initial labeling and prioritization) label Jun 29, 2020

welcome bot commented Jun 29, 2020

Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review.
In the meantime, please double-check that you have provided all the necessary information to make this process easy! Any information that can help save additional round trips is useful! We currently aim to give initial feedback within two business days. If this does not happen, feel free to leave a comment.
Please keep an eye on how this issue will be labeled, as labels give an overview of priorities, assignments and additional actions requested by the maintainers:

  • "Priority" labels will show how urgent this is for the team.
  • "Status" labels will show if this is ready to be worked on, blocked, or in progress.
  • "Need" labels will indicate if additional input or analysis is required.

Finally, remember to use https://discuss.ipfs.io if you just need general support.
