
feat(utils): use multiple sources to download snark artifacts #65

Closed
sripwoud opened this issue Apr 26, 2024 · 12 comments · Fixed by #66
@sripwoud
Member

sripwoud commented Apr 26, 2024

I don't really know why we see Bad Gateway errors in CI when attempting to fetch the artifacts.
https://github.com/privacy-scaling-explorations/zk-kit/actions/runs/8837164770/job/24265411154#step:9:666

As an improvement (and hopefully a fix), I am thinking about a retry mechanism: the list of sources would be randomly sorted on each download attempt (whether for a different proof, or a different artifact of the same proof), which should protect us against throttling (if that is the root cause of the Bad Gateway issues).
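A minimal sketch of such a retry mechanism (the source URLs, artifact path, and function names below are illustrative assumptions, not the actual zk-kit implementation):

```typescript
// Hypothetical list of mirrors; the real sources may differ.
const SOURCES = [
  "https://unpkg.com",
  "https://cdn.jsdelivr.net/npm",
  "https://raw.githubusercontent.com"
]

// Fisher-Yates shuffle, so every download attempt hits the sources in a fresh random order.
function shuffle<T>(items: T[]): T[] {
  const copy = [...items]
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1))
    ;[copy[i], copy[j]] = [copy[j], copy[i]]
  }
  return copy
}

// Minimal shape we need from a fetch result; the global fetch Response satisfies it.
interface FetchResult {
  ok: boolean
  status: number
}

// Try each source in random order, falling through to the next one on any failure.
async function fetchFromAnySource(
  path: string,
  doFetch: (url: string) => Promise<FetchResult> = fetch
): Promise<FetchResult> {
  let lastError: unknown
  for (const base of shuffle(SOURCES)) {
    try {
      const response = await doFetch(`${base}/${path}`)
      if (response.ok) return response
      lastError = new Error(`${response.status} from ${base}`)
    } catch (error) {
      lastError = error
    }
  }
  throw lastError
}
```

Because the shuffle happens inside `fetchFromAnySource`, no single CDN is hit first consistently, which spreads load even when many artifacts are fetched back to back.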

@sripwoud
Member Author

Benchmark results
#23

@artwyman

I'm following your work here closely, because I'm working on something similar in the Zupass repo. We're going to have some slightly different concerns and trade-offs, but I think a similar artifact-distribution scheme will work for us. I'm most focused on artifact download in-browser, while for Node servers and unit tests we're more likely to use local copies or NPM devDependencies.

A heads-up on something I learned in my own testing, in case you haven't tested your solution in-browser yet. GitHub's raw links behave differently in a browser depending on whether you go via github.com or githubusercontent.com, which serve different Access-Control-Allow-Origin headers. Below are two equivalent URLs for one of the artifacts I'm testing. My tests are for files in the repo itself, rather than release packages, but I suspect the behavior will be similar:

https://github.com/proofcarryingdata/snark-artifacts/raw/artwyman/experimental/packages/proto-pod-gpc/proto-pod-gpc_1o-1e-5md-1x10l-2x2t-vkey.json
https://raw.githubusercontent.com/proofcarryingdata/snark-artifacts/artwyman/experimental/packages/proto-pod-gpc/proto-pod-gpc_1o-1e-5md-1x10l-2x2t-vkey.json

The first URL redirects to the second, but only the second one can be successfully accessed in a browser (outside of a github.com page).

The minimal documentation about raw downloads from GitHub makes me nervous about relying on them too heavily, as they could have unpublished rate limits. They seem widely used in the community, and are mentioned (in passing) as "always available" in this doc article about repository limits, which nevertheless says nothing about download bandwidth or rate limiting:
https://docs.github.com/en/repositories/creating-and-managing-repositories/repository-limits
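One way to handle the CORS difference described above (a sketch under my own naming, not code from either repo) is to rewrite github.com "raw" links into their raw.githubusercontent.com form before fetching from a browser page, since only the latter origin sends Access-Control-Allow-Origin: *:

```typescript
// Rewrite https://github.com/<owner>/<repo>/raw/<ref>/<path> to the
// raw.githubusercontent.com equivalent; leave any other URL untouched.
// Helper name is hypothetical.
function toRawGithubUserContentUrl(url: string): string {
  const match = url.match(/^https:\/\/github\.com\/([^/]+)\/([^/]+)\/raw\/(.+)$/)
  if (!match) return url
  const [, owner, repo, rest] = match
  return `https://raw.githubusercontent.com/${owner}/${repo}/${rest}`
}
```

This skips the cross-origin redirect entirely instead of relying on the browser to follow it.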

@cedoor
Member

cedoor commented May 15, 2024

Thank you very much @artwyman 🙏🏽

@sripwoud
Member Author

sripwoud commented May 16, 2024

@artwyman I was waiting for privacy-scaling-explorations/zk-kit#275 to merge before getting back to working on this.
Thanks for sharing what you experienced with github URLs!

in case you haven't tested your solution in-browser yet

I hadn't tested downloads in browser environments at all, actually. I'll try to do some tests myself too.
I had just used a shell script to test/benchmark the downloads:
https://github.com/privacy-scaling-explorations/snark-artifacts/blob/main/scripts/bin/benchmark
#23

about raw downloads from GitHub makes me nervous about relying on it too heavily

If GitHub is one of three sources, that's probably OK in the beginning? We would consider this an initial solution.

I am starting to wonder whether we should move all this logic server-side later (so more an issue to tackle in snark-artifacts):

  • Have our own delivery service, like a Cloudflare Worker (or an equivalent such as Amazon Lambda@Edge).
  • Implement the try-several-sources logic server-side, in that worker.

The benefit: this would be a more robust long-term mitigation against the rate limiting we are facing when attempting to download these relatively large artifact files.

@artwyman

Would your theoretical server-side solution need to "download" at all, or just be deployed with artifacts pre-installed? That's likely what I'll end up doing with the Zupass server, with the artifacts being delivered to the build+deploy system as an NPM package (a devDependency). Of course, that only allows serving up a single version, which is what the Zupass web client will need.

If you need to support multiple versions and don't want to store them all on your own server, the answer might be more complicated. It's for the separately-versioned cases (3rd-party developers, and Zupass development during experimental phases) that I'm anticipating using unpkg or GitHub, so I'm hopeful the usage rate won't lead to throttling in the near term.

But I know our requirements and trade-offs are different, so I'm sharing all this as useful context, not to say it's what you need to do as well. I'll keep an eye on what I can learn from your experience going forward too.

@sripwoud
Member Author

Would your theoretical server-side solution need to "download" at all, or just be deployed with artifacts pre-installed?

I am still thinking about it ^^. There are indeed different solutions and trade-offs, I guess.

I invite you to continue the discussion there if you have further ideas/thoughts, as you've been sharing valuable input 🙏
#38

support multiple versions

Yes, that's a rather strong requirement we want to fulfill.

@sripwoud sripwoud transferred this issue from privacy-scaling-explorations/zk-kit May 26, 2024
@github-project-automation github-project-automation bot moved this from 🏗 In Progress to ✔️ Done in ZK-Kit May 27, 2024
@github-project-automation github-project-automation bot moved this to ✔️ Done in SNARK Artifacts May 27, 2024
@sripwoud
Member Author

sripwoud commented Jun 21, 2024

@artwyman Just pinging you directly to bring your attention to the latest change we've made. Instead of multiple public CDNs, we now use a single private CDN (S3 bucket + CloudFront) to serve all our artifacts.
See #91

This has improved the download speed significantly:
#96 (comment)

@artwyman

@artwyman Just pinging you directly to bring your attention on the latest change we've done. Instead of multiple public CDNs, we now use a single private CDN (s3 bucket + cloudfront) to serve all our artifacts. See #91

Which has improved the download speed significantly #96 (comment)

Thanks for the heads-up. We may have to face the same issue in Zupass, though we haven't yet. We don't have much usage of our GPC artifacts yet, and the artifacts used inside Zupass are downloaded directly from the Zupass server, so the 3rd-party CDNs will only be stressed when more 3rd-party devs build integrations.

I have two follow-up questions to understand what you did here and how we might need to follow it:

  • Was the decision to move to S3+Cloudfront purely about performance, or was it driven by the failures mentioned in this issue? Did you ever get a better understanding of the cause of those, or did the retries resolve them?
  • Is there documentation anywhere of the specifics of your approach to the hosting side? I see the upload code in the repo. I haven't used S3+Cloudfront before, though I can figure it out when the time comes.

@sripwoud
Member Author

sripwoud commented Jun 26, 2024

Was the decision to move to S3+Cloudfront purely about performance, or was it driven by the failures mentioned in this issue? Did you ever get a better understanding of the cause of those, or did the retries resolve them?

Both, because of performance and download failures.
The download failures occurred because the CDNs we were using (especially unpkg) were throttling requests.
Speed was rather bad too.

Is there documentation anywhere of the specifics of your approach to the hosting side? I see the upload code in the repo. I haven't used S3+Cloudfront before, though I can figure it out when the time comes.

Would you be interested in an IaC implementation to make that config more transparent (I am thinking Terraform or Pulumi)? We don't use IaC yet, but we might. What do you think @ntampakas?

@ntampakas
Contributor

ntampakas commented Jun 26, 2024

Hey @sripwoud,
All AWS/infra implementations are done using IaC, more specifically Terraform/Ansible. Currently the code is in CodeCommit, but the DevOps team plans to migrate it to GitHub.

@artwyman

artwyman commented Jun 26, 2024

both because of performance and download failures. Download failures were because the CDN we were using (esp unpkg) were throttling requests. Speed was rather bad too.

Do they have published limits or was it all experimentally discovered? I'll need to decide at some point whether to prioritize this for non-Zupass users of our artifacts.

All AWS/Infra implementations are done using IaC, more specific in Terraform/Ansible. Currently, the code is in CodeCommit, but the DevOps team plans to migrate/transfer the code to GitHub.

This sounds great, thanks. I'll watch for this stuff to show up in GitHub (I assume in this repo?). It may be a while before we need something similar for Zupass-related artifacts, but when that time comes it'll be helpful to see what you did here. All my personal experience with cloud-hosting technologies was in the context of internal tools in a big company, but I'm sure I can figure out the AWS tools when I have to.

@sripwoud
Member Author

Do they have published limits or was it all experimentally discovered? I'll need to decide at some point whether to prioritize this for non-Zupass users of our artifacts.

Experimentally discovered; here is the recurring error we were seeing. I haven't found any docs (I haven't searched very long) or detailed info about throttling in https://github.com/mjackson/unpkg/issues. But I assumed this was the reason, as the error only happened when performing requests in parallel in the tests (I haven't seen errors when fetching sequentially).
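If parallel bursts are indeed what trips the throttling, one mitigation (a hypothetical sketch, not the code actually used in the tests) is to cap how many downloads are in flight at once instead of firing them all with Promise.all:

```typescript
// Run task over items with at most `limit` tasks in flight at a time,
// preserving the order of results. Helper name is mine.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  task: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length)
  let next = 0
  // Each worker repeatedly claims the next unclaimed index and processes it.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++ // synchronous claim, so no two workers share an index
      results[index] = await task(items[index])
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, worker))
  return results
}
```

With a small limit (say 2 or 3), a test suite fetching many artifacts behaves much more like the sequential case that didn't trigger errors.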

I assume in this repo?

Yes; if not, we'll keep you informed.
