S3 artifact download performance improvement #9650
Comments
Just got pointed to this issue from a Slack thread. Sorry it never got a response from a maintainer in the past 2 years (it predates me). As Tim wrote on Slack: And as I wrote in addition to Tim's comment: You can set executor resources globally in your Controller ConfigMap under You can set them per template by using
Regarding
There is a separate, specific issue to parallelize artifacts: #12442. Going to mark this issue as duplicative of those as they are more specific in their feature request.
Summary
We tried to use an artifact to download a fairly large S3 bucket and found the performance unsatisfactory compared with the AWS S3 CLI.
The bucket is about 60 GB in total and contains subfolders and many files.
With Argo, the download took around 26 minutes.
With aws s3 sync, it took around 3 minutes (on the same EC2 machine).
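To put the two timings on a common scale, here is a rough back-of-the-envelope calculation of the effective throughput each one implies. It assumes the reported bucket size means 60 decimal gigabytes and that the timings are wall-clock totals; the function name is only illustrative.

```python
# Rough effective throughput implied by the timings above.
# Assumption: the bucket is 60 decimal gigabytes (60,000 MB).
BUCKET_MB = 60 * 1000


def throughput_mb_per_s(size_mb: float, minutes: float) -> float:
    """Average transfer rate in MB/s for a download of size_mb taking `minutes`."""
    return size_mb / (minutes * 60)


argo = throughput_mb_per_s(BUCKET_MB, 26)  # Argo artifact download
cli = throughput_mb_per_s(BUCKET_MB, 3)    # aws s3 sync

print(f"argo artifact: {argo:.0f} MB/s")
print(f"aws s3 sync:   {cli:.0f} MB/s")
print(f"ratio:         {cli / argo:.1f}x")
```

Under those assumptions the gap is a bit under an order of magnitude, which is the difference this issue is about.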
What change needs making?
Improve the performance with async downloads or multiprocessing? Or allow users to plug in their own download functionality?
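As an illustration of the kind of concurrency being asked for, here is a minimal sketch of fetching many objects in parallel with a thread pool. It is not how Argo downloads artifacts today and not a proposed implementation. `download_all`, `fetch`, and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real S3 GET so the sketch runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def download_all(
    keys: Iterable[str],
    fetch: Callable[[str], bytes],
    max_workers: int = 16,
) -> Dict[str, int]:
    """Fetch every key concurrently and return the byte count per key."""
    sizes: Dict[str, int] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit one task per object; the pool caps how many run at once.
        futures = {pool.submit(fetch, key): key for key in keys}
        for future in as_completed(futures):
            key = futures[future]
            # result() re-raises any exception raised inside fetch.
            sizes[key] = len(future.result())
    return sizes


def fake_fetch(key: str) -> bytes:
    """Hypothetical stand-in for an S3 GET; returns dummy bytes."""
    return key.encode() * 2


result = download_all(["a/1.bin", "a/2.bin", "b/3.bin"], fake_fetch, max_workers=4)
print(result)
```

Threads suit this workload because each download is I/O-bound. With many small files, per-object latency rather than bandwidth tends to dominate, so issuing requests concurrently is where most of the gain would be expected.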
Use Cases
We have an aggregation step in a time-sensitive pipeline, which needs to download a large amount of data and aggregate it.
Message from the maintainers:
Love this enhancement proposal? Give it a 👍. We prioritise the proposals with the most 👍.