Docker push to Pulp registry gives 429 #1716

Closed
grzleadams opened this issue Jul 24, 2024 · 4 comments

@grzleadams

Version
Deployed via Pulp Operator v1.0.0-beta.4 on K8s 1.26.

$ pulp status
{
  "versions": [
    {
      "component": "core",
      "version": "3.49.1",
      "package": "pulpcore",
      "module": "pulpcore.app",
      "domain_compatible": true
    },
    <snip>
    {
      "component": "container",
      "version": "2.19.2",
      "package": "pulp-container",
      "module": "pulp_container.app",
      "domain_compatible": false
    },
    <snip>
  ]
  <snip>

Describe the bug
We occasionally see docker pushes fail with 429 Too Many Requests from the API pod.
GitHub Actions logs:

2024-07-24T17:56:40.7507000Z pulp/pulpcore#16 ERROR: failed to push pulp.<domain>/<image><tag>: failed commit on ref "index-sha256:<hash>": unexpected status from PUT request to https://pulp.<domain>/v2/<image>/manifests/<tag>: 429 Too Many Requests

Pulp API logs:

Wed, Jul 24 2024 1:56:40pm  ('pulp [3017ebedd41b40048661c145c580aded]: ::ffff:10.42.151.114 - - [24/Jul/2024:17:56:40 +0000] "PUT /v2/<image>/manifests/<tag> HTTP/1.1" 429 82 "-" "buildkit/v0.15"',)
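Because the failures are intermittent, it can help to measure how often the API pods return 429 and for which requests. Below is a minimal Python sketch. It assumes each access-log line contains the quoted request line followed by the status code, as in the log entry above. The function name `count_429s` is hypothetical.

```python
import re
from collections import Counter

# Matches the quoted request line and the status code that follows it, e.g.
# "PUT /v2/<image>/manifests/<tag> HTTP/1.1" 429 82
ACCESS_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) '
)


def count_429s(lines):
    """Tally 429 responses per (method, path) across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = ACCESS_RE.search(line)
        if match and match.group("status") == "429":
            counts[(match.group("method"), match.group("path"))] += 1
    return counts
```

For example, if the pod logs were saved to a hypothetical `api.log` file, `count_429s(open("api.log")).most_common(10)` would list the ten most frequently rate-limited requests.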

We currently have 10 API pods, 5 content pods, and 10 worker pods, and the load on them doesn't seem particularly high. My understanding is that Django is doing this, but I don't see any settings in the Django docs for raising the 429 thresholds. What's the best way to approach this? Should we make the API pods larger or add more of them (and if we add more, do we risk bottlenecking on the PostgreSQL DB or something else)?

To Reproduce
Not sure. Probably deploy a Pulp instance and hammer it with docker pushes?

Expected behavior
The Docker push should succeed.

Additional context
N/A

@lubosmj
Member

lubosmj commented Jul 25, 2024

@git-hyagi, can you take a look at this if you are available?

@lubosmj
Member

lubosmj commented Jul 30, 2024

We tend to return 429 in cases where background tasks fail to commit changes to a repository:

Can you please verify that you are not seeing any canceled add_and_remove tasks in your environment?

http GET /pulp/api/v3/tasks/?state=canceled
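The same check can be scripted. Below is a minimal Python sketch using only the standard library. It assumes the task list is paginated with `results` and `next` keys, that each task carries `name` and `pulp_href` fields, and that the relevant task name ends in `add_and_remove`. The helper names and the `base_url`/`auth_header` arguments are hypothetical placeholders.

```python
import json
import urllib.request


def canceled_add_and_remove(page):
    """Return hrefs of tasks on one result page whose name ends in add_and_remove."""
    return [
        task["pulp_href"]
        for task in page.get("results", [])
        if task.get("name", "").endswith("add_and_remove")
    ]


def fetch_canceled_add_and_remove(base_url, auth_header):
    """Follow the 'next' links through every page of canceled tasks.

    base_url (e.g. "https://pulp.example.com") and auth_header (e.g. a
    "Basic ..." value) are placeholders for your own deployment.
    """
    url = f"{base_url}/pulp/api/v3/tasks/?state=canceled"
    hrefs = []
    while url:
        request = urllib.request.Request(url, headers={"Authorization": auth_header})
        with urllib.request.urlopen(request) as response:
            page = json.load(response)
        hrefs.extend(canceled_add_and_remove(page))
        url = page.get("next")  # None on the last page ends the loop
    return hrefs
```

Splitting the filtering from the HTTP loop keeps the filter easy to check against a saved API response.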

@lubosmj
Copy link
Member

lubosmj commented Jul 30, 2024

I have noticed this comment on your related issue: pulp/pulp-operator#1308.

Does it mean that scaling down the number of pods helped you resolve the problem? It looks like this is more related to the way you deploy Pulp. I am not sure we have any best-practice recommendations for larger deployments.

@grzleadams
Author

> I have noticed this comment on your related issue: pulp/pulp-operator#1308 (comment).
>
> Does it mean that scaling down the number of pods helped you resolve the problem? It looks like this is more related to the way you deploy Pulp. I am not sure we have any best-practice recommendations for larger deployments.

Yeah, scaling down the number of pods while increasing the number of gunicorn workers per pod seems to have mitigated the problems we were seeing. That is a bit strange, since the total number of worker processes is the same, but maybe Pulp treats them differently somehow.

I guess the thing is that 429s are a perfectly acceptable way to handle excessive traffic, and the real problem is that docker/build-push-action doesn't have retry capability, unlike the Docker CLI. A registry telling clients to back off seems reasonable to me, and this issue was really about finding ways to scale our deployment correctly. So I think Pulp is probably doing a reasonable thing here, and we can close this issue in favor of providing some guidance on deployment scalability (as in pulp/pulp-operator#1308).
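If the CI step cannot retry on its own, one workaround is a client-side retry loop with exponential backoff around a plain push command. The following minimal Python sketch assumes the push runs as a separate command (for example `docker push` after the build) and that a rate-limit failure can be recognized by "429" appearing in the command's stderr, as in the buildkit error above. The function name and default delays are hypothetical choices.

```python
import random
import subprocess
import time


def push_with_retries(cmd, attempts=5, base_delay=2.0, max_delay=60.0,
                      runner=subprocess.run, sleep=time.sleep):
    """Run a push command, retrying only on 429 failures with exponential backoff.

    `runner` and `sleep` are injectable so the retry logic can be tested
    without a registry.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        result = runner(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result
        stderr = result.stderr or ""
        # Non-rate-limit errors (auth, bad tag, ...) are not worth retrying.
        if "429" not in stderr or attempt == attempts:
            raise RuntimeError(f"{cmd!r} failed on attempt {attempt}: {stderr.strip()}")
        # With the defaults, delays are 2 s, 4 s, 8 s, ... capped at max_delay,
        # plus up to 50% random jitter so concurrent CI jobs don't retry in lockstep.
        delay = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(delay + random.uniform(0, delay / 2))
```

For example, `push_with_retries(["docker", "push", "pulp.<domain>/<image>:<tag>"])`. Retrying only on 429 keeps genuine failures, such as authentication errors, from being masked by repeated attempts.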

github-project-automation bot moved this from Not Started to Done in Pulp Container Roadmap on Jul 30, 2024