How to handle incorrect results from DockerHub API #6

Open
mstein11 opened this issue Jan 18, 2021 · 7 comments

Comments

@mstein11
Contributor

mstein11 commented Jan 18, 2021

As discussed in #4, the DockerHub API sometimes falsely returns 100 (the maximum) available requests. Should we handle this value "as-is" or should we ignore the value if we know it is false?

Handling the value "as-is" leads to odd cliffs in the graph.
(image: Odd cliffs indicating false return values)
Ignoring the false values might help, but it would force us to ignore a valid return value from the API.

How should we go forward with this?
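To make the "ignore" option concrete, here is a minimal sketch of one possible heuristic. It is not something the project implements: the function name `is_suspect_reset` and the `max_plausible_jump` threshold are assumptions chosen purely for illustration. It flags a reading that jumps straight back to the full limit from a much lower previous value.

```python
from typing import Optional


def is_suspect_reset(previous: Optional[int], current: int, limit: int,
                     max_plausible_jump: int = 10) -> bool:
    """Flag a reading that jumps straight back to the full limit.

    Hypothetical heuristic: a reading equal to `limit` is treated as suspect
    when it is more than `max_plausible_jump` above the previous reading.
    The threshold is an arbitrary assumption, not a value from DockerHub.
    """
    if previous is None:
        # Nothing to compare against, so accept the first reading.
        return False
    return current == limit and (current - previous) > max_plausible_jump


print(is_suspect_reset(42, 100, 100))  # large jump back to the limit
print(is_suspect_reset(95, 100, 100))  # small, plausible recovery
```

This shows the trade-off raised above: a genuine return to 100 after a long idle period would also be flagged, so a valid value could be thrown away.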

@mstein11 mstein11 changed the title How to handle incorrect result from dockerhub API How to handle incorrect results from DockerHub API Jan 18, 2021
@immanuelfodor

Is there anyone at DockerHub we could notify about this issue? Maybe it's easier if they fix it server-side :)

@immanuelfodor

@olegburov maybe? "working on hub.docker.com" seems promising, or at least you could notify the right person about this API response fluctuation.

Context: the DockerHub API sometimes returns 100 as the remaining pull count. This is clearly visible at #4 (comment), where I check the limit every minute with curl, and on OP's Grafana graph above. It would be much easier if the incorrect response could be fixed on DockerHub's side.

@ob1dev

ob1dev commented Jan 18, 2021

CC @binman-docker

@immanuelfodor

@olegburov @binman-docker I'm seeing incorrect results from the API more and more often. The trend is clearly visible in the screenshot below, starting from 02.26. 01-02 AM and even more so from 03.09. 17-18 (5-6 PM).

image

The requests don't fail with an error code (everything is 200 OK) but there are no numbers in the RateLimit-Limit and RateLimit-Remaining headers:

IMAGE="ratelimitpreview/test"
TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:$IMAGE:pull" | jq -r .token)

# good response
$ curl -s --head -H "Authorization: Bearer $TOKEN" https://registry-1.docker.io/v2/$IMAGE/manifests/latest
HTTP/1.1 200 OK
Content-Length: 2782
Content-Type: application/vnd.docker.distribution.manifest.v1+prettyjws
Docker-Content-Digest: sha256:767a3815c34823b355bed31760d5fa3daca0aec2ce15b217c9cd83229e0e2020
Docker-Distribution-Api-Version: registry/2.0
Etag: "sha256:767a3815c34823b355bed31760d5fa3daca0aec2ce15b217c9cd83229e0e2020"
Date: Wed, 17 Mar 2021 15:54:16 GMT
Strict-Transport-Security: max-age=31536000
RateLimit-Limit: 100;w=21600
RateLimit-Remaining: 100;w=21600

# bad response
$ curl -s --head -H "Authorization: Bearer $TOKEN" https://registry-1.docker.io/v2/$IMAGE/manifests/latest
HTTP/1.1 200 OK
Content-Length: 2782
Content-Type: application/vnd.docker.distribution.manifest.v1+prettyjws
Docker-Content-Digest: sha256:767a3815c34823b355bed31760d5fa3daca0aec2ce15b217c9cd83229e0e2020
Docker-Distribution-Api-Version: registry/2.0
Etag: "sha256:767a3815c34823b355bed31760d5fa3daca0aec2ce15b217c9cd83229e0e2020"
Date: Wed, 17 Mar 2021 15:54:20 GMT
Strict-Transport-Security: max-age=31536000
RateLimit-Limit: ;w=21600
RateLimit-Remaining: ;w=21600

Since I had set up an alert, I was notified multiple times that the pull limit was below the threshold. It wasn't: the API simply returned no numbers for more than 5 minutes at a time when polled every minute. I've modified the alert rule to check a larger interval to avoid false positive alerts, but I think the trend shows that the API is dealing with some unresolved internal problem.
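The difference between the good and the bad response above can be detected when parsing the header. The following is a minimal sketch based only on the two formats shown in this comment (`100;w=21600` and `;w=21600`); the names `RateLimitValue` and `parse_ratelimit_header` are hypothetical and not part of any existing tool.

```python
from typing import NamedTuple, Optional


class RateLimitValue(NamedTuple):
    count: Optional[int]   # None when the registry sent no number
    window: Optional[int]  # window length in seconds, from the w= parameter


def parse_ratelimit_header(raw: str) -> RateLimitValue:
    """Parse a header value such as '100;w=21600' or ';w=21600'."""
    head, _, params = raw.strip().partition(";")
    head = head.strip()
    # An empty or non-numeric first field is the "bad response" case.
    count = int(head) if head.isdecimal() else None
    window = None
    for param in params.split(";"):
        key, _, value = param.strip().partition("=")
        if key == "w" and value.isdecimal():
            window = int(value)
    return RateLimitValue(count, window)


print(parse_ratelimit_header("100;w=21600"))
print(parse_ratelimit_header(";w=21600"))
```

Returning `None` for the count, rather than 0, lets an alert rule tell "no data" apart from "limit exhausted".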

@mstein11
Contributor Author

#16 suggests caching previous API responses and returning those cached values when no numbers are returned for RateLimit-Limit.

@immanuelfodor, you seem to have the most experience with this issue here. What do you think about this? Would it make your alerting easier? Or would you rather leave the behavior as it is right now?
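For reference, the caching idea could look roughly like the sketch below. It is only an illustration of the general technique, not the code proposed in #16; the class name `LastKnownValue` and its `update` method are assumptions.

```python
from typing import Optional


class LastKnownValue:
    """Remembers the most recent numeric reading and reuses it for gaps."""

    def __init__(self) -> None:
        self._last: Optional[int] = None

    def update(self, reading: Optional[int]) -> Optional[int]:
        """Return `reading`, or the previous reading when it is None."""
        if reading is not None:
            self._last = reading
        return self._last


remaining = LastKnownValue()
for reading in (97, None, None, 95):
    # None stands for a response whose header carried no number.
    print(remaining.update(reading))
```

One design question this leaves open is how long a cached value should be trusted; a real implementation might want to expire it after some time so that a long outage does not hide a genuinely low limit.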

@immanuelfodor

immanuelfodor commented Apr 23, 2021

After I increased the alerting time frame in my previous comment on 03.17, I got alerted again in the afternoon of 03.30 as almost no data was returned for an extended period. Running curl manually in a for loop with a 1s sleep returned data only once in 10 or 20 runs, so I ignored the alert and noted that DockerHub was having some major problem with its API.

image

image

However, on the afternoon of 04.01 something happened, and the API started to return results almost always. There were some minor glitches a few days after, but since 04.05 there hasn't been even a minor problem. All of the missing data points after 04.05 are due to server maintenance, reboots, container restarts, etc. that caused one of the every-minute checks to be missed.

image

image

I think they managed to fix it, and the API is solid now.

In case a similar problem comes up again, I think going with the previous result should be fine. My alerts also keep the last state if no data is present. Current setup:

image

Note: for the Matrix notifications, I use this project: https://github.com/immanuelfodor/matrix-encrypted-webhooks

@mstein11 mstein11 reopened this Apr 23, 2021
@mstein11
Contributor Author

@immanuelfodor Thanks for your quick reply and your insights! Let's go with the previous result then.
