
Add Grafana Dashboard #4

Open · TJM opened this issue Dec 8, 2020 · 10 comments

@TJM
Contributor

TJM commented Dec 8, 2020

The original blog post has a dashboard linked, and I am thinking about adding it as an option to the Helm chart. However, in order to "maintain" a reasonable gauge, I had to hardcode the min/max values. It doesn't look like there is a way to calculate or derive the max from a query without switching the gauge over to "percent" (in which case the max can be hardcoded).

I have a couple of enhancements to mine that I would be happy to share as well (fixed thresholds on "Pulls Remaining", for example).

Also, has anyone else noticed that the graphs fall off like a cliff several times a day instead of showing a "rolling" 6-hour window? Is that a bug in our collection or in Docker Hub? (Thoughts?)

[Screenshot: Screen Shot 2020-12-08 at 12.42.44 PM]

@mstein11
Contributor

mstein11 commented Dec 9, 2020

I like the idea of having the dashboard as part of the chart too!
Mind opening a pull request so I can take a look? I think hardcoding the min/max values is suboptimal, since it would require a source-code change if Docker decides to change the quota. If you open a PR, I'll check whether we can use the RateLimit-Limit header from the HTTP response to determine the max value automatically.
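
For reference, reading those headers anonymously looks roughly like this. It is just a sketch of Docker's documented rate-limit check (the ratelimitpreview/test repository is their designated probe image, and jq is assumed for parsing the token), not necessarily what our script does:

#!/usr/bin/env bash
# Sketch: fetch an anonymous pull token, then issue the HEAD request
# and print Docker Hub's rate-limit headers.
TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull" | jq -r .token)

curl -s --head -H "Authorization: Bearer ${TOKEN}" \
  "https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest" \
  | tr -d '\r' | grep -i '^ratelimit-'

# Typical output:
#   ratelimit-limit: 100;w=21600
#   ratelimit-remaining: 76;w=21600
# The number before ";w=" in ratelimit-limit is what could seed the gauge's max.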

@mstein11
Contributor

mstein11 commented Dec 9, 2020

We made a similar observation, although we didn't encounter the "falling off a cliff" phenomenon quite as often as you do. My first thought was that it might be caused by a change in the IP address used to query the limit, but our system should have a static IP address and we encountered the phenomenon anyway. So maybe it's just Docker Hub?

It might also be a bug in our script, but I think that's unlikely because it just takes the values from the HTTP headers of the HEAD request and exposes them. We don't do any calculations here.

Another interesting phenomenon is that the HEAD request sometimes does not return any rate-limit information at all. This leads to a drop to zero followed by the correct data point in the graph (it happened twice between 14:00 and 16:00 in your graph).
[Screenshot: Bildschirmfoto 2020-12-09 um 12.36.06]

@TJM
Contributor Author

TJM commented Dec 9, 2020

I am guessing that the odd zeros are just a bug where Docker Hub's code times out while calculating/collecting, so it returns a default value (0). It might be interesting to capture the debug output when an odd zero occurs; maybe they return an error too, in which case we could catch/retry or simply not export that odd value. We don't have to do anything, since we are simply reporting the value we received from Docker, but it might make the graph smoother (if that is desirable). :)
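
Purely as a sketch (LIMIT, REMAINING, and the metric names are hypothetical placeholders, not our exporter's actual code), the guard could look something like this:

# LIMIT/REMAINING are assumed to have been parsed from the HEAD response.
if [[ -z "${LIMIT}" || "${LIMIT}" == "0" ]]; then
  # Headers missing or defaulted to 0: skip this scrape (leaving a gap)
  # instead of exporting a misleading zero.
  echo "rate-limit headers missing; skipping sample" >&2
  exit 0
fi
printf 'dockerhub_limit %s\ndockerhub_remaining %s\n' "${LIMIT}" "${REMAINING}"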

For the "cliff" issue, I am guessing that they are pushing out a new version of their code that calculates image pulls and maybe don't have stateful storage configured?

@TJM
Contributor Author

TJM commented Dec 9, 2020

I have not done the Helm chart part yet; I am just releasing it with Flux directly right now, which basically involves a YAML file like:

apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    grafana_dashboard: "1"
  name: dockerhub-rate-limits
  namespace: monitoring
data:
  dockerhub-rate-limits.json: |-
    {JSON HERE}

I will attach the JSON directly (had to gzip it for GitHub), so we can collaborate on the min/max. I don't see any issue with hardcoding the min to 0, but the max will have to be template-driven (values.yaml) unless we can come up with a way to calculate it (without switching to percent).
Docker Hub Rate Limits-1607535713313.json.gz

@TJM
Contributor Author

TJM commented Dec 10, 2020

Strangely, I tried removing the max and it is fine now? I wonder if the problem was that there wasn't enough data yet for Grafana to auto-determine the max.

@mstein11
Contributor

I haven't found the time yet to look at the dashboard JSON, but it sounds reasonable that Grafana can choose a fitting max value automatically. So I would suggest not specifying a max value and letting Grafana work its magic.

Regarding the "odd zeros": I think there is no error code returned in the HTTP response; the actual values for the metrics are just missing. At least that was the case when I looked into it a couple of weeks ago. I am not sure it's desirable to smooth out the curve; I think it's better to simply reflect the truth as reported by the Docker Hub API and leave the interpretation of the data to the user looking at the chart.

Regarding the "cliff issue" I think your assumption makes a lot of sense.

@TJM
Contributor Author

TJM commented Dec 11, 2020

Hmm, I just thought of something: based on the results, it almost seems like they are returning 100 available (and 100 max)? From what I can tell in the code, the default values would be 0, 0. We could probably safely just not publish a 0,0 result; it would make more sense to keep the previous value or return nothing (leave a gap) than to return 0,0. But in this case, it is returning 100,100 (I think).

Anyhow, since we can make this dashboard template-driven, I am thinking of making the max value optional. That way, if someone wants to hardcode it, they can set it in values.yaml.
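
A sketch of what that override might look like at install time (the release name, chart path, and the dashboard.maxValue key are all illustrative, not a settled schema):

# Left unset, the dashboard would fall back to Grafana's auto-determined max.
helm upgrade --install dockerhub-exporter ./chart \
  --namespace monitoring \
  --set dashboard.maxValue=200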

@immanuelfodor

I experience the same behavior even with a simple bash script and curl:

Jan 17 12:38:02 influx-grafana dockerhub[12552]: DockerHub ratelimit: limit=100,remaining=66
Jan 17 12:38:04 influx-grafana dockerhub[12556]: Done.
Jan 17 12:39:03 influx-grafana dockerhub[12576]: DockerHub ratelimit: limit=100,remaining=67
Jan 17 12:39:05 influx-grafana dockerhub[12580]: Done.
Jan 17 12:40:02 influx-grafana dockerhub[12643]: DockerHub ratelimit: limit=100,remaining=100
Jan 17 12:40:04 influx-grafana dockerhub[12658]: Done.
Jan 17 12:41:02 influx-grafana dockerhub[12741]: DockerHub ratelimit: limit=100,remaining=67
Jan 17 12:41:04 influx-grafana dockerhub[12745]: Done.
Jan 17 12:42:02 influx-grafana dockerhub[12765]: DockerHub ratelimit: limit=100,remaining=67
Jan 17 12:42:04 influx-grafana dockerhub[12769]: Done.
Jan 17 12:43:02 influx-grafana dockerhub[12789]: DockerHub ratelimit: limit=100,remaining=67
Jan 17 12:43:04 influx-grafana dockerhub[12793]: Done.
Jan 17 12:44:03 influx-grafana dockerhub[12813]: DockerHub ratelimit: limit=100,remaining=67
Jan 17 12:44:05 influx-grafana dockerhub[12817]: Done.
Jan 17 12:45:03 influx-grafana dockerhub[12837]: DockerHub ratelimit: limit=100,remaining=67
Jan 17 12:45:05 influx-grafana dockerhub[12841]: Done.
Jan 17 12:46:02 influx-grafana dockerhub[12862]: DockerHub ratelimit: limit=100,remaining=100
Jan 17 12:46:04 influx-grafana dockerhub[12866]: Done.
Jan 17 12:47:03 influx-grafana dockerhub[12886]: DockerHub ratelimit: limit=100,remaining=67
Jan 17 12:47:05 influx-grafana dockerhub[12890]: Done.
Jan 17 12:48:02 influx-grafana dockerhub[12910]: DockerHub ratelimit: limit=100,remaining=67
Jan 17 12:48:04 influx-grafana dockerhub[12914]: Done.
Jan 17 12:49:02 influx-grafana dockerhub[12934]: DockerHub ratelimit: limit=100,remaining=67
Jan 17 12:49:04 influx-grafana dockerhub[12938]: Done.
Jan 17 12:50:03 influx-grafana dockerhub[13008]: DockerHub ratelimit: limit=100,remaining=68
Jan 17 12:50:05 influx-grafana dockerhub[13020]: Done.
Jan 17 12:51:02 influx-grafana dockerhub[13102]: DockerHub ratelimit: limit=100,remaining=68
Jan 17 12:51:04 influx-grafana dockerhub[13106]: Done.
Jan 17 12:52:02 influx-grafana dockerhub[13126]: DockerHub ratelimit: limit=100,remaining=68
Jan 17 12:52:04 influx-grafana dockerhub[13130]: Done.
Jan 17 12:53:03 influx-grafana dockerhub[13150]: DockerHub ratelimit: limit=100,remaining=68
Jan 17 12:53:05 influx-grafana dockerhub[13154]: Done.
Jan 17 12:54:03 influx-grafana dockerhub[13174]: DockerHub ratelimit: limit=100,remaining=68
Jan 17 12:54:05 influx-grafana dockerhub[13178]: Done.
Jan 17 12:55:03 influx-grafana dockerhub[13198]: DockerHub ratelimit: limit=100,remaining=68
Jan 17 12:55:05 influx-grafana dockerhub[13202]: Done.
Jan 17 12:56:02 influx-grafana dockerhub[13223]: DockerHub ratelimit: limit=100,remaining=68
Jan 17 12:56:04 influx-grafana dockerhub[13227]: Done.
Jan 17 12:57:02 influx-grafana dockerhub[13247]: DockerHub ratelimit: limit=100,remaining=100
Jan 17 12:57:04 influx-grafana dockerhub[13251]: Done.
Jan 17 12:58:02 influx-grafana dockerhub[13271]: DockerHub ratelimit: limit=100,remaining=68
Jan 17 12:58:04 influx-grafana dockerhub[13275]: Done.
Jan 17 12:59:02 influx-grafana dockerhub[13295]: DockerHub ratelimit: limit=100,remaining=68
Jan 17 12:59:04 influx-grafana dockerhub[13299]: Done.
Jan 17 13:00:03 influx-grafana dockerhub[13331]: DockerHub ratelimit: limit=100,remaining=69
Jan 17 13:00:05 influx-grafana dockerhub[13363]: Done.

@TJM
Contributor Author

TJM commented Jan 17, 2021

For sure, the odd 100 reading seems like it is "their" problem, not ours. The question is, do we just report the statistics as gathered, or "filter" the odd result out?

HOWEVER, this issue was actually about adding the Grafana dashboard ;)

@mstein11
Contributor

I opened a separate issue for the "odd 100" problem. :-)

Is there something I can do to help go forward with the Grafana dashboard here?
