Monitoring for keep-test #3353

Merged
pdyraga merged 22 commits into main from 3012-prometheus-monitoring
Oct 10, 2022
@nkuba nkuba commented Oct 10, 2022

This PR implements a basic monitoring setup for the keep-test environment.

We set up Prometheus and Grafana to monitor Keep Nodes running in the test environment.

For details please see README.adoc.

Please note that we introduce the kustomization tool, which is very helpful for config map generation and consistent resource labeling.

While exploring the Prometheus configuration we touched on Kubernetes cluster monitoring, but decided to remove it from this PR because the solution is far from final. We should revisit the code removed in 2a78f10 in the future.

Refs #3012
Refs #3279
Refs #3149

Konrad Janicki and others added 22 commits September 22, 2022 20:33
 - added deployment for prometheus
 - added deployment for grafana
 - added storageclass to insert persistent storage using cloud provider
 - prepared PersistentVolumeClaim both for prometheus and grafana
 - prepared basic service using NodePort to use with Ingress
 - preparing README file with instructions
 - reorganized conf for prometheus.yaml to have working discovery
 - added jobs for grafana and prometheus because grafana has default dashboards for grafana monitoring
 - modified data of configmap for grafana to add additional settings like dashboards conf and json files for creating those dashboards
 - added securityContext for grafana deployment
 - adjusted `volumeMounts` and `volumes` to have grafana config working with dashboards already created
 - used `items` to have one configmap and be able to choose which dynamically added file should be put where
 - made some cosmetic name changes
 - renamed configmap and split it into two files for prometheus & grafana
 - created kustomization.yaml file for further config templates
 - removed `NodePort`
 - renamed some objects according to last review
 - prepared a basic README file describing the whole process of setting everything up
 - adjusted client-dashboard.json file by adding local keep-test bootstrap services & an external one from Azure
 - changed `job=clients` in search path for client-dashboard to `type=client`
This is the current configuration running in the cluster.
By default Prometheus stores data from the last 15 days. We need to
increase that to 1 year.

See https://prometheus.io/docs/prometheus/latest/storage/
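
As a rough illustration of the retention change described above, the fragment below shows one way the retention period could be raised in a Prometheus Deployment. It is a sketch, not the manifest from this PR: the container name, image, and paths are assumptions. The `--storage.tsdb.retention.time` flag is the standard Prometheus option for this setting and accepts units such as `d` and `y`.

```yaml
# Hypothetical fragment of a Prometheus Deployment.
# Container name, image, and paths are assumptions, not taken from this PR.
spec:
  template:
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus
          args:
            # Setting args replaces the image's default command-line arguments,
            # so the config file and storage path are passed explicitly.
            - --config.file=/etc/prometheus/prometheus.yml
            - --storage.tsdb.path=/prometheus
            # Keep samples for 1 year instead of the default 15 days.
            - --storage.tsdb.retention.time=1y
```

A longer retention period increases disk usage, so the PersistentVolumeClaim backing the storage path would likely need to be sized accordingly.
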
Kustomization lets us define config files as standalone files and then
combine them into a config map.
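
A minimal sketch of what such a setup might look like is shown below. The file names, resource names, and label are assumptions chosen for illustration; only `kustomization.yaml`, `prometheus.yaml`, and `client-dashboard.json` are names mentioned in the commit messages above.

```yaml
# kustomization.yaml (illustrative; resource and label names are assumptions)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

# Applied to every resource listed below, which keeps labeling consistent.
commonLabels:
  app.kubernetes.io/part-of: monitoring

resources:
  - prometheus-deployment.yaml   # hypothetical file name
  - grafana-deployment.yaml      # hypothetical file name

# Each entry produces a ConfigMap whose keys are the listed file names
# and whose values are the file contents.
configMapGenerator:
  - name: prometheus-config
    files:
      - prometheus.yaml
  - name: grafana-dashboards
    files:
      - client-dashboard.json
```

By default the generator appends a content hash to each ConfigMap name and rewrites references to it in the listed resources, so a change to a config file rolls out as a new ConfigMap.
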
The configuration in this commit was copied from a tutorial to play with
prometheus. We could use some parts of it but we don't need it right
now.

We should revisit the code removed in this commit when we work on
supporting more Kubernetes metrics monitoring.
@nkuba nkuba changed the title from "3012 prometheus monitoring" to "Monitoring for keep-test" Oct 10, 2022
@nkuba nkuba requested a review from pdyraga October 10, 2022 13:59
@nkuba nkuba self-assigned this Oct 10, 2022
@nkuba nkuba added this to the v2.0.0-m2 milestone Oct 10, 2022
#### keep-discovered-nodes

The nodes to monitor are discovered with
link:https://github.com/keep-network/prometheus-sd[Prometheus Custom Service Discovery].
Member

Should we link to a specific branch? That repo is empty currently.

Member Author

It won't be empty once we merge keep-network/prometheus-sd#1
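
For context on the `keep-discovered-nodes` job discussed in this thread: custom service discovery is commonly wired into Prometheus through `file_sd_configs`, where an external tool writes a list of targets to a file that Prometheus re-reads periodically. Whether prometheus-sd is integrated this way is not stated here, so the fragment below is only a sketch under that assumption; the file path and refresh interval are hypothetical.

```yaml
# prometheus.yaml fragment (illustrative; path and interval are assumptions)
scrape_configs:
  - job_name: keep-discovered-nodes
    file_sd_configs:
      - files:
          # Hypothetical file written by the discovery tool.
          - /etc/prometheus/sd/keep-nodes.json
        # How often Prometheus re-reads the target file.
        refresh_interval: 1m
```
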

@pdyraga pdyraga merged commit 3cead8d into main Oct 10, 2022
@pdyraga pdyraga deleted the 3012-prometheus-monitoring branch October 10, 2022 15:49
pdyraga added a commit that referenced this pull request Oct 25, 2022
In this PR we add a configuration of monitoring for the production
environment.

The configuration is based on a couple of PRs deployed and tested on the
keep-test environment:
#3353
#3357
#3373

The following resources are exposed:
Public Dashboard: https://public.monitoring.threshold.network
Grafana: https://monitoring.threshold.network/grafana
Prometheus: https://monitoring.threshold.network/prometheus