diff --git a/content/en/docs/monitoring/_index.md b/content/en/docs/monitoring/_index.md
new file mode 100644
index 0000000..429b15c
--- /dev/null
+++ b/content/en/docs/monitoring/_index.md
@@ -0,0 +1,17 @@
+---
+title: "Monitoring"
+weight: 8
+labfoldernumber: "08"
+sectionnumber: 8
+description: >
+  Monitoring KubeVirt components and virtual machines.
+---
+
+In this section we will learn how to monitor the KubeVirt components and virtual machines.
+
+
+## Lab Goals
+
+* Learn the different aspects of monitoring the KubeVirt components.
+* Understand and explore the existing Prometheus metrics.
+* Integrate the metrics of a virtual machine into a Prometheus stack.
diff --git a/content/en/docs/monitoring/guest-agent.md b/content/en/docs/monitoring/guest-agent.md
new file mode 100644
index 0000000..45b9a6e
--- /dev/null
+++ b/content/en/docs/monitoring/guest-agent.md
@@ -0,0 +1,118 @@
+---
+title: "Guest Agent"
+weight: 81
+labfoldernumber: "08"
+description: >
+  The Guest Agent is an optional component that can run inside virtual machines to provide additional runtime information.
+---
+
+Many of the available cloud images already have the `qemu-guest-agent` package installed. If it is not preinstalled, you can install the package using one of the concepts you learned previously.
+
+
+## {{% task %}} Start a virtual machine and explore Guest Agent information
+
+In this lab we're going to reuse the virtual machine we created in the {{}}.
+
+Start the `cloud-init` virtual machine using the following command:
+
+```bash
+virtctl start {{% param "labsubfolderprefix" %}}04-cloudinit --namespace=$USER
+```
+
+The presence of the Guest Agent in the virtual machine is indicated by a condition in the `VirtualMachineInstance` status. This condition shows that the Guest Agent is connected and ready for use.
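+
+To read only this condition, you can filter `status.conditions` with JSONPath. The following sketch wraps the filter in a small shell function; the name `agent_connected` is our own choice and not a KubeVirt or `virtctl` command:
+
+```bash
+# Hypothetical helper: print the status of the AgentConnected condition of a
+# VirtualMachineInstance in your namespace ("True" once the Guest Agent is connected).
+agent_connected() {
+  kubectl get vmi "$1" --namespace="$USER" \
+    -o jsonpath='{.status.conditions[?(@.type=="AgentConnected")].status}{"\n"}'
+}
+```
+
+Once the virtual machine is running, `agent_connected {{% param "labsubfolderprefix" %}}04-cloudinit` should print `True`. An empty line means that the condition is not (yet) present.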
+
+As soon as the virtual machine has started successfully (the `STATUS` column of `kubectl get vm {{% param "labsubfolderprefix" %}}04-cloudinit --namespace=$USER` shows `Running`), use the following command to display the `VirtualMachineInstance` object:
+
+```bash
+kubectl get vmi {{% param "labsubfolderprefix" %}}04-cloudinit -o yaml --namespace=$USER
+```
+
+or
+
+```bash
+kubectl describe vmi {{% param "labsubfolderprefix" %}}04-cloudinit --namespace=$USER
+```
+
+Check the `status.conditions` and verify whether the `AgentConnected` condition is `True`:
+
+```yaml
+apiVersion: kubevirt.io/v1
+kind: VirtualMachineInstance
+[...]
+spec:
+[...]
+status:
+  conditions:
+  - lastProbeTime: null
+    lastTransitionTime: "2024-10-05T12:02:40Z"
+    status: "True"
+    type: Ready
+  - lastProbeTime: null
+    lastTransitionTime: null
+    status: "True"
+    type: LiveMigratable
+  - lastProbeTime: "2024-10-05T12:02:56Z"
+    lastTransitionTime: null
+    status: "True"
+    type: AgentConnected
+[...]
+```
+
+If the Guest Agent has connected successfully, the status of the `VirtualMachineInstance` shows additional OS information, for example:
+
+* `status.guestOSInfo`, which contains OS runtime data.
+* `status.interfaces`, which shows the QEMU interfaces merged with the interface information reported by the Guest Agent.
+
+Explore the additional information.
+
+
+```yaml
+status:
+  [...]
+  guestOSInfo:
+    id: fedora
+    [...]
+  interfaces:
+  [...]
+```
+
+
+## {{% task %}} Guest Agent information through virtctl
+
+In addition to the `status` section of the `VirtualMachineInstance`, you can also get information directly from the Guest Agent, either with `virtctl` or through the Kubernetes API.
+
+
+Use the following command to get the information with `virtctl`:
+
+```bash
+virtctl guestosinfo {{% param "labsubfolderprefix" %}}04-cloudinit --namespace=$USER
+```
+
+The `guestosinfo` command returns all data reported by the Guest Agent.
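+
+`virtctl` reads this information from KubeVirt subresource APIs, so you can also query them directly through the Kubernetes API. The following sketch assumes the `subresources.kubevirt.io/v1` API paths `guestosinfo`, `userlist` and `filesystemlist` on the VMI; the function name `guest_agent_info` is hypothetical:
+
+```bash
+# Hypothetical helper: print the raw JSON of a Guest Agent subresource of a VMI.
+# $1: VMI name, $2: subresource (guestosinfo, userlist or filesystemlist; default guestosinfo)
+guest_agent_info() {
+  kubectl get --raw \
+    "/apis/subresources.kubevirt.io/v1/namespaces/${USER}/virtualmachineinstances/$1/${2:-guestosinfo}"
+}
+```
+
+For example, `guest_agent_info {{% param "labsubfolderprefix" %}}04-cloudinit` should return the same data as `virtctl guestosinfo`, formatted as JSON.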
+
+
+If you are only interested in the list of logged-in users (`userlist`) or the list of filesystems (`filesystemlist`), execute the following commands:
+
+```bash
+virtctl userlist {{% param "labsubfolderprefix" %}}04-cloudinit --namespace=$USER
+```
+
+```bash
+virtctl fslist {{% param "labsubfolderprefix" %}}04-cloudinit --namespace=$USER
+```
+
+The full `QEMU Guest Agent Protocol Reference` can be found under this link
+
+
+## End of lab
+
+The Guest Agent is a convenient way to find out more about your running virtual machines and to monitor your workload.
+
+{{% alert title="Cleanup resources" color="warning" %}} {{% param "end-of-lab-text" %}}
+
+Stop the `VirtualMachineInstance` again:
+
+```bash
+virtctl stop {{% param "labsubfolderprefix" %}}04-cloudinit --namespace=$USER
+```
+{{% /alert %}}
diff --git a/content/en/docs/monitoring/probes.md b/content/en/docs/monitoring/probes.md
new file mode 100644
index 0000000..1349691
--- /dev/null
+++ b/content/en/docs/monitoring/probes.md
@@ -0,0 +1,531 @@
+---
+title: "Readiness and Liveness Probes"
+weight: 82
+labfoldernumber: "08"
+description: >
+  Using Readiness and Liveness Probes to ensure the health of virtual machines.
+---
+
+Liveness and Readiness Probes can be configured for VirtualMachineInstances similarly to how they are set up for Containers. You can find more information about the probes [here](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/).
+
+Liveness Probes stop the VirtualMachineInstance if they fail, allowing higher-level controllers, such as VirtualMachine or VirtualMachineInstanceReplicaSet, to create new instances that should be responsive.
+
+Readiness Probes signal to Services and Endpoints whether the VirtualMachineInstance is ready to handle traffic. If these probes fail, the VirtualMachineInstance is removed from the list of Endpoints backing the Service until the probe recovers.
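+
+You can observe this with a Service in front of a virtual machine. The following sketch prints the ready and the not-ready addresses of a Service's Endpoints; a VMI whose Readiness Probe fails is listed under `notReadyAddresses`. The function name `endpoint_status` and the Service name `probe-svc` used below are hypothetical:
+
+```bash
+# Hypothetical helper: print the ready and not-ready endpoint addresses of a Service
+# in your namespace.
+endpoint_status() {
+  kubectl get endpoints "$1" --namespace="$USER" \
+    -o jsonpath='ready: {.subsets[*].addresses[*].ip}{"\n"}not ready: {.subsets[*].notReadyAddresses[*].ip}{"\n"}'
+}
+```
+
+For example, once the `{{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}-probe` virtual machine of this lab is running, you could expose it with `virtctl expose vm {{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}-probe --name=probe-svc --port=8081 --namespace=$USER`, run `endpoint_status probe-svc`, and remove the Service afterwards with `kubectl delete service probe-svc --namespace=$USER`.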
+
+Watchdogs, on the other hand, monitor the responsiveness of the operating system itself, complementing the workload-centric probes. They require kernel support in the guest OS and additional tooling, such as the commonly used `watchdog` binary.
+
+Exec probes are a special kind of Liveness or Readiness Probe for VMs. They execute commands inside the VM to assess its readiness or liveness. The `qemu-guest-agent` package makes it possible to run these commands inside the VM. The command provided to an exec probe is wrapped by `virt-probe` in the operator and sent to the guest.
+
+
+## {{% task %}} Define an HTTP Liveness Probe
+
+First, we are going to define our `cloud-init` configuration. Create a file called `cloudinit-probe.yaml` in the folder `{{% param "labsfoldername" %}}/{{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}` with the following content:
+
+```yaml
+#cloud-config
+password: kubevirt
+chpasswd: { expire: False }
+bootcmd:
+  - ["sudo", "dnf", "install", "-y", "nmap-ncat"]
+  - ["sudo", "systemd-run", "--unit=httpserver", "nc", "-klp", "8081", "-e", '/usr/bin/echo -e HTTP/1.1 200 OK\\nContent-Length: 12\\n\\nHello World!']
+```
+
+This installs `ncat` and starts a minimal HTTP server as the transient systemd unit `httpserver`. The server listens on port `8081`, answers every request with `HTTP 200` and serves as the health endpoint for the HTTP Liveness Probe.
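+
+Once the virtual machine is running, you can check the health endpoint from inside the guest, for example in a `virtctl console` session. A minimal sketch, assuming `curl` is available in the guest image; the function name `health_status` is hypothetical:
+
+```bash
+# Hypothetical helper: print the HTTP status code returned by the health endpoint
+# (port 8081 by default, the port of the httpserver unit above).
+health_status() {
+  curl --silent --output /dev/null --write-out '%{http_code}\n' "http://localhost:${1:-8081}/"
+}
+```
+
+`health_status` should print `200`. For HTTP probes, Kubernetes treats any status code from 200 up to 399 as success.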
+
+Create a secret by executing the following command:
+
+```bash
+kubectl create secret generic {{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}-cloudinit-probe --from-file=userdata={{% param "labsfoldername" %}}/{{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}/cloudinit-probe.yaml --namespace=$USER
+```
+
+Create a virtual machine that references the configuration above by creating a new file `vm_{{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}-probe.yaml` in the folder `{{% param "labsfoldername" %}}/{{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}/` with the following content:
+
+```yaml
+apiVersion: kubevirt.io/v1
+kind: VirtualMachine
+metadata:
+  name: {{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}-probe
+spec:
+  running: false
+  template:
+    metadata:
+      labels:
+        kubevirt.io/domain: {{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}-probe
+    spec:
+      domain:
+        devices:
+          disks:
+          - name: containerdisk
+            disk:
+              bus: virtio
+          - name: cloudinitdisk
+            disk:
+              bus: virtio
+          interfaces:
+          - name: default
+            masquerade: {}
+        resources:
+          requests:
+            memory: 1024M
+      networks:
+      - name: default
+        pod: {}
+      {{< onlyWhen tolerations >}}tolerations:
+      - effect: NoSchedule
+        key: baremetal
+        operator: Equal
+        value: "true"
+      {{< /onlyWhen >}}volumes:
+      - name: containerdisk
+        containerDisk:
+          image: {{% param "fedoraCloudCDI" %}}
+      - name: cloudinitdisk
+        cloudInitNoCloud:
+          secretRef:
+            name: {{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}-cloudinit-probe
+```
+Now configure an HTTP LivenessProbe with the following specification:
+
+* initialDelaySeconds: `120`
+* periodSeconds: `20`
+* HTTP probe on port: `8081`
+* timeoutSeconds: `10`
+
+{{% details title="Task Hint: Solution" %}}
+
+Your VirtualMachine configuration should look like this:
+
+```yaml
+apiVersion: kubevirt.io/v1
+kind: VirtualMachine
+metadata:
+  name: {{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}-probe
+spec:
+  running: false
+  template:
+    metadata:
+      labels:
+        kubevirt.io/domain: {{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}-probe
+    spec:
+      domain:
+        devices:
+          disks:
+          - name: containerdisk
+            disk:
+              bus: virtio
+          - name: cloudinitdisk
+            disk:
+              bus: virtio
+          interfaces:
+          - name: default
+            masquerade: {}
+        resources:
+          requests:
+            memory: 1024M
+      livenessProbe:
+        initialDelaySeconds: 120
+        periodSeconds: 20
+        httpGet:
+          port: 8081
+        timeoutSeconds: 10
+      networks:
+      - name: default
+        pod: {}
+      {{< onlyWhen tolerations >}}tolerations:
+      - effect: NoSchedule
+        key: baremetal
+        operator: Equal
+        value: "true"
+      {{< /onlyWhen >}}volumes:
+      - name: containerdisk
+        containerDisk:
+          image: {{% param "fedoraCloudCDI" %}}
+      - name: cloudinitdisk
+        cloudInitNoCloud:
+          secretRef:
+            name: {{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}-cloudinit-probe
+```
+{{% /details %}}
+
+Make sure you create your VM with:
+
+```bash
+kubectl apply -f {{% param "labsfoldername" %}}/{{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}/vm_{{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}-probe.yaml --namespace=$USER
+```
+
+Start the newly created VM. This might take a couple of minutes:
+
+```bash
+virtctl start {{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}-probe --namespace=$USER
+```
+
+
+## {{% task %}} Add an HTTP Readiness Probe
+
+In addition to the previously configured LivenessProbe, we will add a ReadinessProbe in this task. For convenience, we use the same HTTP server and port; depending on your needs, they can also differ.
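+
+How fast a probe reacts follows from its timing parameters. The sketch below gives rough estimates under the usual Kubernetes probe semantics (first probe after `initialDelaySeconds`, then one probe every `periodSeconds`, defaults `successThreshold: 1` and `failureThreshold: 3`). It ignores `timeoutSeconds` and scheduling jitter, and the function names are hypothetical:
+
+```bash
+# Earliest time after start at which successThreshold consecutive probes can have succeeded:
+# initialDelaySeconds + (successThreshold - 1) * periodSeconds
+ready_after() {
+  local initial_delay=$1 period=$2 success_threshold=${3:-1}
+  echo $(( initial_delay + (success_threshold - 1) * period ))
+}
+
+# Worst-case time between the workload becoming unresponsive and the probe being
+# considered failed: failureThreshold * periodSeconds
+failed_after() {
+  local period=$1 failure_threshold=${2:-3}
+  echo $(( failure_threshold * period ))
+}
+```
+
+For the ReadinessProbe specified below, `ready_after 120 10 5` prints `160`, so the VM is reported ready at the earliest about 160 seconds after it starts. For the LivenessProbe from the previous task, `failed_after 20` prints `60`. An unresponsive VM is therefore stopped roughly one minute after it stops answering.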
+
+Add a ReadinessProbe with the following specification to the virtual machine (`vm_{{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}-probe.yaml`):
+
+* initialDelaySeconds: `120`
+* periodSeconds: `10`
+* HTTP probe on port: `8081`
+* timeoutSeconds: `5`
+* failureThreshold: `5`
+* successThreshold: `5`
+
+{{% details title="Task Hint: Solution" %}}
+
+Your VirtualMachine configuration should look like this:
+
+```yaml
+apiVersion: kubevirt.io/v1
+kind: VirtualMachine
+metadata:
+  name: {{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}-probe
+spec:
+  running: false
+  template:
+    metadata:
+      labels:
+        kubevirt.io/domain: {{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}-probe
+    spec:
+      domain:
+        devices:
+          disks:
+          - name: containerdisk
+            disk:
+              bus: virtio
+          - name: cloudinitdisk
+            disk:
+              bus: virtio
+          interfaces:
+          - name: default
+            masquerade: {}
+        resources:
+          requests:
+            memory: 1024M
+      livenessProbe:
+        initialDelaySeconds: 120
+        periodSeconds: 20
+        httpGet:
+          port: 8081
+        timeoutSeconds: 10
+      readinessProbe:
+        initialDelaySeconds: 120
+        periodSeconds: 10
+        timeoutSeconds: 5
+        httpGet:
+          port: 8081
+        failureThreshold: 5
+        successThreshold: 5
+      networks:
+      - name: default
+        pod: {}
+      {{< onlyWhen tolerations >}}tolerations:
+      - effect: NoSchedule
+        key: baremetal
+        operator: Equal
+        value: "true"
+      {{< /onlyWhen >}}volumes:
+      - name: containerdisk
+        containerDisk:
+          image: {{% param "fedoraCloudCDI" %}}
+      - name: cloudinitdisk
+        cloudInitNoCloud:
+          secretRef:
+            name: {{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}-cloudinit-probe
+```
+{{% /details %}}
+
+Apply and restart the virtual machine.
+
+```bash
+kubectl apply -f {{% param "labsfoldername" %}}/{{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}/vm_{{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}-probe.yaml --namespace=$USER
+```
+
+```bash
+virtctl restart {{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}-probe --namespace=$USER
+```
+
+
+## {{% task %}} Change the HTTP Liveness Probe to TCP
+
+Instead of checking an HTTP endpoint, the LivenessProbe can also check whether a TCP socket accepts connections.
+
+Change the LivenessProbe of the virtual machine (`vm_{{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}-probe.yaml`) from `HTTP` to `TCP`. Apply the changes and restart the virtual machine.
+
+{{% details title="Task Hint: Solution" %}}
+
+Your VirtualMachine configuration should look like this:
+
+```yaml
+apiVersion: kubevirt.io/v1
+kind: VirtualMachine
+metadata:
+  name: {{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}-probe
+spec:
+  running: false
+  template:
+    metadata:
+      labels:
+        kubevirt.io/domain: {{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}-probe
+    spec:
+      domain:
+        devices:
+          disks:
+          - name: containerdisk
+            disk:
+              bus: virtio
+          - name: cloudinitdisk
+            disk:
+              bus: virtio
+          interfaces:
+          - name: default
+            masquerade: {}
+        resources:
+          requests:
+            memory: 1024M
+      livenessProbe:
+        initialDelaySeconds: 120
+        periodSeconds: 20
+        tcpSocket:
+          port: 8081
+        timeoutSeconds: 10
+      readinessProbe:
+        initialDelaySeconds: 120
+        periodSeconds: 10
+        timeoutSeconds: 5
+        httpGet:
+          port: 8081
+        failureThreshold: 5
+        successThreshold: 5
+      networks:
+      - name: default
+        pod: {}
+      {{< onlyWhen tolerations >}}tolerations:
+      - effect: NoSchedule
+        key: baremetal
+        operator: Equal
+        value: "true"
+      {{< /onlyWhen >}}volumes:
+      - name: containerdisk
+        containerDisk:
+          image: {{% param "fedoraCloudCDI" %}}
+      - name: cloudinitdisk
+        cloudInitNoCloud:
+          secretRef:
+            name: {{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}-cloudinit-probe
+```
+
+```bash
+kubectl apply -f {{% param "labsfoldername" %}}/{{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}/vm_{{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}-probe.yaml --namespace=$USER
+```
+
+```bash
+virtctl restart {{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}-probe --namespace=$USER
+```
+
+{{% /details %}}
+
+
+## {{% task %}} Guest Agent Liveness Probe
+
+It's also possible to use the Guest Agent, which you learned about in the previous lab, as an indicator for probes.
+
+Configure the LivenessProbe of the virtual machine (`vm_{{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}-probe.yaml`) to use `guestAgentPing` instead of `tcpSocket`. Apply the changes and restart the virtual machine.
+
+{{% details title="Task Hint: Solution" %}}
+
+Your VirtualMachine configuration should look like this:
+
+```yaml
+apiVersion: kubevirt.io/v1
+kind: VirtualMachine
+metadata:
+  name: {{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}-probe
+spec:
+  running: false
+  template:
+    metadata:
+      labels:
+        kubevirt.io/domain: {{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}-probe
+    spec:
+      domain:
+        devices:
+          disks:
+          - name: containerdisk
+            disk:
+              bus: virtio
+          - name: cloudinitdisk
+            disk:
+              bus: virtio
+          interfaces:
+          - name: default
+            masquerade: {}
+        resources:
+          requests:
+            memory: 1024M
+      livenessProbe:
+        initialDelaySeconds: 120
+        periodSeconds: 20
+        guestAgentPing: {}
+        timeoutSeconds: 10
+      readinessProbe:
+        initialDelaySeconds: 120
+        periodSeconds: 10
+        timeoutSeconds: 5
+        httpGet:
+          port: 8081
+        failureThreshold: 5
+        successThreshold: 5
+      networks:
+      - name: default
+        pod: {}
+      {{< onlyWhen tolerations >}}tolerations:
+      - effect: NoSchedule
+        key: baremetal
+        operator: Equal
+        value: "true"
+      {{< /onlyWhen >}}volumes:
+      - name: containerdisk
+        containerDisk:
+          image: {{% param "fedoraCloudCDI" %}}
+      - name: cloudinitdisk
+        cloudInitNoCloud:
+          secretRef:
+            name: {{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}-cloudinit-probe
+```
+
+```bash
+kubectl apply -f {{% param "labsfoldername" %}}/{{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}/vm_{{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}-probe.yaml --namespace=$USER
+```
+
+```bash
+virtctl restart {{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}-probe --namespace=$USER
+```
+
+{{% /details %}}
+
+
+{{% alert title="Note" color="info" %}}
+In addition to the Guest Agent ping probe, `exec` probes can also be used. An `exec` probe executes a command inside the virtual machine to determine its status.
+
+As a precondition, the Guest Agent needs to be installed and running in the virtual machine for these probes to work.
+
+{{% /alert %}}
+
+
+## {{% task %}} (optional) Watchdog example
+
+A watchdog offers a more VM-centric approach: the guest OS monitors itself by sending heartbeats to an emulated `i6300esb` watchdog device. When the heartbeats stop, the watchdog device executes an action, in our example `poweroff`. Other possible actions are `reset` and `shutdown`.
+
+Inside the virtual machine, a component is required that sends the heartbeats. In the following example we use `busybox`, which sends watchdog heartbeats to `/dev/watchdog`.
+
+First, we are going to define a new `cloud-init` configuration. Create a file called `cloudinit-watchdog.yaml` in the folder `{{% param "labsfoldername" %}}/{{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}` with the following content:
+
+```yaml
+#cloud-config
+password: kubevirt
+chpasswd: { expire: False }
+bootcmd:
+  - ["sudo", "dnf", "install", "-y", "busybox"]
+
+```
+
+Create a secret by executing the following command:
+
+```bash
+kubectl create secret generic {{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}-cloudinit-watchdog --from-file=userdata={{% param "labsfoldername" %}}/{{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}/cloudinit-watchdog.yaml --namespace=$USER
+```
+
+Create a virtual machine that references the configuration above by creating a new file `vm_{{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}-watchdog.yaml` in the folder `{{% param "labsfoldername" %}}/{{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}/` with the following content:
+
+```yaml
+apiVersion: kubevirt.io/v1
+kind: VirtualMachine
+metadata:
+  name: {{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}-watchdog
+spec:
+  running: false
+  template:
+    metadata:
+      labels:
+        kubevirt.io/domain: {{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}-watchdog
+    spec:
+      domain:
+        devices:
+          watchdog:
+            name: mywatchdog
+            i6300esb:
+              action: "poweroff"
+          disks:
+          - name: containerdisk
+            disk:
+              bus: virtio
+          - name: cloudinitdisk
+            disk:
+              bus: virtio
+          interfaces:
+          - name: default
+            masquerade: {}
+        resources:
+          requests:
+            memory: 1024M
+      networks:
+      - name: default
+        pod: {}
+      {{< onlyWhen tolerations >}}tolerations:
+      - effect: NoSchedule
+        key: baremetal
+        operator: Equal
+        value: "true"
+      {{< /onlyWhen >}}volumes:
+      - name: containerdisk
+        containerDisk:
+          image: {{% param "fedoraCloudCDI" %}}
+      - name: cloudinitdisk
+        cloudInitNoCloud:
+          secretRef:
+            name: {{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}-cloudinit-watchdog
+```
+
+Make sure you create your VM with:
+
+```bash
+kubectl apply -f {{% param "labsfoldername" %}}/{{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}/vm_{{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}-watchdog.yaml --namespace=$USER
+```
+
+Start the VM. This might take a couple of minutes:
+
+```bash
+virtctl start {{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}-watchdog --namespace=$USER
+```
+
+Connect to the console:
+
+```bash
+virtctl console {{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}-watchdog --namespace=$USER
+```
+
+Then execute the following command:
+```bash
+sudo busybox watchdog -t 2000ms -T 10000ms /dev/watchdog
+```
+
+This starts `busybox watchdog`, which sets the watchdog timeout to ten seconds (`-T 10000ms`) and sends a heartbeat every two seconds (`-t 2000ms`). If no heartbeat arrives within the timeout, for example because the guest OS hangs, the watchdog device triggers the configured `poweroff` action. In a non-demo setup, you would start the watchdog during boot and keep it running.
+
+
+## End of lab
+
+{{% alert title="Cleanup resources" color="warning" %}} {{% param "end-of-lab-text" %}}
+
+Stop the virtual machines again:
+
+```bash
+virtctl stop {{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}-probe --namespace=$USER
+```
+
+```bash
+virtctl stop {{% param "labsubfolderprefix" %}}{{% param "labfoldernumber" %}}-watchdog --namespace=$USER
+```
+{{% /alert %}}
diff --git a/content/en/docs/monitoring/prometheus-monitoring.md b/content/en/docs/monitoring/prometheus-monitoring.md
new file mode 100644
index 0000000..5a65533
--- /dev/null
+++ b/content/en/docs/monitoring/prometheus-monitoring.md
@@ -0,0 +1,177 @@
+---
+title: "Prometheus Monitoring"
+weight: 83
+labfoldernumber: "08"
+description: >
+  Monitoring virtual machines with Prometheus
+---
+
+
+Prometheus is an open-source monitoring and alerting toolkit designed for reliability and scalability. It collects real-time metrics from services and systems, stores them in a time-series database, and provides powerful querying capabilities. Prometheus operates with a pull-based model, scraping metrics from endpoints at regular intervals. It supports multi-dimensional data through labels, enabling flexible queries and insights. Paired with tools like Grafana for visualization, Prometheus is widely used for monitoring cloud-native applications, infrastructure, and system performance, with built-in alerting to notify users of potential issues.
+
+It is the de facto standard toolset for monitoring workloads on Kubernetes.
+
+All KubeVirt components expose Prometheus metrics by default and can therefore easily be integrated into an existing Prometheus monitoring stack.
+
+
+## {{% task %}} Raw KubeVirt Prometheus Metrics
+
+Prometheus is tightly integrated into the Kubernetes ecosystem. It uses service discovery to find the components within a Kubernetes cluster that expose metrics. All discovered components are then scraped, and their metrics are stored in the time-series database mentioned above.
+
+All KubeVirt Pods that expose metrics are labeled with `prometheus.kubevirt.io=true` and have a port named `metrics`. In addition, all these Pods are selected by the Kubernetes Service `kubevirt-prometheus-metrics`.
+
+Execute the following command to display the Service:
+
+```bash
+kubectl describe service kubevirt-prometheus-metrics -n kubevirt
+```
+You can see a long list of endpoint addresses: these are the KubeVirt components that expose Prometheus metrics.
+
+```bash
+Name:                     kubevirt-prometheus-metrics
+Namespace:                kubevirt
+Labels:                   app.kubernetes.io/component=kubevirt
+                          app.kubernetes.io/managed-by=virt-operator
+                          kubevirt.io=
+                          prometheus.kubevirt.io=true
+Annotations:              kubevirt.io/customizer-identifier: bf21a9e8fbc5a3846fb05b4fa0859e0917b2202f
+                          kubevirt.io/generation: 21
+                          kubevirt.io/install-strategy-identifier: 96d0fd48fa88abe041085474347e87222b076258
+                          kubevirt.io/install-strategy-registry: quay.io/kubevirt
+                          kubevirt.io/install-strategy-version: v1.3.0
+Selector:                 prometheus.kubevirt.io=true
+Type:                     ClusterIP
+IP Family Policy:         SingleStack
+IP Families:              IPv4
+IP:                       None
+IPs:                      None
+Port:                     metrics  443/TCP
+TargetPort:               metrics/TCP
+Endpoints:                10.244.1.238:8443,10.244.5.229:8443,10.244.5.156:8443 + 10 more...
+Session Affinity:         None
+Internal Traffic Policy:  Cluster
+Events:
+```
+
+You can get the metrics provided by a Pod by sending an HTTP GET request to one of the endpoint addresses.
+
+Execute the following command:
+```bash
+kubectl describe endpoints -n kubevirt kubevirt-prometheus-metrics
+```
+
+and use the first IP address in the `Addresses` list for the next command:
+
+```bash
+curl -k https://:8443/metrics
+```
+
+The result is the list of KubeVirt metrics that this specific Pod exposes.
+
+
+## Configure Prometheus to scrape KubeVirt Metrics
+
+To integrate all these KubeVirt components into a running Prometheus stack, the following configuration is required in the `KubeVirt` custom resource:
+
+* monitorAccount: ``
+* monitorNamespace: ``
+* serviceMonitorNamespace: ``
+
+
+```yaml
+apiVersion: kubevirt.io/v1
+kind: KubeVirt
+metadata:
+  name: kubevirt
+  namespace: kubevirt
+spec:
+[...]
+  monitorAccount:
+  monitorNamespace:
+  serviceMonitorNamespace:
+[...]
+```
+
+With this configuration, KubeVirt itself deploys the resources required to integrate with the Prometheus stack.
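+
+Changing the `KubeVirt` custom resource requires cluster-admin permissions. For reference, the following sketch sets the three fields with a merge patch. The function name is hypothetical, and the namespace and service account of your Prometheus stack are passed as arguments because they depend on your installation:
+
+```bash
+# Hypothetical helper: configure the monitoring fields of the KubeVirt custom resource.
+# $1: namespace of the Prometheus stack, $2: service account Prometheus runs with.
+# In this sketch the ServiceMonitor is created in the Prometheus namespace as well.
+kubevirt_enable_monitoring() {
+  local ns=$1 account=$2
+  kubectl patch kubevirt kubevirt --namespace=kubevirt --type=merge \
+    -p "{\"spec\":{\"monitorNamespace\":\"${ns}\",\"monitorAccount\":\"${account}\",\"serviceMonitorNamespace\":\"${ns}\"}}"
+}
+```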
+
+One of the most important of these resources is the ServiceMonitor. It tells Prometheus to scrape the `metrics` port of the Services labeled `prometheus.kubevirt.io: "true"` in the `kubevirt` namespace, which is the `kubevirt-prometheus-metrics` Service explored in the previous task.
+
+```yaml
+apiVersion: monitoring.coreos.com/v1
+kind: ServiceMonitor
+metadata:
+  annotations:
+  name: prometheus-kubevirt-rules
+  namespace: monitoring
+spec:
+  endpoints:
+  - honorLabels: true
+    port: metrics
+    scheme: https
+    tlsConfig:
+      ca: {}
+      cert: {}
+      insecureSkipVerify: true
+  namespaceSelector:
+    matchNames:
+    - kubevirt
+  selector:
+    matchLabels:
+      prometheus.kubevirt.io: "true"
+```
+
+This integration has already been done on the lab cluster.
+
+Alongside the ServiceMonitor, KubeVirt also deploys a set of PrometheusRules (alerts).
+
+You can have a look at the alerts by executing the following command:
+
+```bash
+kubectl get PrometheusRule prometheus-kubevirt-rules -n kubevirt -o yaml
+```
+
+These alerts are a good monitoring foundation for your workload. In a production environment, make sure that firing alerts are monitored and their causes fixed.
+
+
+## {{% task %}} Explore Prometheus UI
+
+Ask the trainer for the correct Prometheus URL and open the Prometheus UI in a separate browser tab.
+
+First, navigate to the Service Discovery page under Status --> Service Discovery. There you will find the discovered `kubevirt-servicemonitor` ServiceMonitor.
+Once service discovery succeeds, all the endpoints seen above are configured as Prometheus targets and their metrics are scraped. Open the Status --> Targets page to check the state of the KubeVirt targets.
+
+On the Alerts page, you see an overview of all configured alerts, including the KubeVirt alerts.
+
+Go back to the main view by clicking on the Prometheus logo and start querying the data.
+
+Execute the following queries:
+
+**What KubeVirt version is running?**
+```promql
+kubevirt_info
+```
+
+**How many VMs exist per namespace?**
+```promql
+kubevirt_number_of_vms
+```
+
+**How much CPU time have the VMIs used?**
+```promql
+kubevirt_vmi_vcpu_seconds_total
+```
+
+You can also use the Graph tab to display the data over time.
+
+
+## {{% task %}} Explore the complete list of Prometheus metrics in the documentation
+
+The typeahead feature of the Prometheus UI allows you to search for metrics. All KubeVirt metrics start with `kubevirt_`.
+
+You can find the complete list of KubeVirt metrics here:
+
+Try to answer questions like:
+
+* How much memory do my VMIs use?
+* How many live migrations were successful?
+* How much network traffic was received by VMI xy?
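+
+The same PromQL queries can also be sent to the Prometheus HTTP API, for example from a script. This is a minimal sketch that assumes two things: the variable `PROMETHEUS_URL` holds the Prometheus URL you got from the trainer, and the API is reachable from your environment without authentication:
+
+```bash
+# Hypothetical helper: run an instant PromQL query via the Prometheus HTTP API
+# (GET /api/v1/query) and print the raw JSON response.
+promql() {
+  curl --silent --get "${PROMETHEUS_URL}/api/v1/query" --data-urlencode "query=$1"
+}
+```
+
+For example, `promql 'sum by (namespace) (kubevirt_number_of_vms)'` returns the number of VMs per namespace as JSON.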