Metrics

Kiam exports Prometheus metrics that can be used to determine the health of the system, check the timing of each RPC call, and monitor the size of the credentials cache. By default, Prometheus metrics are exported on localhost:9620.

Dashboard

An example Grafana dashboard with Prometheus as the data source is provided here. It displays the basic metrics and, if available, includes daemonset status from kube-state-metrics and container metrics from cAdvisor.

(Dashboard screenshots: Dashboard-1, Dashboard-2, Dashboard-3)

Metrics configuration

  • The prometheus-listen-addr flag controls which address Kiam exposes its Prometheus endpoint on. The default is localhost:9620. The metrics themselves can be accessed at <prometheus-listen-addr>/metrics.
  • The prometheus-sync-interval flag controls how frequently Prometheus metrics are updated. The default is 5s.
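As a quick way to check that the endpoint is reachable, the following minimal sketch builds the metrics URL from a prometheus-listen-addr value and fetches the raw output. The helper names `metrics_url` and `fetch_metrics` are hypothetical, and the sketch assumes the endpoint is served over plain HTTP.

```python
from urllib.request import urlopen


def metrics_url(listen_addr: str = "localhost:9620") -> str:
    """Build the scrape URL for a given prometheus-listen-addr value.

    Assumption: the endpoint is served over plain HTTP.
    """
    return f"http://{listen_addr}/metrics"


def fetch_metrics(listen_addr: str = "localhost:9620", timeout: float = 5.0) -> str:
    """Return the raw Prometheus text exposition from the endpoint."""
    with urlopen(metrics_url(listen_addr), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Requires a Kiam process listening on the default address.
    print(fetch_metrics()[:500])
```

Because the default address is localhost, a check like this has to run on the same host (or in the same network namespace) as the Kiam process unless prometheus-listen-addr has been changed.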

Emitted Metrics

Prometheus

Metadata Subsystem

  • kiam_metadata_handler_latency_seconds - Bucketed histogram of handler timings. Tagged by handler
  • kiam_metadata_credential_fetch_errors_total - Number of errors fetching the credentials for a pod
  • kiam_metadata_credential_encode_errors_total - Number of errors encoding credentials for a pod
  • kiam_metadata_find_role_errors_total - Number of errors finding the role for a pod
  • kiam_metadata_empty_role_total - Number of empty roles returned
  • kiam_metadata_success_total - Number of successful responses from a handler
  • kiam_metadata_responses_total - Responses from mocked out metadata handlers
  • kiam_metadata_proxy_requests_blocked_total - Number of access requests to the proxy handler that were blocked by the regexp
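The counters above can be combined into a single error total for ad-hoc checks. The sketch below is one way to do that from the raw /metrics output. It is a deliberately small parser, not a full implementation of the Prometheus text format: it assumes sample lines carry no trailing timestamp, and it discards labels. The helper names and the sample payload (including its label names and values) are invented for illustration.

```python
ERROR_COUNTERS = (
    "kiam_metadata_credential_fetch_errors_total",
    "kiam_metadata_credential_encode_errors_total",
    "kiam_metadata_find_role_errors_total",
)


def parse_samples(text: str) -> dict:
    """Sum sample values per metric name, ignoring labels.

    Assumption: each sample line is `name{labels} value` with no timestamp.
    """
    totals = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, _, value = line.rpartition(" ")
        name = series.split("{", 1)[0]  # drop the label set, if any
        totals[name] = totals.get(name, 0.0) + float(value)
    return totals


def metadata_error_total(totals: dict) -> float:
    """Add up the metadata error counters; absent counters count as zero."""
    return sum(totals.get(name, 0.0) for name in ERROR_COUNTERS)


# Made-up sample payload, for illustration only.
SAMPLE = """\
# TYPE kiam_metadata_success_total counter
kiam_metadata_success_total{handler="a"} 120
kiam_metadata_success_total{handler="b"} 30
kiam_metadata_credential_fetch_errors_total 2
kiam_metadata_find_role_errors_total 1
"""

totals = parse_samples(SAMPLE)
errors = metadata_error_total(totals)
successes = totals.get("kiam_metadata_success_total", 0.0)
```

For anything beyond a spot check, querying these counters through Prometheus itself is likely the better choice, since it handles rates and counter resets.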

STS Subsystem

  • kiam_sts_cache_hit_total - Number of cache hits to the metadata cache
  • kiam_sts_cache_miss_total - Number of cache misses to the metadata cache
  • kiam_sts_issuing_errors_total - Number of errors issuing credentials
  • kiam_sts_assumerole_timing_seconds - Bucketed histogram of assumeRole timings
  • kiam_sts_assumerole_current - Number of assume role calls currently executing
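A common use of the two cache counters is a hit ratio over a time window. Since both are cumulative counters, the ratio should be computed from the increase between two scrapes rather than from the raw totals. A minimal sketch, with a hypothetical function name:

```python
from typing import Optional


def cache_hit_ratio(
    hits_before: float,
    misses_before: float,
    hits_after: float,
    misses_after: float,
) -> Optional[float]:
    """Hit ratio over the interval between two scrapes.

    Inputs are values of kiam_sts_cache_hit_total and
    kiam_sts_cache_miss_total at the start and end of the interval.
    Returns None when there were no lookups or a counter went backwards
    (for example after a process restart).
    """
    hits = hits_after - hits_before
    misses = misses_after - misses_before
    if hits < 0 or misses < 0:
        return None  # counter reset; the interval is not comparable
    lookups = hits + misses
    if lookups == 0:
        return None  # no cache activity in this interval
    return hits / lookups
```

For example, 90 new hits and 10 new misses in the interval give a ratio of 0.9. A persistently low ratio would suggest that most credential requests are falling through to assumeRole calls, which the kiam_sts_assumerole_timing_seconds histogram can help confirm.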

K8s Subsystem

  • kiam_k8s_dropped_pods_total - Number of pods dropped because the buffer was full

gRPC Server (Kiam Server)

  • grpc_server_handled_total - Total number of RPCs completed on the server, regardless of success or failure.
  • grpc_server_msg_received_total - Total number of RPC stream messages received on the server.
  • grpc_server_msg_sent_total - Total number of gRPC stream messages sent by the server.
  • grpc_server_started_total - Total number of RPCs started on the server.

gRPC Client (Kiam Agent)

  • grpc_client_handled_total - Total number of RPCs completed by the client, regardless of success or failure.
  • grpc_client_msg_received_total - Total number of RPC stream messages received by the client.
  • grpc_client_msg_sent_total - Total number of gRPC stream messages sent by the client.
  • grpc_client_started_total - Total number of RPCs started on the client.
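Because the started counters count RPCs that have begun and the handled counters count RPCs that have completed, their difference gives a rough estimate of RPCs currently in flight. The sketch below shows the arithmetic; the function name is hypothetical, and the inputs are assumed to be totals summed across all label sets from a single scrape.

```python
def in_flight_rpcs(started_total: float, handled_total: float) -> float:
    """Estimate in-flight RPCs from one scrape.

    started_total: grpc_server_started_total or grpc_client_started_total
    handled_total: grpc_server_handled_total or grpc_client_handled_total
    The result is clamped at zero in case the two counters were not read
    at exactly the same moment.
    """
    return max(started_total - handled_total, 0.0)
```

An estimate that keeps growing over successive scrapes may indicate calls that are hanging between the agent and the server.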