This repository has been archived by the owner on Mar 5, 2024. It is now read-only.

V3 release (#195)
v3 release
- added changelog
- start work on upgrade doc to cover v2 -> v3
- rename histogram metrics to clarify seconds
- update grafana dashboard to fix plotting of histogram data
- add quay badge to readme
pingles authored Dec 6, 2018
1 parent 4bee061 commit a069095
Showing 7 changed files with 79 additions and 20 deletions.
35 changes: 35 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,40 @@
# Changelog

## v3.0
6 December 2018

v3 introduces a change to the gRPC API. v3 Servers remain compatible with v2.x Agents, but **v3 Agents require v3 Servers**. Other breaking changes have been made too, so read [docs/UPGRADING.md](docs/UPGRADING.md) for details on moving from v2 to v3.

Notable changes:

* [#109](https://github.com/uswitch/kiam/pull/109) v3 API
* [#110](https://github.com/uswitch/kiam/pull/110) Restrict metadata routes. Everything other than credentials **will be blocked by default**
* [#122](https://github.com/uswitch/kiam/pull/122) Record Server error messages as Events on Pod
* [#131](https://github.com/uswitch/kiam/pull/131) Replace go-metrics with native Prometheus metrics client
* [#140](https://github.com/uswitch/kiam/pull/140) Example Grafana dashboard for Prometheus metrics
* [#163](https://github.com/uswitch/kiam/pull/163) Server manifests use 127.0.0.1 rather than localhost to avoid DNS
* [#173](https://github.com/uswitch/kiam/pull/173) Metadata Agent uses 301 rather than 308 redirects
* [#180](https://github.com/uswitch/kiam/pull/180) Fix race condition with xtables.lock
* [#193](https://github.com/uswitch/kiam/pull/193) Add optional pprof HTTP handler to aid monitoring in live clusters

A huge thanks to the following contributors for this release:

* [@Joseph-Irving](https://github.com/Joseph-Irving)
* [@max-lobur](https://github.com/max-lobur)
* [@fernandocarletti](https://github.com/fernandocarletti)
* [@integrii](https://github.com/integrii)
* [@duncward](https://github.com/duncward)
* [@stevenjm](https://github.com/stevenjm)
* [@tasdikrahman](https://github.com/tasdikrahman)
* [@word](https://github.com/word)
* [@DewaldV](https://github.com/DewaldV)
* [@roffe](https://github.com/roffe)
* [@sambooo](https://github.com/sambooo)
* [@idiamond-stripe](https://github.com/idiamond-stripe)
* [@ash2k](https://github.com/ash2k)
* [@moofish32](https://github.com/moofish32)
* [@sp-joseluis-ledesma](https://github.com/sp-joseluis-ledesma)

## v2.8
1st June 2018

3 changes: 3 additions & 0 deletions README.md
@@ -1,4 +1,7 @@
# kiam

[![Docker Repository on Quay](https://quay.io/repository/uswitch/kiam/status "Docker Repository on Quay")](https://quay.io/repository/uswitch/kiam)

kiam runs as an agent on each node in your Kubernetes cluster and allows cluster users to associate IAM roles to Pods.

Docker images are available at [https://quay.io/repository/uswitch/kiam](https://quay.io/repository/uswitch/kiam).
4 changes: 2 additions & 2 deletions docs/METRICS.md
@@ -37,7 +37,7 @@ daemonset status from kube-state-metrics & container metrics from cAdvisor if av

#### Metadata Subsystem

- `kiam_metadata_handler_latency_milliseconds` - Bucketed histogram of handler timings. Tagged by handler
- `kiam_metadata_handler_latency_seconds` - Bucketed histogram of handler timings. Tagged by handler
- `kiam_metadata_credential_fetch_errors_total` - Number of errors fetching the credentials for a pod
- `kiam_metadata_credential_encode_errors_total` - Number of errors encoding credentials for a pod
- `kiam_metadata_find_role_errors_total` - Number of errors finding the role for a pod
@@ -51,7 +51,7 @@ daemonset status from kube-state-metrics & container metrics from cAdvisor if av
- `kiam_sts_cache_hit_total` - Number of cache hits to the metadata cache
- `kiam_sts_cache_miss_total` - Number of cache misses to the metadata cache
- `kiam_sts_issuing_errors_total` - Number of errors issuing credentials
- `kiam_sts_assumerole_timing_milliseconds` - Bucketed histogram of assumeRole timings
- `kiam_sts_assumerole_timing_seconds` - Bucketed histogram of assumeRole timings
- `kiam_sts_assumerole_current` - Number of assume role calls currently executing
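
Alert thresholds or dashboard queries written against the old `_milliseconds` names may still carry millisecond values. The following is a minimal sketch of the conversion, assuming the histograms observe durations in seconds as the new names indicate. The helper `millisToSeconds` and the bucket boundaries are hypothetical and are not taken from kiam's source.

```go
package main

import (
	"fmt"
	"time"
)

// millisToSeconds converts a latency expressed in milliseconds into the
// float64 seconds value that a *_seconds histogram would be compared against.
// Hypothetical helper for illustration only.
func millisToSeconds(ms int64) float64 {
	return (time.Duration(ms) * time.Millisecond).Seconds()
}

func main() {
	// Assumed example boundaries spanning 1ms to 5min, in milliseconds.
	oldBounds := []int64{1, 10, 100, 1000, 60000, 300000}
	for _, ms := range oldBounds {
		fmt.Printf("%dms -> %gs\n", ms, millisToSeconds(ms))
	}
}
```

Going through `time.Duration` keeps the unit explicit, which avoids scattering a bare division by 1000 through alerting code.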

#### K8s Subsystem
20 changes: 20 additions & 0 deletions docs/UPGRADING.md
@@ -0,0 +1,20 @@
# Upgrading

## v2 to v3

Kiam changed significantly between v2.X and v3.0. Breaking changes are:

* The gRPC API was changed. v3 Agent processes can only connect and communicate with v3 Server processes.
* The Agent metadata proxy HTTP server now blocks access to any path other than those used for obtaining credentials.
* The Server's handling of TLS has changed: the port is now removed from the Host. Certificates must therefore name `kiam-server` rather than `kiam-server:443`, for example, and any previously issued certificates will likely need re-issuing.
* The separate agent, server and health commands have been merged into a single kiam binary. When you upgrade the referenced image, the command and arguments must change too.
* Server now reports events to Pods, requiring additional RBAC privileges for the service account.
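
The metadata path restriction listed above can be pictured as a filter in front of the proxy handler. This is a minimal sketch and not kiam's implementation. The names `allowed` and `restrict` and the 403 response are assumptions made for illustration. The only fact relied on is that the EC2 metadata API serves IAM credentials under `/latest/meta-data/iam/security-credentials/`.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"path"
	"strings"
)

// credentialsBase is the EC2 metadata path under which IAM credentials are served.
const credentialsBase = "/latest/meta-data/iam/security-credentials"

// allowed reports whether p, once cleaned of "." and ".." segments, lies at
// or below the credentials path. Hypothetical helper for illustration.
func allowed(p string) bool {
	clean := path.Clean("/" + p)
	return clean == credentialsBase || strings.HasPrefix(clean, credentialsBase+"/")
}

// restrict wraps next and rejects every request outside the credentials path.
// The 403 status is an assumption, not necessarily what kiam returns.
func restrict(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !allowed(r.URL.Path) {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	// Stand-in for the real proxy handler.
	h := restrict(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	paths := []string{
		"/latest/meta-data/iam/security-credentials/my-role",
		"/latest/meta-data/instance-id",
	}
	for _, p := range paths {
		rec := httptest.NewRecorder()
		h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, p, nil))
		fmt.Println(p, rec.Code)
	}
}
```

Cleaning the path before the prefix check means a request that uses `..` segments to climb out of the credentials path is rejected as well.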

We would suggest upgrading in the following way:

1. Generate new TLS assets. You can use [docs/TLS.md](docs/TLS.md) to create new certificates, or use something like [cert-manager](https://github.com/jetstack/cert-manager) or [Vault](https://vaultproject.io). Given the TLS changes, make sure that your server certificate supports the following names:
* `kiam-server`
* `kiam-server:443`
* `127.0.0.1`
2. Create a new DaemonSet to deploy the v3 Server processes, using the new TLS assets generated above. This ensures that new server processes run alongside the old servers. Once the v3 servers are running and passing their health checks you can proceed. **Please note that RBAC policy changes are required for the Server**; they are documented in [deploy/server-rbac.yaml](deploy/server-rbac.yaml).
3. Update the Agent DaemonSet to use the v3 image. Because the command has changed, take care when making this change: the existing configuration will not work with v3. One option is to ensure your DaemonSet uses an `OnDelete` [update strategy](https://kubernetes.io/docs/tasks/manage-daemon/update-daemon-set/#daemonset-update-strategy): you can then deploy new nodes running new agents that connect to the new servers while leaving existing nodes as-is.
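
To see why the certificate names matter, it may help to look at what removing the port from the Host does to an address. The sketch below uses the standard library's `net.SplitHostPort`. The helper `serverName` is hypothetical and only illustrates the idea; it is not how kiam derives the name it verifies.

```go
package main

import (
	"fmt"
	"net"
)

// serverName returns the host part of addr, dropping any ":port" suffix.
// Hypothetical helper for illustration only.
func serverName(addr string) string {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		// addr is not in host:port form (for example, no port); use it unchanged.
		return addr
	}
	return host
}

func main() {
	for _, addr := range []string{"kiam-server:443", "kiam-server", "127.0.0.1:443"} {
		fmt.Printf("%s -> %s\n", addr, serverName(addr))
	}
}
```

Under this assumption, a v3 process verifies the host without its port, while an older one may still present the `host:port` form. That would explain why a certificate covering both forms lets old and new processes run side by side during the upgrade.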
33 changes: 17 additions & 16 deletions docs/dashboard-prom.json
@@ -972,14 +972,14 @@
"min": null,
"mode": "spectrum"
},
"dataFormat": "timeseries",
"dataFormat": "tsbuckets",
"datasource": "$datasource",
"description": "Bucketed histogram of handler timings. Tagged by handler",
"gridPos": {
"h": 5,
"w": 12,
"x": 0,
"y": 13
"y": 24
},
"heatmap": {},
"highlightCards": true,
@@ -990,8 +990,9 @@
"links": [],
"targets": [
{
"expr": "sum(increase(kiam_metadata_handler_latency_milliseconds_bucket{handler=\"credentials\"}[$interval])) by (le)",
"format": "time_series",
"expr": "sum(rate(kiam_metadata_handler_latency_seconds_bucket{handler=\"credentials\"}[$interval])) by (le)",
"format": "heatmap",
"interval": "",
"intervalFactor": 2,
"legendFormat": "{{le}}",
"refId": "A",
@@ -1012,14 +1013,14 @@
"xBucketSize": null,
"yAxis": {
"decimals": null,
"format": "ms",
"format": "s",
"logBase": 1,
"max": null,
"min": null,
"show": true,
"splitFactor": null
},
"yBucketBound": "auto",
"yBucketBound": "upper",
"yBucketNumber": null,
"yBucketSize": null
},
@@ -1037,14 +1038,14 @@
"min": null,
"mode": "spectrum"
},
"dataFormat": "timeseries",
"dataFormat": "tsbuckets",
"datasource": "$datasource",
"description": "Bucketed histogram of handler timings. Tagged by handler",
"gridPos": {
"h": 5,
"w": 12,
"x": 12,
"y": 13
"y": 24
},
"heatmap": {},
"highlightCards": true,
@@ -1055,8 +1056,8 @@
"links": [],
"targets": [
{
"expr": "sum(increase(kiam_metadata_handler_latency_milliseconds_bucket{handler=\"roleName\"}[$interval])) by (le)",
"format": "time_series",
"expr": "sum(rate(kiam_metadata_handler_latency_seconds_bucket{handler=\"roleName\"}[$interval])) by (le)",
"format": "heatmap",
"interval": "",
"intervalFactor": 2,
"legendFormat": "{{le}}",
@@ -1084,7 +1085,7 @@
"show": true,
"splitFactor": null
},
"yBucketBound": "auto",
"yBucketBound": "upper",
"yBucketNumber": null,
"yBucketSize": null
},
@@ -1102,14 +1103,14 @@
"min": null,
"mode": "spectrum"
},
"dataFormat": "timeseries",
"dataFormat": "tsbuckets",
"datasource": "$datasource",
"description": "Bucketed histogram of assumeRole timings",
"gridPos": {
"h": 6,
"w": 24,
"x": 0,
"y": 18
"y": 29
},
"heatmap": {},
"highlightCards": true,
@@ -1120,8 +1121,8 @@
"links": [],
"targets": [
{
"expr": "sum(increase(kiam_sts_assumerole_timing_milliseconds_bucket[$interval])) by (le)",
"format": "time_series",
"expr": "sum(rate(kiam_sts_assumerole_timing_seconds_bucket[$interval])) by (le)",
"format": "heatmap",
"intervalFactor": 2,
"legendFormat": "{{le}}",
"refId": "A",
@@ -1142,7 +1143,7 @@
"xBucketSize": null,
"yAxis": {
"decimals": null,
"format": "ms",
"format": "s",
"logBase": 1,
"max": null,
"min": null,
2 changes: 1 addition & 1 deletion pkg/aws/metadata/metrics.go
@@ -9,7 +9,7 @@ var (
prometheus.HistogramOpts{
Namespace: "kiam",
Subsystem: "metadata",
Name: "handler_latency_milliseconds",
Name: "handler_latency_seconds",
Help: "Bucketed histogram of handler timings",

// 1ms to 5min
2 changes: 1 addition & 1 deletion pkg/aws/sts/metrics.go
@@ -34,7 +34,7 @@ var (
prometheus.HistogramOpts{
Namespace: "kiam",
Subsystem: "sts",
Name: "assumerole_timing_milliseconds",
Name: "assumerole_timing_seconds",
Help: "Bucketed histogram of assumeRole timings",

// 1ms to 5min
