
Kubernetes


Locally (useful for development)

Install minikube from https://github.com/kubernetes/minikube/releases,
then follow https://kubernetes.io/docs/getting-started-guides/minikube/#instructions
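
As a rough sketch (assuming minikube and kubectl are already installed and on your PATH), starting a local cluster and pointing kubectl at it looks like:

# start a single-node local cluster (downloads the VM image on first run)
minikube start

# minikube sets the current kubectl context for you; verify it works
kubectl config current-context   # should print "minikube"
kubectl get nodes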

Connect your local kubectl to the gcloud cluster:

gcloud container clusters get-credentials your-cluster-name
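
Depending on your gcloud configuration you may also need to pass the zone and project explicitly; the values below are placeholders:

# writes credentials for the cluster into ~/.kube/config
gcloud container clusters get-credentials your-cluster-name \
    --zone us-east1-b --project your-project-id

# sanity check that the new context works
kubectl get nodes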

Using kops

Download kops and kubectl following these instructions.

To set up credentials for the aws command line, run

aws configure
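
aws configure prompts for the access key pair you create in the next step, plus a default region and output format, roughly like this:

AWS Access Key ID [None]: <your access key id>
AWS Secret Access Key [None]: <your secret access key>
Default region name [None]: us-east-1
Default output format [None]: json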

Go to https://console.aws.amazon.com/iam/home?region=us-east-1#/users, find your name,
click on Security credentials, and then Create access key.

A kops user with the required permissions has already been created. (https://console.aws.amazon.com/iam/home?region=us-east-1#/users/kops?section=security_credentials)

A state store bucket has already been created (https://console.aws.amazon.com/s3/buckets/neuroglancer-state-store/?region=us-east-1&tab=overview)

export NAME=omniglancer.com          
export KOPS_STATE_STORE=s3://neuroglancer-state-store
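
With those two variables exported, a typical kops workflow looks roughly like the sketch below; the zone and node count are placeholders, not the values used for this cluster:

# define the cluster spec in the state store (nothing is launched yet)
kops create cluster --zones us-east-1a --node-count 2 ${NAME}

# actually create the AWS resources
kops update cluster ${NAME} --yes

# wait for the cluster to come up, then confirm kubectl can reach it
kops validate cluster
kubectl get nodes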

Using Juju

First create a juju controller

$ juju bootstrap aws/us-east-1                        
Creating Juju controller "aws-us-east-1" on aws/us-east-1
Looking for packaged Juju agent version 2.0.2 for amd64
Launching controller instance(s) on aws/us-east-1...
 - i-041bc99b681b95549 (arch=amd64 mem=4G cores=2)
Fetching Juju GUI 2.5.2
Waiting for address
Attempting to connect to 172.31.22.225:22
Attempting to connect to 34.203.239.51:22
Logging to /var/log/cloud-init-output.log on the bootstrap machine
Running apt-get update
Running apt-get upgrade
Installing curl, cpu-checker, bridge-utils, cloud-utils, tmux
Fetching Juju agent version 2.0.2 for amd64
Installing Juju machine agent
Starting Juju machine agent (service jujud-machine-0)
Bootstrap agent now started
Contacting Juju controller at 172.31.22.225 to verify accessibility...
Bootstrap complete, "aws-us-east-1" controller now available.
Controller machines are in the "controller" model.
Initial model "default" added.

Now that the controller is up, we can tell it to deploy a kubernetes cluster

$ juju deploy k8s-1cpu-1gpu-aws.yaml 
Deploying charm "cs:~containers/easyrsa-8"
added resource easyrsa
Deploying charm "cs:~containers/etcd-29"
added resource etcd
added resource snapshot
Deploying charm "cs:~containers/flannel-13"
added resource flannel
Deploying charm "cs:~containers/kubernetes-master-17"
added resource kube-scheduler
added resource kubectl
added resource cdk-addons
added resource kube-apiserver
added resource kube-controller-manager
application kubernetes-master exposed
Deploying charm "cs:~containers/kubernetes-worker-22"
added resource cni
added resource kube-proxy
added resource kubectl
added resource kubelet
application kubernetes-worker-cpu exposed
Deploying charm "cs:~containers/kubernetes-worker-22"
added resource kube-proxy
added resource kubectl
added resource kubelet
added resource cni
application kubernetes-worker-gpu exposed
Related "kubernetes-master:cluster-dns" and "kubernetes-worker-cpu:kube-dns"
Related "kubernetes-master:kube-control" and "kubernetes-worker-cpu:kube-control"
Related "flannel:cni" and "kubernetes-worker-cpu:cni"
Related "kubernetes-worker-cpu:certificates" and "easyrsa:client"
Related "kubernetes-worker-cpu:kube-api-endpoint" and "kubernetes-master:kube-api-endpoint"
Related "kubernetes-master:cluster-dns" and "kubernetes-worker-gpu:kube-dns"
Related "kubernetes-master:kube-control" and "kubernetes-worker-gpu:kube-control"
Related "flannel:cni" and "kubernetes-worker-gpu:cni"
Related "kubernetes-worker-gpu:certificates" and "easyrsa:client"
Related "kubernetes-worker-gpu:kube-api-endpoint" and "kubernetes-master:kube-api-endpoint"
Related "kubernetes-master:certificates" and "easyrsa:client"
Related "etcd:certificates" and "easyrsa:client"
Related "kubernetes-master:etcd" and "etcd:db"
Related "flannel:etcd" and "etcd:db"
Related "flannel:cni" and "kubernetes-master:cni"
Deploy of bundle completed.
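
The bundle file k8s-1cpu-1gpu-aws.yaml is not reproduced on this page. Based on the charms in the deploy output above, it is presumably a standard Juju bundle of roughly the following shape; the constraints shown are illustrative guesses, not the real values:

# hypothetical sketch of k8s-1cpu-1gpu-aws.yaml
applications:
  easyrsa:
    charm: cs:~containers/easyrsa-8
    num_units: 1
  etcd:
    charm: cs:~containers/etcd-29
    num_units: 1
  flannel:
    charm: cs:~containers/flannel-13
  kubernetes-master:
    charm: cs:~containers/kubernetes-master-17
    num_units: 1
    expose: true
  kubernetes-worker-cpu:
    charm: cs:~containers/kubernetes-worker-22
    num_units: 1
    expose: true
    constraints: instance-type=m4.xlarge      # placeholder CPU node type
  kubernetes-worker-gpu:
    charm: cs:~containers/kubernetes-worker-22
    num_units: 1
    expose: true
    constraints: instance-type=p2.xlarge      # placeholder GPU node type
relations:
  - [kubernetes-master:kube-control, kubernetes-worker-cpu:kube-control]
  - [kubernetes-master:kube-control, kubernetes-worker-gpu:kube-control]
  - [flannel:etcd, etcd:db]
  # ...plus the remaining relations listed in the deploy output above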

You now have to wait until the cluster is ready; watch the progress by running

$ watch -c juju status --color

Once it is ready, download the config file so that you can use kubectl

juju scp kubernetes-master/0:config ~/.kube/config
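
If you already have a ~/.kube/config you care about, back it up before overwriting it. A quick sanity check afterwards:

kubectl get nodes      # the cpu and gpu workers should show up as Ready
kubectl cluster-info   # prints the master and add-on endpoints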

Resource monitoring

Get the Grafana URL:

kubectl cluster-info

Get the username/password:

kubectl config view
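
A minimal sketch of pulling both out on the command line, assuming Grafana is exposed as a cluster add-on and the kubeconfig uses basic auth:

# Grafana shows up among the cluster add-on endpoints
kubectl cluster-info | grep -i grafana

# the basic-auth credentials live in the kubeconfig pulled from the master
kubectl config view | grep -E 'username|password'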

Credentials

kubectl create secret generic secrets \
--from-file=$HOME/.cloudvolume/secrets/google-secret.json \
--from-file=$HOME/.cloudvolume/secrets/aws-secret.json \
--from-file=$HOME/.cloudvolume/secrets/boss-secret.json
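
To make those files visible inside a pod, mount the secret as a volume. A minimal, illustrative pod spec fragment (container name, image, and mount path are placeholders; adjust them to your deployment):

apiVersion: v1
kind: Pod
metadata:
  name: example-worker
spec:
  containers:
  - name: worker
    image: your/worker-image
    volumeMounts:
    - name: secrets
      mountPath: /root/.cloudvolume/secrets   # placeholder; use the path your code reads secrets from
      readOnly: true
  volumes:
  - name: secrets
    secret:
      secretName: secrets   # the secret created above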

Larger cluster

If you have more than 50 pods, you might need to increase the number of DNS servers. If you run kubectl get deployment --namespace=kube-system, you can see how many kube-dns and kube-dns-autoscaler pods are available. The autoscaler should in theory keep up with demand, but we've seen that the default scaling parameters are not sufficient. You'll know this is happening if you see lots of errors like:

  • Temporary failure in name resolution
  • GAXError
  • Server not found
  • NewConnectionError
  • etc

To take manual control, first disable the autoscaler, then scale the kube-dns pods. If you don't disable the autoscaler, it will reset it right back.

kubectl --namespace=kube-system scale deployment kube-dns-autoscaler --replicas=0
kubectl --namespace=kube-system scale deployment kube-dns --replicas=<NUM_YOU_WANT>
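
To confirm the scale-up took, check the deployment and its pods (k8s-app=kube-dns is the standard label on the kube-dns pods):

kubectl get deployment kube-dns --namespace=kube-system
kubectl get pods --namespace=kube-system -l k8s-app=kube-dns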

You can also try tuning the autoscaler parameters:

kubectl edit configmap kube-dns-autoscaler --namespace=kube-system

Read more here: https://kubernetes.io/docs/tasks/administer-cluster/dns-horizontal-autoscaling/#tuning-autoscaling-parameters
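
The autoscaler reads a single JSON blob from that ConfigMap; in the default linear mode, lowering coresPerReplica and nodesPerReplica makes it scale kube-dns up more aggressively. The values below are roughly the upstream defaults, shown for illustration only:

# data section of the kube-dns-autoscaler ConfigMap (illustrative values)
data:
  linear: '{"coresPerReplica":256,"nodesPerReplica":16,"preventSinglePointFailure":true}'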

Anecdotally, with 60 16-core machines downsampling, it seemed that nearly every time I increased the number of DNS pods, the number of tasks leased increased. I (wms) ended up running 100 kube-dns pods on 61 machines with 960 pods. I'm sure there's some inefficiency in there, but it seemed to work.