Kubernetes
Install minikube from https://github.com/kubernetes/minikube/releases,
then follow the instructions at https://kubernetes.io/docs/getting-started-guides/minikube/#instructions.
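As a quick local smoke test (assuming minikube and kubectl are on your PATH and the default VM driver works on your machine):
minikube start        # boots a single-node local cluster
kubectl get nodes     # the node should report as Ready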
For a Google Kubernetes Engine cluster, fetch its credentials with:
gcloud container clusters get-credentials your-cluster-name
Download kops and kubectl following these instructions.
To set up credentials for the AWS command line, run:
aws configure
Go to https://console.aws.amazon.com/iam/home?region=us-east-1#/users, find your user, open the Security credentials tab, and click Create access key.
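aws configure then prompts for the key you just created; the values below are placeholders:
$ aws configure
AWS Access Key ID [None]: AKIA...
AWS Secret Access Key [None]: ...
Default region name [None]: us-east-1
Default output format [None]: json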
A kops user with the required permissions has already been created (https://console.aws.amazon.com/iam/home?region=us-east-1#/users/kops?section=security_credentials).
A state store bucket has already been created (https://console.aws.amazon.com/s3/buckets/neuroglancer-state-store/?region=us-east-1&tab=overview).
export NAME=omniglancer.com
export KOPS_STATE_STORE=s3://neuroglancer-state-store
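With those variables exported, a minimal kops invocation might look like this (the zone and node count are only illustrative):
kops create cluster --zones=us-east-1a --node-count=2 ${NAME}
kops update cluster ${NAME} --yes      # actually provision the AWS resources
kops validate cluster                  # repeat until the cluster reports as ready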
First, create a Juju controller:
$ juju bootstrap aws/us-east-1
Creating Juju controller "aws-us-east-1" on aws/us-east-1
Looking for packaged Juju agent version 2.0.2 for amd64
Launching controller instance(s) on aws/us-east-1...
- i-041bc99b681b95549 (arch=amd64 mem=4G cores=2)
Fetching Juju GUI 2.5.2
Waiting for address
Attempting to connect to 172.31.22.225:22
Attempting to connect to 34.203.239.51:22
Logging to /var/log/cloud-init-output.log on the bootstrap machine
Running apt-get update
Running apt-get upgrade
Installing curl, cpu-checker, bridge-utils, cloud-utils, tmux
Fetching Juju agent version 2.0.2 for amd64
Installing Juju machine agent
Starting Juju machine agent (service jujud-machine-0)
Bootstrap agent now started
Contacting Juju controller at 172.31.22.225 to verify accessibility...
Bootstrap complete, "aws-us-east-1" controller now available.
Controller machines are in the "controller" model.
Initial model "default" added.
Now that the controller is up, we can tell it to deploy a Kubernetes cluster:
$ juju deploy k8s-1cpu-1gpu-aws.yaml
Deploying charm "cs:~containers/easyrsa-8"
added resource easyrsa
Deploying charm "cs:~containers/etcd-29"
added resource etcd
added resource snapshot
Deploying charm "cs:~containers/flannel-13"
added resource flannel
Deploying charm "cs:~containers/kubernetes-master-17"
added resource kube-scheduler
added resource kubectl
added resource cdk-addons
added resource kube-apiserver
added resource kube-controller-manager
application kubernetes-master exposed
Deploying charm "cs:~containers/kubernetes-worker-22"
added resource cni
added resource kube-proxy
added resource kubectl
added resource kubelet
application kubernetes-worker-cpu exposed
Deploying charm "cs:~containers/kubernetes-worker-22"
added resource kube-proxy
added resource kubectl
added resource kubelet
added resource cni
application kubernetes-worker-gpu exposed
Related "kubernetes-master:cluster-dns" and "kubernetes-worker-cpu:kube-dns"
Related "kubernetes-master:kube-control" and "kubernetes-worker-cpu:kube-control"
Related "flannel:cni" and "kubernetes-worker-cpu:cni"
Related "kubernetes-worker-cpu:certificates" and "easyrsa:client"
Related "kubernetes-worker-cpu:kube-api-endpoint" and "kubernetes-master:kube-api-endpoint"
Related "kubernetes-master:cluster-dns" and "kubernetes-worker-gpu:kube-dns"
Related "kubernetes-master:kube-control" and "kubernetes-worker-gpu:kube-control"
Related "flannel:cni" and "kubernetes-worker-gpu:cni"
Related "kubernetes-worker-gpu:certificates" and "easyrsa:client"
Related "kubernetes-worker-gpu:kube-api-endpoint" and "kubernetes-master:kube-api-endpoint"
Related "kubernetes-master:certificates" and "easyrsa:client"
Related "etcd:certificates" and "easyrsa:client"
Related "kubernetes-master:etcd" and "etcd:db"
Related "flannel:etcd" and "etcd:db"
Related "flannel:cni" and "kubernetes-master:cni"
Deploy of bundle completed.
You now have to wait until the cluster is ready. You can watch the progress by running:
$ watch -c juju status --color
Once it is ready, download the config file so that you can use kubectl
juju scp kubernetes-master/0:config ~/.kube/config
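(If ~/.kube does not exist yet, create it first with mkdir -p ~/.kube.) A quick sanity check that kubectl can reach the cluster:
kubectl get nodes                      # workers should show up as Ready
kubectl get pods --all-namespaces      # the cluster add-ons should be Running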
To get the Grafana URL:
kubectl cluster-info
To get the username/password:
kubectl config view
kubectl create secret generic secrets \
--from-file=$HOME/.cloudvolume/secrets/google-secret.json \
--from-file=$HOME/.cloudvolume/secrets/aws-secret.json \
--from-file=$HOME/.cloudvolume/secrets/boss-secret.json
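To confirm the secret was created and see which keys it contains (without printing the values):
kubectl describe secret secrets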
If you have more than 50 pods, you might need to increase the number of DNS servers.
Running kubectl get deployment --namespace=kube-system
shows how many kube-dns and kube-dns-autoscaler pods are available. The autoscaler should in theory keep up with demand, but we've seen that the default scaling parameters are not sufficient. You'll know this if you see lots of errors like:
- Temporary failure in name resolution
- GAXError
- Server not found
- NewConnectionError
- etc
To take manual control, first disable the autoscaler, then scale the kube-dns pods. If you don't disable the autoscaler first, it will immediately reset the replica count.
kubectl --namespace=kube-system scale deployment kube-dns-autoscaler --replicas=0
kubectl --namespace=kube-system scale deployment kube-dns --replicas=<NUM_YOU_WANT>
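You can confirm the new replica count took effect with:
kubectl --namespace=kube-system get deployment kube-dns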
You can also try tuning the autoscaler parameters:
kubectl edit configmap kube-dns-autoscaler --namespace=kube-system
Read more here: https://kubernetes.io/docs/tasks/administer-cluster/dns-horizontal-autoscaling/#tuning-autoscaling-parameters
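Before editing, it can help to look at the current parameters; the linear entry below only illustrates the shape described on that page, not our actual values:
kubectl --namespace=kube-system get configmap kube-dns-autoscaler -o yaml
# expect a data entry along the lines of:
#   linear: '{"coresPerReplica":256,"nodesPerReplica":16,"min":1}'
# lowering coresPerReplica and nodesPerReplica makes the autoscaler run more kube-dns replicas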
Anecdotally, with 60 16-core machines downsampling, nearly every time I increased the number of kube-dns replicas, the number of tasks leased increased. I (wms) ended up running 100 kube-dns replicas for 61 machines with 960 worker pods. I'm sure there's some inefficiency in there, but it seemed to work.