bgrant0607 edited this page Sep 11, 2014 · 20 revisions

Under construction.

Tips that may help you debug why Kubernetes isn't working.

Of course, also take a look at the documentation, especially the getting-started guides.

Checking logs

Depending on the Linux distribution, the logs of system components, including Docker, are in /var/log or /tmp. On systemd-based systems, such as Fedora, RHEL7, or CoreOS, read them with journalctl instead.
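
As a rough sketch of the lookup described above, the helper below prints recent log lines for one component. The function name show_component_logs, the 100-line limit, and the assumption that the systemd unit and the log files are named after the component are all illustrative, not something the distributions guarantee.

```shell
# Hypothetical helper: print the last 100 log lines for one component.
# Assumes the systemd unit and any log files are named after the component.
show_component_logs() {
  component="$1"
  if command -v journalctl >/dev/null 2>&1; then
    # systemd-based systems (Fedora, RHEL7, CoreOS)
    journalctl -u "$component" --no-pager -n 100
  else
    # Elsewhere, look for matching files in /var/log and /tmp
    for f in /var/log/"$component"* /tmp/"$component"*; do
      if [ -f "$f" ]; then
        echo "==> $f"
        tail -n 100 "$f"
      fi
    done
  fi
}
```

For example, show_component_logs docker. Reading some logs may require sudo.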

If the logs don't show anything useful, try turning on verbose logging (glog's -v flag) for the Kubernetes component you suspect has a problem. See https://github.com/golang/glog for more details.

By symptom

  • dev-build-and-up.sh waits forever at Waiting for cluster initialization
    • Try cluster/kube-down.sh and hack/dev-build-and-up.sh again
      • If it still hangs, press Ctrl-C and try hack/dev-build-and-push.sh
      • Check whether all the VMs exist -- typically one master VM and N minions
        • If so, check whether you can ssh into them
        • Check serial console output, if available
    • If it still doesn't work, see provider-specific issues below
  • kubecfg cannot reach apiserver
    • Ensure KUBERNETES_MASTER or KUBE_MASTER_IP is set, or pass the apiserver address with -h
    • Ensure apiserver is running
      • Check that the process is running on the master
      • Check its logs
  • kubecfg hangs forever or a pod is in state Waiting forever
    • Ensure all backend components are running: controller, scheduler, etcd, kubelets
    • Ensure all k8s components have --etcd_servers set correctly on the command line (if it isn't, you should see error messages in their logs)
      • If it's not set, your networking setup may be broken, since it is usually initialized from the IP address of kubernetes-master, such as in cluster/saltbase/salt/apiserver/default
  • apiserver reports Error synchronizing container: Get http://:10250/podInfo?podID=foo: dial tcp :10250: connection refused
    • This just means that pod foo has not yet been scheduled (see #1285)
    • Check whether the scheduler is running properly
    • If the scheduler is running, possibly no minion addresses were passed to the apiserver using --machines (see hack/local-cluster-up.sh for an example)
  • Cannot connect to the container
    • Try to telnet to the minion at its service port, and/or to the pod's IP and port
    • Check whether the container has been created in Docker: sudo docker ps -a
      • If you don't see the container, there could be a problem with the pod configuration, image, Docker, or Kubelet
      • If you see containers created every 10 seconds, then container creation is failing or the container's process is failing
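
Several of the checks above come down to confirming that the expected processes are running on a machine. A minimal sketch follows, assuming pgrep is available and that the process names you pass in match your build; report_missing is a hypothetical helper name, and the process names in the usage line are assumptions.

```shell
# Hypothetical helper: report which of the named processes are not running here.
# Uses pgrep -f (full command-line match), so choose names that are specific enough.
report_missing() {
  missing=""
  for name in "$@"; do
    if ! pgrep -f "$name" >/dev/null 2>&1; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "not running:$missing"
    return 1
  fi
  echo "all running"
}
```

On the master you might run report_missing apiserver controller-manager scheduler etcd, and on a minion report_missing kubelet docker. If a process is reported as missing, check its logs next.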

Build problems

If a build fails unexpectedly, try removing stale build output and then rebuilding:

rm -rf Godeps/_workspace/pkg output _output

Networking problems

TODO

Other provider-specific issues

TODO

GCE

  • Ensure you can ssh to an instance, which may require enabling billing and/or creating an ssh key. Create an instance if you don't have one, then use gcutil ssh to ssh into it.
  • Check the firewall rules: gcutil listfirewalls ; gcutil getfirewall default-ssh
    • If default-ssh doesn't exist, do gcutil addfirewall --description "SSH allowed from anywhere" --allowed=tcp:22 default-ssh
  • gcutil listnetworks
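
The firewall check above can be scripted roughly as follows. This is a sketch that assumes gcutil getfirewall exits with a non-zero status when the rule does not exist; ensure_ssh_firewall is a hypothetical helper name.

```shell
# Hypothetical helper: create the default-ssh firewall rule only if it is missing.
# Assumes "gcutil getfirewall" fails (non-zero exit) when the rule is absent.
ensure_ssh_firewall() {
  if ! gcutil getfirewall default-ssh >/dev/null 2>&1; then
    gcutil addfirewall --description "SSH allowed from anywhere" --allowed=tcp:22 default-ssh
  fi
}
```

Run ensure_ssh_firewall once, then retry gcutil ssh against the instance.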