Updated May 28, 2015
This document is intended to capture the set of supported use cases, features, docs, and patterns that we feel are required to call Kubernetes “feature complete” for a 1.0 release candidate.
This list does not emphasize the bug fixes and stabilization that will be required to take it all the way to production ready. Please see the [GitHub issues](https://github.com/GoogleCloudPlatform/kubernetes/issues) for a more detailed view.
This is a living document, where suggested changes can be made via a pull request.
Most realistic examples of production services include a load-balanced web frontend exposed to the public Internet, with a stateful backend, such as a clustered database or key-value store. We will target such workloads for our 1.0 release.
For existing and future workloads, we want to provide a consistent, stable set of APIs, over which developers can build and extend Kubernetes. This includes input validation, a consistent API structure, clean semantics, and improved diagnosability of the system.
- Consistent v1 API
- Status: DONE. v1beta3 was developed as the release candidate for the v1 API.
- Multi-port services for apps which need more than one port on the same portal IP (#1802)
- Status: DONE. Released in 0.15.0
- Nominal services for applications which need one stable IP per pod instance (#260)
- Status: #2585 covers some design options.
- API input is scrubbed of status fields in favor of a new API to set status (#4248)
- Status: DONE
- Input validation reporting versioned field names (#3084)
- Status: in progress
- Error reporting: Report common problems in ways that users can discover
- Status:
- Event management: Make events usable and useful
- Status:
- Persistent storage support (#5105)
- Status: in progress
- Status: in progress (#6949)
- Handle node death
- Status: mostly covered by nodes joining/leaving a cluster
- Status: design in progress
- Allow kernel upgrades
- Status: mostly covered by nodes joining/leaving a cluster, need demonstration
- Allow rolling-updates to fail gracefully (#1353)
- Status:
- Easy .dockercfg
- Status:
- Demonstrate cluster stability over time
- Status:
- Kubelet uses the Kubernetes API to fetch jobs to run (instead of etcd) on supported platforms
- Status: DONE
- Restart system components in case of crash (#2884)
- Status: in progress
- Scale to 100 nodes (#3876)
- Status: in progress
- Scale to 30-50 pods (1-2 containers each) per node (#4188)
- Status:
- Scheduling throughput: 99% of scheduling decisions made in less than 1s on a 100-node, 3000-pod cluster; time scales linearly with the number of nodes and pods (#3954)
- Startup time: 99% of end-to-end pod startups with prepulled images complete in less than 5s on a 100-node, 3000-pod cluster; time scales linearly with the number of nodes and pods (#3952, #3954)
- Status:
- API performance: 99% of API calls return in less than 1s; time stays constant as the number of nodes and pods grows (#4521)
- Status:
- Manage and report disk space on nodes (#4135)
- Status: in progress
- API test coverage more than 85% in e2e tests
- Status:
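
The percentile-based targets in the list above (for example, 99% of API calls returning in less than 1s) can be checked against measured latencies roughly as follows. This is a minimal sketch, not the project's actual measurement tooling: the function names `percentile` and `meets_target` are hypothetical, and the nearest-rank percentile definition is an assumption, since the roadmap does not say how the 99% figure is computed.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile (assumed definition): the smallest sample
    such that at least pct percent of all samples are at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Rank is 1-based; clamp to 1 so very small pct values still index a sample.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_target(samples, pct, limit_seconds):
    """True if the pct-th percentile latency is strictly below the limit."""
    return percentile(samples, pct) < limit_seconds


# Hypothetical latencies in seconds: 99 fast calls and one slow outlier.
api_latencies = [0.1] * 99 + [2.0]
print(meets_target(api_latencies, 99, 1.0))
```

With one slow call in a hundred the 99th percentile is still a fast call, so the 1s target holds; with two slow calls in a hundred it would not.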
In addition, we will provide versioning and deprecation policies for the APIs.
Currently, a cluster is a set of nodes (VMs, machines), managed by a master, running a version of Kubernetes. This master is the cluster-level control plane. For the purpose of running production workloads, members of the cluster must be serviceable and upgradeable.
For applications and micro-services that run on Kubernetes, we want deployments to be easy but powerful. An Operations user should be able to launch a micro-service and let the scheduler find the right placement. That micro-service should be able to require “pet storage” resources, fulfilled by external storage with help from the cluster. We also want to improve the tools and experience for rolling out applications through patterns like canary deployments.
The system should be performant, especially from the perspective of micro-services running on top of the cluster and of Operations users. As part of being production grade, the system should have a measured availability and be resilient to failures, including fatal failures due to hardware.
In terms of performance, the objectives include: