This application tests the features of Kubernetes periodically in order to collect data and highlight symptoms of underlying issues.
Performance metrics are captured by each test using statsd to measure the effectiveness of the cluster and help highlight problems with it. See Kubernetes Manifests for detail on how to deploy this. Contributing includes a diagram of how this works under the hood and how people can contribute to this project.
The following environment variables can be used to configure the e2etest container.
Name | description | default |
---|---|---|
TIME_TO_REPORT_PROBLEM | Used on the healthcheck page. Time in minutes to wait before displaying an error indicating how long it has been since the last test was run. | 20 |
SECONDS_BETWEEN_RUNS | Time to sleep between runs of the test suite. | 0.0 |
TEST_NAMESPACE | Namespace to use during testing. All other resources will reside in this namespace. | kubee2etests |
TEST_DEPLOYMENT | Deployment name to create during testing. | kubee2etestapp |
TEST_SERVICE | Service name to create during testing | kubee2etests |
FLASK_PORT | Port on which to run flask app | 8081 |
STATSD_PORT | Port on which statsd is running | 8125 |
LOG_LEVEL | log level for test runner | INFO |
DOCKER_REGISTRY_HOST | Host from which to pull nginx pod for deployment based tests. We allow only configuration of the host, not the image because service and get requests expect that we'll be able to get something from an nginx web server. | `` |
CUSTOM_TEST_DEPLOYMENT_LABELS | Dictionary of key-value pairs to add to the labels applied to every test deployment. Deployments are already labelled with app: hello-minikube. | '{}' |
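As an illustration of how these variables and their defaults fit together, here is a minimal sketch of a settings loader. It is not the project's own code: the `Settings` class and `load_settings` function are hypothetical names, the types are inferred from the defaults in the table, and it assumes `CUSTOM_TEST_DEPLOYMENT_LABELS` is a JSON-encoded object.

```python
import json
import os
from dataclasses import dataclass


@dataclass
class Settings:
    """Hypothetical container for the variables listed in the table above."""
    time_to_report_problem: int
    seconds_between_runs: float
    test_namespace: str
    test_deployment: str
    test_service: str
    flask_port: int
    statsd_port: int
    log_level: str
    docker_registry_host: str
    custom_test_deployment_labels: dict


def load_settings(env=None):
    """Read settings from a mapping (os.environ by default), applying the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        time_to_report_problem=int(env.get("TIME_TO_REPORT_PROBLEM", 20)),
        seconds_between_runs=float(env.get("SECONDS_BETWEEN_RUNS", 0.0)),
        test_namespace=env.get("TEST_NAMESPACE", "kubee2etests"),
        test_deployment=env.get("TEST_DEPLOYMENT", "kubee2etestapp"),
        test_service=env.get("TEST_SERVICE", "kubee2etests"),
        flask_port=int(env.get("FLASK_PORT", 8081)),
        statsd_port=int(env.get("STATSD_PORT", 8125)),
        log_level=env.get("LOG_LEVEL", "INFO"),
        docker_registry_host=env.get("DOCKER_REGISTRY_HOST", ""),
        # Assumption: the labels are supplied as a JSON object string.
        custom_test_deployment_labels=json.loads(
            env.get("CUSTOM_TEST_DEPLOYMENT_LABELS", "{}")
        ),
    )
```

Passing a plain dictionary instead of `os.environ` makes it easy to try out overrides without touching the real environment.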
Manifests required to run this app completely headless are in `./manifests`. In addition to this, the `./contrib` directory contains some manifests you may find useful:
- `./contrib/monitoring-config.yaml` contains configuration for monitoring. There are comments to indicate what each entry does.
- `./contrib/frontend.yaml` contains configuration for running the frontend status dashboard. Optionally, it includes a certificate resource for use with `kube-cert-manager`. The ingress host name (and certificate domain) needs changing to match what you use on your cluster.
It's recommended you download the releases zip and apply the manifests from there, rather than the repo directly, as the e2etest tag is set using semantic versioning git attributes.
All metrics created by this application are prefixed by `e2etest.` and are measured using the Statsd client library. For more information on Statsd metric types please see the statsd project repo. We use a Statsd -> Prometheus bridge when deploying this because:
- There are multiple tests running in multiple containers, so we can't use the prometheus client library which has a server running on a separate thread
- We need to dedupe any metrics which are collected by multiple containers
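For readers unfamiliar with what the bridge receives, the sketch below shows the plain-text statsd line format for timers (`|ms`) and counters (`|c`) sent over UDP. The application itself uses a Statsd client library rather than raw sockets; the function names here are hypothetical, and the `localhost` host is an assumption (the port matches the `STATSD_PORT` default).

```python
import socket


def format_timing(name, milliseconds):
    """Build a statsd timer line, e.g. 'some.metric:120|ms'."""
    return f"{name}:{milliseconds}|ms"


def format_counter(name, count=1):
    """Build a statsd counter line, e.g. 'some.metric:1|c'."""
    return f"{name}:{count}|c"


def send(packet, host="localhost", port=8125):
    """Send one statsd line over UDP. Host is an assumption for illustration."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(packet.encode("ascii"), (host, port))
```

Because every container only emits UDP packets, none of them needs to run its own metrics server; aggregation happens in the bridge.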
Time-based metrics are bucketed into the statsd metric `e2etest.action.<namespace>.<resource>.<action>`.
Actions:
- create
- update
- scale
- delete
- http_get
Resources are any of those mentioned in the test list below. Note that http_get only applies to Services and Pods, and scale only applies to deployments.
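The naming scheme and its restrictions can be sketched as follows. This is an illustrative helper, not the project's implementation: `action_metric` and `timed` are hypothetical names, the lowercase singular resource names are an assumption, and `record` stands in for whatever statsd client call reports a timing.

```python
import time
from contextlib import contextmanager

ACTIONS = {"create", "update", "scale", "delete", "http_get"}


def action_metric(namespace, resource, action):
    """Return the timing bucket name, enforcing the restrictions described above."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    if action == "http_get" and resource not in ("service", "pod"):
        raise ValueError("http_get only applies to services and pods")
    if action == "scale" and resource != "deployment":
        raise ValueError("scale only applies to deployments")
    return f"e2etest.action.{namespace}.{resource}.{action}"


@contextmanager
def timed(namespace, resource, action, record):
    """Time the wrapped block and pass (metric name, elapsed ms) to `record`."""
    name = action_metric(namespace, resource, action)
    start = time.perf_counter()
    try:
        yield
    finally:
        record(name, (time.perf_counter() - start) * 1000.0)
```

A test step could then be wrapped as `with timed("kubee2etests", "deployment", "create", record): ...`, where `record` forwards to the statsd client.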
Request timings are sent to the time-based metric bucket `e2etest.action.<namespace>.<resource>.http_get`.
Results of HTTP requests are sent to the counter metric `e2etest.http.<namespace>.<resource>.<result>`.
Resource can be:
- service
- pod
Result can be:
- any HTTP status code
- "wrong_response" meaning the response text didn't match what was expected
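A minimal sketch of how a result could be mapped to a counter name is shown below. The function name is hypothetical, and so is the precedence rule: it assumes `wrong_response` is only reported when the request returned 200 but the body did not contain the expected text, which the text above does not specify.

```python
def http_result_metric(namespace, resource, status_code, body, expected_text):
    """Return the counter name for one HTTP request result."""
    if resource not in ("service", "pod"):
        raise ValueError("resource must be 'service' or 'pod'")
    # Assumption: a non-200 status is reported as-is; a 200 with an
    # unexpected body is reported as wrong_response.
    if status_code == 200 and expected_text not in body:
        result = "wrong_response"
    else:
        result = str(status_code)
    return f"e2etest.http.{namespace}.{resource}.{result}"
```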
Errors are counted using the statsd metric `e2etest.errors.<namespace>.<resource>.<area>.<error>`.
Given that there could be multiple categories of error generated by this application, see the troubleshooting doc for more detail on the different labels associated with `e2etest.errors`.
The application provides a dashboard to quickly view the results of the last run of tests. The display shows the namespace in which the test is running, test name, status (passing or failing), info if the test is failing, and a timestamp of when the test was last run.
This is intended to be a first port of call for identifying problems with the platform, supplemented by metrics from Prometheus, and has an ingress so you can view it at `status.<your domain here>`.
To enable this view you need the manifests in `contrib/frontend.yaml` (example below).
- create a namespace (name set by environment variable, or defaulted to kubee2etests)
- check namespace exists
- check namespace is empty
- delete the namespace
- check namespace is deleted
- create service in namespace
- check service exists and has 0 endpoints
- check service has N endpoints following deployment creation
- check service has 0 endpoints following deployment scaling
- delete service
- check service is deleted
- create deployment (a simple HTTP container) with N instances (default: 3) and data centre (DC) anti-affinity enabled
- check pods have been scheduled
- check pods have been started
- check each pod is on a different node
- check each pod's node is in a different data centre
- delete deployment
- update deployment to a different image
- check old pods are deleted
- validate that all pods update to new image
- scale deployment to 0
- check pods are marked for deletion
- check pods are deleted
- make http request to each pod individually
- check each pod's log output is non-empty
- check pod log output includes http request to service/pod
- make http request to service
- check http request to each pod gets expected response
- check http request to service gets expected response
- make http request to service address N*2 times, get expected responses
- make http request to each pod individually after update, get expected response
- make http request to service address N*2 times after update, get expected responses
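The two placement checks in the list above (each pod on a different node, each node in a different data centre) can be sketched with plain dictionaries. This is an illustration only: the function names are hypothetical, and the real tests would obtain the pod-to-node and node-to-data-centre mappings from the Kubernetes API rather than from hand-written dicts.

```python
from collections import Counter


def shared_values(mapping):
    """Return the values that more than one key maps to, sorted."""
    counts = Counter(mapping.values())
    return sorted(value for value, count in counts.items() if count > 1)


def check_placement(pod_to_node, node_to_dc):
    """Return a list of placement problems; an empty list means the checks pass."""
    problems = []
    for node in shared_values(pod_to_node):
        problems.append(f"more than one pod on node {node}")
    # Translate each pod's node into its data centre, then repeat the check.
    pod_to_dc = {pod: node_to_dc[node] for pod, node in pod_to_node.items()}
    for dc in shared_values(pod_to_dc):
        problems.append(f"more than one pod in data centre {dc}")
    return problems
```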
The DNS test attempts to resolve a well-known name within the cluster from the test container itself. We added this because we found that DNS pods could be up while, due to various issues, DNS was not actually working at all.
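A resolution check of this kind can be sketched with the standard library. The function name is hypothetical, and which name the real test resolves is not stated here, so the caller supplies it. The `resolver` parameter exists only so the check can be exercised without a live cluster.

```python
import socket


def dns_healthy(name, resolver=socket.getaddrinfo):
    """Return True if `name` resolves to at least one address, False otherwise."""
    try:
        results = resolver(name, None)
    except OSError:
        # socket.gaierror (resolution failure) is a subclass of OSError.
        return False
    return len(results) > 0
```

Running the lookup from inside the test container exercises the same resolution path that workloads use, which is why it can fail even when the DNS pods report as healthy.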
The main script, `kubee2etests/scripts/test_runner.py`, runs a different selection of the above tests based on a command line argument, `suite`. It can be run locally using `python3 kubee2etests/scripts/test_runner.py <suite>` with any of the test suite names below.
Test suite name | tasks |
---|---|
namespace | Create and check the namespace |
deployment | Deployment creation and pod check tests |
deployment_update | Create a deployment if it's not there, deployment update tests |
deployment_scale | Create a deployment if it's not there, deployment scaling tests |
service | Namespace, service creation tests |
deployment_service | Create a service if it's not there, create a deployment if it's not there, service endpoints correct tests |
deployment_scale_service | Create a service if it's not there, create a deployment if it's not there, scale the deployment, service endpoints correct tests |
http | Create a service if it's not there, create a deployment if it's not there, HTTP request tests |
http_update | Create a service if it's not there, create a deployment if it's not there, deployment update tests, HTTP request tests |
dns | Attempt to resolve name, report healthy if passed, failed if failed. |
Excluding the DNS test (which has no namespace) and the namespace test, all tests assume the namespace defined in the environment variable `TEST_NAMESPACE` will exist when they start. If the namespace does not exist, the test will quit.
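The suite argument and the namespace precondition described above could be sketched as follows. This is not the contents of `test_runner.py`: `parse_suite` and `needs_existing_namespace` are hypothetical helpers, and only the suite names are taken from the table.

```python
import argparse

SUITES = (
    "namespace", "deployment", "deployment_update", "deployment_scale",
    "service", "deployment_service", "deployment_scale_service",
    "http", "http_update", "dns",
)


def parse_suite(argv):
    """Parse the single positional `suite` argument, rejecting unknown names."""
    parser = argparse.ArgumentParser(description="Run one e2e test suite")
    parser.add_argument("suite", choices=SUITES, help="name of the suite to run")
    return parser.parse_args(argv).suite


def needs_existing_namespace(suite):
    """Every suite except dns and namespace expects TEST_NAMESPACE to exist already."""
    return suite not in ("dns", "namespace")
```

With this shape, an unknown suite name makes `argparse` print the valid choices and exit, and a runner can check `needs_existing_namespace` before starting any work.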