diff --git a/README.md b/README.md
index c042bc28..3bdecd09 100644
--- a/README.md
+++ b/README.md
@@ -2,554 +2,63 @@
 # k6 Operator
 
-`grafana/k6-operator` is a Kubernetes operator for running distributed k6 tests in your cluster.
+`grafana/k6-operator` is a Kubernetes operator for running distributed [k6](https://github.com/grafana/k6) tests in your cluster.
 
+k6 Operator introduces two CRDs:
 
-Read also the [complete tutorial](https://k6.io/blog/running-distributed-tests-on-k8s/) to learn more about how to use this project.
+- `TestRun` CRD
+- `PrivateLoadZone` CRD
 
-## Setup
+The `TestRun` CRD is a representation of a single k6 test executed once. `TestRun` supports various configuration options that allow you to adapt to different Kubernetes setups. You can find a description of the more common options [here](https://grafana.com/docs/k6/latest/set-up/set-up-distributed-k6/usage/common-options/), and the full list of options can be found in the [definition itself](https://github.com/grafana/k6-operator/blob/main/config/crd/bases/k6.io_testruns.yaml).
 
-### Prerequisites
+The `PrivateLoadZone` CRD is a representation of a [load zone](https://grafana.com/docs/grafana-cloud/testing/k6/author-run/use-load-zones/), which is a k6 term for a set of nodes within a cluster designated to execute k6 test runs. `PrivateLoadZone` is integrated with [Grafana Cloud k6](https://grafana.com/products/cloud/k6/) and requires a [Grafana Cloud account](https://grafana.com/auth/sign-up/create-user). You can find a guide describing how to set up a `PrivateLoadZone` [here](https://grafana.com/docs/grafana-cloud/testing/k6/author-run/set-up-private-load-zones/), while billing details can be found [here](https://grafana.com/docs/grafana-cloud/cost-management-and-billing/understand-your-invoice/k6-invoice/).
 
-The minimal prerequisite for k6-operator is a Kubernetes cluster and access to it with [kubectl](https://kubernetes.io/docs/tasks/tools/#kubectl).
+## Documentation
 
-### Deploying the operator
+You can find the latest k6 Operator documentation in the [Grafana k6 OSS docs](https://grafana.com/docs/k6/latest/set-up/set-up-distributed-k6/usage/common-options/).
 
-#### Bundle deployment
+For additional resources:
 
-The easiest way to install the operator is with the bundle:
-
-```bash
-curl https://raw.githubusercontent.com/grafana/k6-operator/main/bundle.yaml | kubectl apply -f -
-```
-
-The bundle includes default manifests for k6-operator, including the `k6-operator-system` namespace and a k6-operator Deployment with the latest tagged Docker image. Customizations can be made on top of this manifest as needed, e.g. with `kustomize`.
-
-#### Deployment with Helm
-
-Helm releases of k6-operator are published together with other Grafana Helm charts and can be installed with the following commands:
-
-```bash
-helm repo add grafana https://grafana.github.io/helm-charts
-helm repo update
-helm install k6-operator grafana/k6-operator
-```
-
-Passing additional configuration can be done with `values.yaml` (an example can be found [here](https://github.com/grafana/k6-operator/blob/main/charts/k6-operator/samples/customAnnotationsAndLabels.yaml)):
-
-```bash
-helm install k6-operator grafana/k6-operator -f values.yaml
-```
-
-A complete list of the options available for Helm can be found [here](https://github.com/grafana/k6-operator/blob/main/charts/k6-operator/README.md).
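-For illustration, a minimal `values.yaml` in the spirit of the linked sample might look like the sketch below. The key names follow that sample's naming but should be treated as assumptions; consult the chart README above for the authoritative schema:
-
-```yaml
-# Hypothetical values.yaml sketch: attach extra annotations and labels
-# to the resources created by the chart. Verify the key names against
-# the chart README before relying on them.
-customAnnotations:
-  owner: load-testing-team
-customLabels:
-  environment: staging
-```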
-
-#### Makefile deployment
-
-In order to install the operator with the Makefile, the following additional tooling must be installed:
-- [go](https://go.dev/doc/install)
-- [kustomize](https://kubectl.docs.kubernetes.io/installation/kustomize/)
-
-A more manual, low-level way to install the operator is by running the command below:
-
-```bash
-make deploy
-```
-
-This method may be more useful for development of k6-operator, depending on the specifics of your setup.
-
-### Installing the CRD
-
-The k6-operator includes custom resources called `TestRun` and `PrivateLoadZone`. These will be automatically installed when you do a
-deployment or install a bundle, but in case you want to do it yourself, you may run the command below:
-
-```bash
-make install
-```
-
-> :warning: The `K6` CRD has been replaced by the `TestRun` CRD and will be deprecated in the future.
-
-## Usage
-
-Samples are available in `config/samples` and `e2e/`, both for `TestRun` and for `PrivateLoadZone`.
-
-### Adding test scripts
-
-The operator uses a `ConfigMap`, a `volumeClaim`, or a `LocalFile` to serve test scripts to the jobs. To upload your own test script through a `ConfigMap`, run the following command:
-
-#### ConfigMap
-
-```bash
-kubectl create configmap my-test --from-file /path/to/my/test.js
-```
-
-***Note: there is a size limit of 1048576 bytes (1 MiB) for a single ConfigMap. If you need a larger test file, you'll need to use a volumeClaim or a LocalFile instead.***
-
-#### VolumeClaim
-
-There is a sample available in `config/samples/k6_v1alpha1_k6_with_volumeClaim.yaml` showing how to configure a test run with a volumeClaim.
-
-If you have a PVC with the name `stress-test-volumeClaim` containing your script and any other supporting file(s), you can pass it to the test like this:
-
-```yaml
-spec:
-  parallelism: 2
-  script:
-    volumeClaim:
-      name: "stress-test-volumeClaim"
-      file: "test.js"
-```
-
-***Note:*** the pods will expect to find the script files in the `/test/` folder. If `volumeClaim` fails, that's the first place to check: the initializer pod currently does not generate any logs and, when it can't find the file, it terminates with an error. So a missing file may not be obvious, and it makes sense to check for it manually. See #143 for potential improvements.
-
-##### Example directory structure while using volumeClaim
-
-```
-├── test
-│   ├── requests
-│   │   ├── stress-test.js
-│   ├── test.js
-```
-
-In the above example, `test.js` imports a function from `stress-test.js`, and they would look like this:
-
-```js
-// test.js
-import stressTest from "./requests/stress-test.js";
-
-export const options = {
-  vus: 50,
-  duration: '10s'
-};
-
-export default function () {
-  stressTest();
-}
-```
-
-```js
-// stress-test.js
-import { sleep, check } from 'k6';
-import http from 'k6/http';
-
-export default () => {
-  const res = http.get('https://test-api.k6.io');
-  check(res, {
-    'status is 200': () => res.status === 200,
-  });
-  sleep(1);
-};
-```
-
-#### LocalFile
-
-There is a sample available in `config/samples/k6_v1alpha1_k6_with_localfile.yaml` showing how to run a test script that is already present inside the Docker image.
-
-***Note: if there are limitations on the usage of volumeClaim in your cluster, you can use this option, but prefer volumeClaim whenever possible.***
-
-### Executing tests
-
-Tests are executed by applying the custom resource `TestRun` to a cluster where the operator is running. The properties
-of a test run are few, but they allow you to control some key aspects of a distributed execution.
-
-```yaml
-# k6-resource.yml
-
-apiVersion: k6.io/v1alpha1
-kind: TestRun
-metadata:
-  name: k6-sample
-spec:
-  parallelism: 4
-  script:
-    configMap:
-      name: k6-test
-      file: test.js
-  separate: false
-  runner:
-    image: 
-    metadata:
-      labels:
-        cool-label: foo
-      annotations:
-        cool-annotation: bar
-    securityContext:
-      runAsUser: 1000
-      runAsGroup: 1000
-      runAsNonRoot: true
-    resources:
-      limits:
-        cpu: 200m
-        memory: 1000Mi
-      requests:
-        cpu: 100m
-        memory: 500Mi
-  starter:
-    image: 
-    metadata:
-      labels:
-        cool-label: foo
-      annotations:
-        cool-annotation: bar
-    securityContext:
-      runAsUser: 2000
-      runAsGroup: 2000
-      runAsNonRoot: true
-```
-
-The test configuration is applied using:
-
-```bash
-kubectl apply -f /path/to/your/k6-resource.yml
-```
-
-#### Parallelism
-
-How many instances of k6 you want to create. Each instance will be assigned an equal execution segment. For instance,
-if your test script is configured to run 200 VUs and parallelism is set to 4, as in the example above, the operator will
-create four k6 jobs, each running 50 VUs, to achieve the desired VU count.
-
-#### Script
-
-The name of the ConfigMap that contains your test script. In the example in the [adding test scripts](#adding-test-scripts)
-section, this is set to `my-test`.
-
-#### Separate
-
-Toggles whether the jobs created need to be distributed across different nodes. This is useful if you're running a
-test with a really high VU count and want to make sure the resources of each node won't become a bottleneck.
-
-#### ServiceAccount
-
-If you want to use a custom ServiceAccount, you'll need to set it on both the starter and runner objects:
-
-```yaml
-apiVersion: k6.io/v1alpha1
-kind: TestRun
-metadata:
-  name: 
-spec:
-  script:
-    configMap:
-      name: ""
-  runner:
-    serviceAccountName: 
-  starter:
-    serviceAccountName: 
-```
-
-#### Runner
+
+- :book: A guide [Running distributed load tests on Kubernetes](https://grafana.com/blog/2022/06/23/running-distributed-load-tests-on-kubernetes/).
 
-Defines options for the test runner pods. This includes:
+- :book: A guide [Running distributed tests](https://grafana.com/docs/k6/latest/testing-guides/running-distributed-tests/).
+- :movie_camera: Grafana Office Hours [Load Testing on Kubernetes with k6 Private Load Zones](https://www.youtube.com/watch?v=RXLavQT58YA).
 
-* passing in resource limits and requests
-* passing in labels and annotations
-* passing in affinity and anti-affinity
-* passing in a custom image
+Common samples are available in the `config/samples` and `e2e/` folders in this repo, both for the `TestRun` and `PrivateLoadZone` CRDs.
 
-#### Starter
+## Contributing
 
-Defines options for the starter pod. This includes:
+### Requests and feedback
 
-* passing in a custom image
-* passing in labels and annotations
+We are always interested in your feedback! If you encounter problems while using the k6 Operator, check out the [troubleshooting guide](https://grafana.com/docs/k6/latest/set-up/set-up-distributed-k6/troubleshooting/). If you have questions on how to use the k6 Operator, you can post them on the [Grafana community forum](https://community.grafana.com/c/grafana-k6/k6-operator/73).
+
+For new feature requests and bug reports, consider opening an issue in this repository. First, check the [existing issues](https://github.com/grafana/k6-operator/issues) in case a similar report already exists. If it does, add a comment about your use case or upvote it.
-### k6 outputs
+For bug reports, please use [this template](https://github.com/grafana/k6-operator/issues/new?assignees=&labels=bug&projects=&template=bug.yaml). If you think there is a missing feature, please use [this template](https://github.com/grafana/k6-operator/issues/new?assignees=&labels=enhancement&projects=&template=feat_req.yaml).
 
-#### k6 Cloud output
+### Development
 
-k6 supports [output to its Cloud](https://k6.io/docs/results-visualization/cloud) with the `k6 run --out cloud script.js` command. This feature is available in k6-operator as well for subscribed users. Note that it supports only `parallelism: 20` or less.
+
-To use this option in k6-operator, set the argument in the YAML:
+When submitting a PR, it's preferable to work on an open issue. If an issue does not exist, create it. An issue allows us to validate the problem, gather additional feedback from the community, and avoid unnecessary work.
 
-```yaml
-# ...
-  script:
-    configMap:
-      name: ""
-  arguments: --out cloud
-# ...
-```
-
-Then, if you installed the operator with the bundle, create a secret with the following command:
-
-```bash
-kubectl -n k6-operator-system create secret generic my-cloud-token \
-    --from-literal=token= && kubectl -n k6-operator-system label secret my-cloud-token "k6cloud=token"
-```
-
-Alternatively, if you installed the operator with the Makefile, you can uncomment the cloud output section in `config/default/kustomization.yaml` and copy your token from the Cloud there:
-
-```yaml
-# Uncomment this section if you need cloud output and copy-paste your token
-secretGenerator:
-- name: cloud-token
-  literals:
-  - token=
-  options:
-    annotations:
-      kubernetes.io/service-account.name: k6-operator-controller
-    labels:
-      k6cloud: token
-```
-
-And re-run `make deploy`.
-
-This is sufficient to run k6 with the Cloud output and default values of `projectID` and `name`. For non-default values, extended script options can be used like this:
-
-```js
-export let options = {
-  // ...
-  ext: {
-    loadimpact: {
-      name: 'Configured k6-operator test',
-      projectID: 1234567,
-    }
-  }
-};
-```
-
-### Cleaning up between test runs
-
-After completing a test run, you need to clean up the test jobs created. This is done by running the following command:
-
-```bash
-kubectl delete -f /path/to/your/k6-resource.yml
-```
-
-### Multi-file tests
-
-In case your k6 script is split across more than one JS file, you can simply create a ConfigMap with several data entries, like this:
-
-```bash
-kubectl create configmap scenarios-test --from-file test.js --from-file utils.js
-```
-
-If there are too many files to specify manually, passing a folder to kubectl might be an option:
-
-```bash
-kubectl create configmap scenarios-test --from-file=./test
-```
-
-Alternatively, you can create an archive with k6:
-
-```bash
-k6 archive test.js [args]
-```
-
-The above command creates an `archive.tar` in your current folder, unless the `-O` option is used to change the name of the output archive. Then it is possible to put that archive into a ConfigMap, similarly to a JS script:
-
-```bash
-kubectl create configmap scenarios-test --from-file=archive.tar
-```
-
-When using an archive, it must additionally be specified in the YAML of your TestRun deployment:
-
-```yaml
-# ...
-spec:
-  parallelism: 1
-  script:
-    configMap:
-      name: "crocodile-stress-test"
-      file: "archive.tar" # <-- change here
-```
-
-In other words, the `file` option must be the correct entrypoint for `k6 run`.
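-Since `k6 run` accepts archives as well, you can verify locally that the archive and its entrypoint work before creating the ConfigMap:
-
-```bash
-# Re-create the archive and execute it locally as a sanity check
-k6 archive test.js
-k6 run archive.tar
-```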
-
-### Using extensions
-
-By default, the operator will use `grafana/k6:latest` as the container image for the test jobs.
-If you want to use [extensions](https://k6.io/docs/extensions/get-started/explore/) built with [xk6](https://github.com/grafana/xk6), you'll need to create your own image and override the `image` property on the `TestRun` Kubernetes resource.
-
-For example, create a `Dockerfile` with the following content:
-
-```Dockerfile
-# Build the k6 binary with the extension
-FROM golang:1.20 as builder
-
-RUN go install go.k6.io/xk6/cmd/xk6@latest
-# For our example, we'll add support for output of test metrics to InfluxDB v2.
-# Feel free to add other extensions using the '--with ...'.
-RUN xk6 build \
-    --with github.com/grafana/xk6-output-influxdb@latest \
-    --output /k6
-
-# Use the operator's base image and override the k6 binary
-FROM grafana/k6:latest
-COPY --from=builder /k6 /usr/bin/k6
-```
-
-Build the image based on this `Dockerfile` by executing:
-
-```bash
-docker build -t k6-extended:local .
-```
-
-Once the build is completed, push the resulting `k6-extended:local` image to an image repository accessible to your Kubernetes cluster.
-We can now use it as follows:
-
-```yaml
-# k6-resource-with-extensions.yml
-
-apiVersion: k6.io/v1alpha1
-kind: TestRun
-metadata:
-  name: k6-sample-with-extensions
-spec:
-  parallelism: 4
-  script:
-    configMap:
-      name: crocodile-stress-test
-      file: test.js
-  runner:
-    image: k6-extended:local
-    env:
-      - name: K6_OUT
-        value: xk6-influxdb=http://influxdb.somewhere:8086/demo
-```
-
-Note that we are overriding the default image with `k6-extended:local` and providing the test runner with environment variables used by our included extensions.
-
-### Scheduling Tests
-
-While the k6 operator doesn't support scheduling k6 tests directly, the recommended path for scheduling tests is to use Kubernetes `CronJob` objects directly. The cron job should run on a schedule and perform a delete and then an apply of the k6 object.
-
-Running these tests requires a little more setup; the basic steps are:
-
-1. Create a configmap of the JS test files (covered above)
-1. Create a configmap of the YAML for the k6 object
-1. Create a service account that lets k6 objects be created and deleted
-1. Create a cron job that deletes and applies the YAML
+There are many options for setting up a local Kubernetes cluster, and any of them can be used for local development of the k6 Operator. One option is to create a [kind cluster](https://kind.sigs.k8s.io/docs/user/quick-start/).
 
-Add a configMapGenerator to the kustomization.yaml:
-
-```yaml
-configMapGenerator:
-  - name: -config
-    files:
-      - .yaml
-```
-
-Then we are going to create a service account for the cron job to use. This is required to allow the cron job to actually delete and create the k6 objects:
-
-```yaml
----
-apiVersion: rbac.authorization.k8s.io/v1
-kind: Role
-metadata:
-  name: k6-
-rules:
-  - apiGroups:
-      - k6.io
-    resources:
-      - testruns
-    verbs:
-      - create
-      - delete
-      - get
-      - list
-      - patch
-      - update
-      - watch
----
-kind: RoleBinding
-apiVersion: rbac.authorization.k8s.io/v1
-metadata:
-  name: k6-
-roleRef:
-  kind: Role
-  name: k6-
-  apiGroup: rbac.authorization.k8s.io
-subjects:
-  - kind: ServiceAccount
-    name: k6-
-    namespace: 
----
-apiVersion: v1
-kind: ServiceAccount
-metadata:
-  name: k6-
-```
-
-Next, we create the cron job itself:
-
-```yaml
-# snapshotter.yml
-apiVersion: batch/v1
-kind: CronJob
-metadata:
-  name: -cron
-spec:
-  schedule: ""
-  concurrencyPolicy: Forbid
-  jobTemplate:
-    spec:
-      template:
-        spec:
-          serviceAccountName: k6-
-          containers:
-            - name: kubectl
-              image: bitnami/kubectl
-              volumeMounts:
-                - name: k6-yaml
-                  mountPath: /tmp/
-              command:
-                - /bin/bash
-              args:
-                - -c
-                - "kubectl delete -f /tmp/.yaml; kubectl apply -f /tmp/.yaml"
-          restartPolicy: OnFailure
-          volumes:
-            - name: k6-yaml
-              configMap:
-                name: -config
-```
+Additionally, you'll need to install the following tooling:
 
-### Namespaced deployment
-
-By default, k6-operator watches `TestRun` and `PrivateLoadZone` custom resources in all namespaces. But it is possible to configure k6-operator to watch only a specific namespace by setting a `WATCH_NAMESPACE` environment variable for the operator's deployment:
-
-```yaml
-apiVersion: apps/v1
-kind: Deployment
-metadata:
-  name: k6-operator-controller-manager
-  namespace: k6-operator-system
-spec:
-  template:
-    spec:
-      containers:
-        - name: manager
-          image: ghcr.io/grafana/k6-operator:controller-v0.0.14
-          env:
-            - name: WATCH_NAMESPACE
-              value: "some-ns"
-# ...
-```
+- [Golang](https://go.dev/doc/install)
 
-## Uninstallation
+- [kustomize](https://kubectl.docs.kubernetes.io/installation/kustomize/)
+- [operator-sdk](https://sdk.operatorframework.io/docs/installation/): optional, as most common changes can be made without it
 
-You can remove all the resources created by the operator with the bundle:
+To execute unit tests, use these commands:
 
 ```bash
-curl https://raw.githubusercontent.com/grafana/k6-operator/main/bundle.yaml | kubectl delete -f -
+make test-setup # only need to run once
+make test
 ```
 
-Or with the `make` command:
-
-```bash
-make delete
-```
-
-## Developing Locally
-
-### Pre-Requisites
-
-- [operator-sdk](https://sdk.operatorframework.io/docs/installation/)
-- [kustomize](https://kubectl.docs.kubernetes.io/installation/kustomize/)
-
-### Run Tests
-
-#### Test Setup
-
-- `make test-setup` (only need to run once)
-
-#### Run Unit Tests
-
-- `make test`
-
-#### Run e2e Tests
+To execute e2e tests locally:
 
-- [install kind and create a k8s cluster](https://kind.sigs.k8s.io/docs/user/quick-start/) (or create your own dev cluster)
-- `make e2e` for kustomize and `make e2e-helm` for helm
+- `make e2e` for kustomize and `make e2e-helm` for Helm
 - validate tests have been run
 - `make e2e-cleanup`
-
-## See also
-
-- [Running distributed k6 tests on Kubernetes](https://k6.io/blog/running-distributed-tests-on-k8s/)
diff --git a/docs/troublehooting.md b/docs/troublehooting.md
deleted file mode 100644
index e6680b6f..00000000
--- a/docs/troublehooting.md
+++ /dev/null
@@ -1,243 +0,0 @@
-# Troubleshooting
-
-Just like any Kubernetes application, k6-operator can get into error scenarios, which are sometimes the result of a misconfigured test or setup. This document is meant to help you troubleshoot such scenarios more quickly.
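-As a first quick check in many of the scenarios below, you can list the `TestRun` resources across namespaces to see which test runs exist and what state they report. This assumes the CRDs are installed; the exact columns shown depend on the printer columns defined in the CRD:
-
-```bash
-kubectl get testruns -A
-```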
-
-## Common tricks
-
-### Preparation
-
-> [!IMPORTANT]
-> Before trying to run a script with k6-operator, be it via `TestRun` or via `PrivateLoadZone`, always run it locally:
->
-> ```bash
-> k6 run script.js
-> ```
-
-If there are going to be environment variables or CLI options, pass them in as well:
-
-```bash
-MY_ENV_VAR=foo k6 run script.js --tag my_tag=bar
-```
-
-This ensures that the script has correct syntax and can be parsed by k6 in the first place. Additionally, running locally will make it obvious whether the configured options are doing what is expected. If there are any errors or unexpected results in the output of `k6 run`, make sure to fix those prior to deploying the script elsewhere.
-
-### `TestRun` deployment
-
-#### The pods
-
-When a `TestRun` Custom Resource (CR) is created with `parallelism: n`, there are certain repeating patterns:
-
-1. There will be `n + 2` Jobs (with corresponding Pods) created: initializer, starter, and `n` runners.
-1. If any of these Jobs did not result in a Pod being deployed, there must be an issue with that Job. Some commands that can help here:
-   ```bash
-   kubectl get jobs -A
-   kubectl describe job mytest-initializer
-   ```
-1. If one of the Pods was deployed but finished with `Error`, it makes sense to check its logs:
-   ```bash
-   kubectl logs mytest-initializer-xxxxx
-   ```
-
-If the Pods seem to be working but are not producing the expected result, and there's not enough information in the logs of the Pods, it might make sense to turn on the k6 [verbose option](https://grafana.com/docs/k6/latest/using-k6/k6-options/#options) in the `TestRun` spec:
-
-```yaml
-apiVersion: k6.io/v1alpha1
-kind: TestRun
-metadata:
-  name: k6-sample
-spec:
-  parallelism: 2
-  script:
-    configMap:
-      name: "test"
-      file: "test.js"
-  arguments: --verbose
-```
-
-#### k6-operator
-
-Another source of info is k6-operator itself. It is deployed as a Kubernetes `Deployment`, with `replicas: 1` by default, and its logs, together with observations about the Pods from the [previous subsection](#the-pods), usually contain enough information to reach a correct diagnosis. With the standard deployment, the logs of k6-operator can be checked with:
-
-```bash
-kubectl -n k6-operator-system -c manager logs k6-operator-controller-manager-xxxxxxxx-xxxxx
-```
-
-#### Inspect `TestRun` resource
-
-Once a `TestRun` CR is deployed, it can be inspected the same way as any other resource:
-
-```bash
-kubectl describe testrun my-testrun
-```
-
-Firstly, check if the spec is as expected. Then, see the current status:
-
-```yaml
-Status:
-  Conditions:
-    Last Transition Time:  2024-01-17T10:30:01Z
-    Message:
-    Reason:                CloudTestRunFalse
-    Status:                False
-    Type:                  CloudTestRun
-    Last Transition Time:  2024-01-17T10:29:58Z
-    Message:
-    Reason:                TestRunPreparation
-    Status:                Unknown
-    Type:                  TestRunRunning
-    Last Transition Time:  2024-01-17T10:29:58Z
-    Message:
-    Reason:                CloudTestRunAbortedFalse
-    Status:                False
-    Type:                  CloudTestRunAborted
-    Last Transition Time:  2024-01-17T10:29:58Z
-    Message:
-    Reason:                CloudPLZTestRunFalse
-    Status:                False
-    Type:                  CloudPLZTestRun
-  Stage:                   error
-```
-
-If `Stage` is equal to `error`, then it definitely makes sense to check the logs of k6-operator.
-
-Conditions can be used as a source of info as well, but they are a more advanced troubleshooting option that should be used if the previous suggestions are insufficient. Note that conditions that start with the `Cloud` prefix matter only in the setting of k6 Cloud test runs, i.e. cloud output and PLZ test runs.
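-The same information can be pulled with JSONPath for scripting or quick checks. A sketch, assuming the status fields are named `stage` and `conditions` as in the output above:
-
-```bash
-# Current stage of the test run (the `Stage` field from kubectl describe)
-kubectl get testrun my-testrun -o jsonpath='{.status.stage}'
-
-# All condition types with their statuses
-kubectl get testrun my-testrun -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
-```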
-
-### `PrivateLoadZone` deployment
-
-If the `PrivateLoadZone` CR was successfully created in Kubernetes, it should become visible in your account in the Grafana Cloud k6 (GCk6) interface soon afterwards. If it doesn't appear in the UI, then there is likely a problem to troubleshoot.
-
-Firstly, go over the [guide](https://grafana.com/docs/grafana-cloud/k6/author-run/private-load-zone-v2/) to double-check that all the steps have been done correctly and successfully.
-
-Unlike a `TestRun` deployment, when a `PrivateLoadZone` is first created, there are no additional resources deployed. So the only source for troubleshooting is the logs of k6-operator. See the [above subsection](#k6-operator) on how to access its logs. Any errors there might be a hint to what is wrong. See [below](#privateloadzone-subscription-error) for some potential errors explained in more detail.
-
-### Running tests in `PrivateLoadZone`
-
-Each time a user runs a test in a PLZ, for example with `k6 cloud script.js`, a corresponding `TestRun` is deployed by k6-operator. This `TestRun` will be deployed in the same namespace as its `PrivateLoadZone`. If such a test misbehaves (errors out, does not produce the expected result, etc.), then one should check:
-1) if there are any messages in the GCk6 UI
-2) if there are any messages in the output of the `k6 cloud` command
-3) the resources and their logs, the same way as with a [standalone `TestRun` deployment](#testrun-deployment)
-
-## Common scenarios
-
-### Where are my env vars...
-
-Some tricky cases with environment variables are described in [this doc](./env-vars.md).
-
-### Tags are not working?!
-
-Currently, tags are a rather common source of frustration when using k6-operator. For example:
-
-```yaml
-  arguments: --tag product_id="Test A"
-  # or
-  arguments: --tag foo=\"bar\"
-```
-
-Passing the above leads to parsing errors, which can be seen in the logs of either the initializer or the runner Pod, e.g.:
-
-```bash
-time="2024-01-11T11:11:27Z" level=error msg="invalid argument \"product_id=\\\"Test\" for \"--tag\" flag: parse error on line 1, column 12: bare \" in non-quoted-field"
-```
-
-This is a standard problem with character escaping, and there's even an [issue](https://github.com/grafana/k6-operator/issues/211) that can be upvoted.
-
-### Initializer logs an error but it's not about tags
-
-Often, this happens because of a lack of attention to the [preparation](#preparation) step. One more command that can be tried here is the following:
-
-```bash
-k6 inspect --execution-requirements script.js
-```
-
-This command is a shortened version of what the initializer Pod executes. If the above command produces an error, it is definitely a problem with the script, and it should first be solved outside of k6-operator. The error itself may contain a hint about what is wrong, for instance a syntax error.
-
-If standalone `k6 inspect --execution-requirements` executes successfully, then it's likely a problem with the `TestRun` deployment specific to your Kubernetes setup. Recommendations here:
-- carefully read the output of the initializer Pod: is it logged by the k6 process or by something else?
-  - :information_source: k6-operator expects initializer logs to contain only the output of `k6 inspect`. If any other log line is present, then k6-operator will fail to parse it and the test will not start.
([issue](https://github.com/grafana/k6-operator/issues/193))
-- check the events in the initializer Job and Pod, as they may contain another hint about what is wrong
-
-### Non-existent ServiceAccount
-
-A ServiceAccount can be defined as `serviceAccountName` in a PrivateLoadZone and as `runner.serviceAccountName` in a TestRun CRD, respectively. If the specified ServiceAccount does not exist, k6-operator will successfully create Jobs, but the corresponding Pods will fail to be deployed, and k6-operator will wait indefinitely for the Pods to become `Ready`. This error is best seen in the events of the Job:
-
-```bash
-kubectl describe job plz-test-xxxxxx-initializer
-...
-Events:
-  Warning  FailedCreate  57s (x4 over 2m7s)  job-controller  Error creating: pods "plz-test-xxxxxx-initializer-" is forbidden: error looking up service account plz-ns/plz-sa: serviceaccount "plz-sa" not found
-```
-
-Currently, k6-operator does not try to analyze such scenarios on its own, but we have an [issue](https://github.com/grafana/k6-operator/issues/260) for improvement.
-
-How to fix: the incorrect `serviceAccountName` must be corrected, and the TestRun or PrivateLoadZone resource must be re-deployed.
-
-### Non-existent `nodeSelector`
-
-A `nodeSelector` can be defined as `nodeSelector` in a PrivateLoadZone and as `runner.nodeSelector` in a TestRun CRD, respectively.
-
-This case is very similar to the [ServiceAccount one](#non-existent-serviceaccount): the Pod creation will fail, only the error will be somewhat different:
-
-```bash
-kubectl describe pod plz-test-xxxxxx-initializer-xxxxx
-...
-Events:
-  Warning  FailedScheduling  48s (x5 over 4m6s)  default-scheduler  0/1 nodes are available: 1 node(s) didn't match Pod's node affinity/selector.
-```
-
-How to fix: the incorrect `nodeSelector` must be corrected, and the TestRun or PrivateLoadZone resource must be re-deployed.
-
-### Insufficient resources
-
-A related problem can happen when the cluster does not have sufficient resources to deploy the runners. There is a higher probability of hitting this issue when setting small CPU and memory limits for runners, or when using options like `nodeSelector`, `runner.affinity` or `runner.topologySpreadConstraints` without having a set of nodes matching the spec. Alternatively, it can happen if a high number of runners is required for the test (via `parallelism` in a TestRun or during a PLZ test run) and autoscaling of the cluster has limits on the maximum number of nodes, so it cannot provide the required resources on time or at all.
-
-This case is somewhat similar to the previous two: k6-operator will wait indefinitely, and the situation can be monitored via the events in the Jobs and Pods. If it is possible to fix the issue with insufficient resources on the fly, e.g. by adding more nodes, k6-operator will attempt to continue executing the test run.
-
-### OOM of a runner Pod
-
-If there's at least one runner Pod that was OOM-killed, the whole test will be [stuck](https://github.com/grafana/k6-operator/issues/251) and will have to be deleted manually:
-
-```bash
-kubectl delete -f my-test.yaml
-# or
-kubectl delete testrun my-test
-```
-
-In case of OOM, it makes sense to review the k6 script to understand what kind of resource usage it requires. It may be that the script can be improved to be more performant. Then, set `spec.runner.resources` in the TestRun CRD, or `spec.resources` in the PrivateLoadZone CRD, accordingly.
-
-### PrivateLoadZone: subscription error
-
-If there's something off with your k6 Cloud subscription, there will be a 400 error in the logs with a message detailing the problem.
For example:
-
-```bash
-"Received error `(400) You have reached the maximum Number of private load zones your organization is allowed to have. Please contact support if you want to create more.`. Message from server ``"
-```
-
-The most likely course of action in this case is either to check your organization settings in GCk6 or to contact k6 Cloud support.
-
-### PrivateLoadZone: wrong token
-
-There can be two major problems with the token.
-
-1. If the token was not created, or was created in a wrong location, there will be the following in the logs:
-   ```bash
-   Failed to load k6 Cloud token {"namespace": "plz-ns", "name": "my-plz", "reconcileID": "67c8bc73-f45b-4c7f-a9ad-4fd0ffb4d5f6", "name": "token-with-wrong-name", "secretNamespace": "plz-ns", "error": "Secret \"token-with-wrong-name\" not found"}
-   ```
-
-2. If the token contains a corrupted value, or it's not an organizational token, there will be the following error in the logs:
-   ```bash
-   "Received error `(403) Authentication token incorrect or expired`. Message from server ``"
-   ```
-
-### PrivateLoadZone: networking setup
-
-If you see any dial or connection errors in the logs of k6-operator, it makes sense to double-check the networking setup. For a PrivateLoadZone to operate, outbound traffic to k6 Cloud [must be allowed](https://grafana.com/docs/grafana-cloud/k6/author-run/private-load-zone-v2/#before-you-begin). The basic way to check the reachability of k6 Cloud endpoints:
-
-```bash
-kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml
-kubectl exec -it dnsutils -- nslookup ingest.k6.io
-kubectl exec -it dnsutils -- nslookup api.k6.io
-```
-
-For more resources on troubleshooting networking, see the Kubernetes [official docs](https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/).
-
-### PrivateLoadZone: insufficient resources
-
-The problem is similar to [insufficient resources in the general case](#insufficient-resources). But when running a PrivateLoadZone test, k6-operator will wait only for a timeout period (10 minutes at the moment). When the timeout period is up, the test will be aborted by k6 Cloud and marked as such both in the PrivateLoadZone and in GCk6. In other words, there is a time limit for fixing this issue without restarting the test run.
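-To catch this situation before the timeout hits, it can help to watch for Pods stuck in `Pending` while the test is starting up. A sketch, with the namespace as a placeholder for wherever your `PrivateLoadZone` is deployed:
-
-```bash
-# Pods that cannot be scheduled stay in Pending; their events explain why
-kubectl -n my-plz-namespace get pods --field-selector=status.phase=Pending
-kubectl -n my-plz-namespace describe pod plz-test-xxxxxx-initializer-xxxxx
-```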