Documentation for the Teacher services application developers
- We assume you have been given a CIP account. For BYOD users, please make sure to request a digitalauth account.
- The technical lead of your team will then add you to the AD group of your area. For example, if you work on a BAT service, you will be added to "s189 BAT delivery team". You will then be able to:
- Access (read-only) the s189 subscriptions in the Azure portal
- Access (read-write) your test Kubernetes namespaces and Azure resource groups in the test subscription
- Elevate your permissions via PIM and access (read-write) temporarily the production Kubernetes namespaces and Azure resource groups
- Approve other developers' PIM requests
Microsoft Entra Privileged Identity Management (PIM) allows gaining temporary (up to 8h) user permissions to access production resources. This is sometimes required to access the Kubernetes cluster and troubleshoot the application or database.
- Use PIM for groups to elevate your access. You should see the PIM group of your area. For example, if you work on a BAT service, you should see: "s189 BAT production PIM".
- Click "Activate", select the time and give a brief justification, which is important to gain approval and audit purpose.
- The other members of the team will receive an email with a link to PIM so they can review and approve your request.
- After a few minutes, your access will be active. You may need to log out and back in again.
The infra team maintains several AKS clusters. Two are usable by developers to deploy their services:
Used for all your non-production environments: review, development, qa, staging...
- Name: `s189t01-tsc-test-aks`
- Resource group: `s189t01-tsc-ts-rg`
- Subscription: `s189-teacher-services-cloud-test`
Used for all your production and production-like environments, especially if they contain production data: production, pre-production, production-data...
- Name: `s189p01-tsc-production-aks`
- Resource group: `s189p01-tsc-pd-rg`
- Subscription: `s189-teacher-services-cloud-production`
- If not present in your repository, set up the `get-cluster-credentials` make command from the template Makefile.
- If the environment is production, raise a PIM request.
- Log in to the Azure command line using `az login` or `az login --use-device-code`.
- Run `make <environment> get-cluster-credentials`.
- This configures the `kubectl` context so you can run commands against this cluster.
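The steps above can be sketched as a single script. This is a dry-run sketch only: each command is echoed rather than executed, so it is safe to read or run anywhere; "staging" and the "bat-staging" namespace are illustrative examples, not real requirements.

```shell
#!/bin/sh
# Dry-run sketch of the credential flow. The run() helper echoes each
# command instead of executing it; replace its body with "$@" to run
# the commands for real.
ENVIRONMENT=staging                # example environment
run() { echo "+ $*"; }

run az login                                    # or: az login --use-device-code
run make "$ENVIRONMENT" get-cluster-credentials
run kubectl -n "bat-$ENVIRONMENT" get pods      # kubectl context now targets the cluster
```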
Namespaces are a way to logically partition and isolate resources within a Kubernetes cluster. Each namespace has its own set of isolated resources like pods, services, deployments etc. By default, a Kubernetes cluster will have a few initial namespaces created like "default", "kube-system", "kube-public" etc. We have created specific namespaces per area, such as "BAT" or "TRA". For instance, you will see:
- tra-development and tra-staging on the test cluster
- tra-production on the production cluster
Here is the full list of namespaces in the test cluster and in the production cluster.
`kubectl` commands run in a particular namespace using `-n <namespace>`.
First get access to the desired cluster. Then you can run `kubectl` commands against different Kubernetes resources.
A deployment allows you to specify the desired state of your application. It lets you deploy multiple pods and services and manage them as a single entity, and it supports rolling updates and rollbacks.
Example `kubectl` deployment usage:
- List deployments in a namespace:
kubectl -n <namespace> get deployments
- Get configuration and status:
kubectl -n <namespace> describe deployment <deployment-name>
- Scale deployment horizontally:
kubectl -n <namespace> scale deployment <deployment-name> --replicas=3
Each deployment runs one or more instances of the application to scale horizontally. Each instance runs in a pod, which is ephemeral and can be deleted or recreated at any time. Deployments keep pods running and provide a way to update them when needed.
Example `kubectl` pod usage:
- List pods in a namespace:
kubectl -n <namespace> get pods
- Get pod configuration and status:
kubectl -n <namespace> describe pod <pod-name>
- Get pod logs:
kubectl -n <namespace> logs <pod-name>
- Get logs from the first pod in the deployment:
kubectl -n <namespace> logs deployment/<deployment-name>
- Stream logs from all pods in the deployment:
kubectl -n <namespace> logs -l app=<deployment-name> -f
- Display CPU and memory usage:
kubectl -n <namespace> top pods
- Execute a command inside a pod:
kubectl -n <namespace> exec <pod-name> -- <command>
- Execute a command inside the first pod in the deployment:
kubectl -n <namespace> exec deployment/<deployment-name> -- <command>
- Open an interactive shell inside a pod:
kubectl -n <namespace> exec -ti <pod-name> -- sh
All HTTP requests enter the cluster via the ingress controller, which forwards them to the relevant pods. We can observe the HTTP traffic to a particular deployment.
- Deployment filter: `<namespace>-<deployment-name>-80`, e.g. `bat-qa-register-qa-80`
- Stream logs from all ingress controllers and filter on the deployment:
kubectl logs -l app.kubernetes.io/name=ingress-nginx -f --max-log-requests 20 | grep <deployment-filter>
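The deployment filter is simple string concatenation. A quick sketch using the example values above:

```shell
# Build the ingress log filter "<namespace>-<deployment-name>-80"
# (80 is the service port the ingress forwards to).
namespace=bat-qa
deployment=register-qa
filter="${namespace}-${deployment}-80"
echo "$filter"   # prints: bat-qa-register-qa-80
```

The resulting string is what you pass to the grep in the command above.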
The standard output from all applications is captured in Azure Log Analytics and stored for 30 days, as opposed to `kubectl logs`, which only shows the most recent logs. There is one Log Analytics workspace per cluster:
- Navigate to the log analytics workspace:
- Click on Logs
- Select the time range, as small as possible
- Application logs are in the ContainerLogV2 table (under ContainerInsights), and the standard output is in the LogEntry column
All logs from all the services on the cluster:
ContainerLogV2
Full text search for "Exception":
ContainerLogV2
| where LogEntry contains "Exception"
Decode the LogEntry json to query it:
ContainerLogV2
| extend log_entry = parse_json(LogEntry)
| where log_entry.host contains "register"
| where log_entry.environment == "production"
Only show the timestamp and LogEntry columns:
ContainerLogV2
| extend log_entry = parse_json(LogEntry)
| where log_entry.host contains "register"
| project TimeGenerated, log_entry
HTTP requests from the ingress controller, using the filter from ingress controller logs:
ContainerLogV2
| where LogEntry contains "cpd-production-cpd-ecf-production-web-80"
The main monitoring tools used are Grafana and Alertmanager. For further reading about the monitoring setup in the cluster, click here.
Grafana can be accessed via the URL corresponding to the environment of interest:
- Test | https://grafana.test.teacherservices.cloud
- Production | https://grafana.teacherservices.cloud/
The default access to the Grafana interface is view-only, which does not require authentication. To make changes, for example adding new dashboards or editing existing ones, ask in the #teacher-services-infra Slack channel for admin credentials.
Grafana allows you to export your dashboard as a JSON file, which can be version controlled and shared with others. Follow these steps:
- Open your dashboard in Grafana
- Click on the "Share" button(icon) in the top left corner
- In the "Export" tab, select "Export for sharing externally"
- Click "Save to file" to download the JSON file of your dashboard
The following steps are required for creating or editing dashboards. Please click for more extensive details.
- Ensure you are logged in as an admin
- Identify the purpose of your dashboard: what insights it will provide and what message it conveys
- Plan and design how the dashboard will look when completed, paying attention to the placement of panels, alignment, spacing, colour and organisation
- Select the appropriate data source to visualise in the dashboard (currently Prometheus is the only data source available)
- Click on the "Explore" view, select the data source (Prometheus), then browse and search using the "Metric" dropdown
- Create a panel for each metric, choosing the right visualisation (for example graph, gauge, table, heatmap), configuring the panel settings (e.g. the query, data transformation and display options) and adding a concise title for clarity
- Any changes made to the dashboard in the UI will be overwritten by the next deployment unless they are added to the codebase and merged via a pull request
- To make the new dashboard permanent and safe from subsequent deployments, add a JSON file to the dashboards directory here containing the content exported from the dashboard, add an entry to the grafana_dashboards kubernetes_config_map resource in the grafana.tf file, and raise a PR to merge the change
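The config map entry might look roughly like the following. This is a hypothetical sketch only: the actual resource lives in grafana.tf and its exact shape, metadata and namespace may differ; "my-dashboard.json" is an invented example file name.

```hcl
# Hypothetical sketch — check grafana.tf for the real resource definition.
resource "kubernetes_config_map" "grafana_dashboards" {
  metadata {
    name      = "grafana-dashboards"
    namespace = "monitoring" # assumption: the namespace Grafana runs in
  }
  data = {
    # One entry per dashboard JSON file in the dashboards directory
    "my-dashboard.json" = file("${path.module}/dashboards/my-dashboard.json")
  }
}
```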
- Log in to Grafana as admin
- Navigate to the dashboard import page: click the "+" icon in the left sidebar to open the dashboard menu, then select "Import" from the dropdown menu.
- Import the JSON file by either clicking "Upload JSON file" and selecting the file from your computer, or pasting the file content into the text area provided.
- Click on the "Import" button to initiate the dashboard import process
The Alertmanager URLs corresponding to the various environments are:
- Test | https://alertmanager.test.teacherservices.cloud/
- Production | https://alertmanager.teacherservices.cloud/
Authentication details are usually required; they are stored in the key vault. Please ask in the #teacher-services-infra channel for more details.