- Creating a Kubernetes cluster with CPU and GPU nodes.
- Installing the required NVIDIA GPU Operator and Network Operator for running GPU workloads.
- Installing Grafana.
- Installing Prometheus.
- Installing Loki.
- Installing Promtail.
- Install the Nebius CLI:
  curl -sSL https://storage.ai.nebius.cloud/nebius/install.sh | bash
- Reload your shell session:
  exec -l $SHELL
  or
  source ~/.bashrc
- Configure the Nebius CLI (it is recommended to use a service account for configuration).
- Install jq:
  - macOS:
    brew install jq
  - Debian-based distributions:
    sudo apt install jq -y
To deploy a Kubernetes cluster, follow these steps:
- Load environment variables:
  source ./environment.sh
- Initialize Terraform:
  terraform init
- Replace the placeholder content in `terraform.tfvars` with configuration values that meet your specific requirements. See the details below.
- Preview the deployment plan:
  terraform plan
- Apply the configuration:
  terraform apply
  Wait for the operation to complete. The full sequence is sketched below.
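A minimal end-to-end sketch of the same steps, assuming `terraform.tfvars` has already been filled in and you run it from the Terraform deployment folder:

```bash
source ./environment.sh   # load project environment variables
terraform init            # download providers and modules
terraform plan            # preview the resources to be created
terraform apply           # create the cluster; confirm when prompted and wait for completion
```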
These are the basic configurations required to deploy Kubernetes for training in Nebius AI. Edit the configurations as necessary in the `terraform.tfvars` file.
Additional configurable variables can be found in the `variables.tf` file.
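The `ssh_public_key` variable below expects either the public key itself or a path to it. If you do not have a key pair yet, one way to generate it — the file name and comment are illustrative assumptions:

```bash
# Generate an ed25519 key pair; the file name and comment are illustrative.
ssh-keygen -t ed25519 -f ~/.ssh/nebius_k8s -C "nebius-k8s-training"
# Paste the contents of the .pub file into `key`, or reference it via `path`.
cat ~/.ssh/nebius_k8s.pub
```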
# Cloud environment and network
parent_id      = "" # The project ID in this context
subnet_id      = "" # Run the `nebius vpc v1alpha1 network list` command to see the subnet ID
region         = "" # The project region
ssh_user_name  = "" # Username you want to use to connect to the nodes
ssh_public_key = {
  key  = "Enter your public SSH key here" # or use `path` instead:
  path = "Enter the path to your SSH key here"
}
# K8s nodes
cpu_nodes_count  = 3                     # Number of CPU nodes
cpu_nodes_preset = "16vcpu-64gb"         # CPU node preset
gpu_nodes_count  = 1                     # Number of GPU nodes
gpu_nodes_preset = "8gpu-128vcpu-1600gb" # GPU node preset. Only nodes with 8 GPUs can be added to a GPU cluster with an InfiniBand connection.
# Observability
enable_grafana    = true # Enable or disable Grafana deployment with true or false
enable_prometheus = true # Enable or disable Prometheus deployment with true or false
enable_loki       = true # Enable or disable Loki deployment with true or false
enable_dcgm       = true # Enable or disable NVIDIA DCGM Exporter Dashboard and Alerting deployment with true or false
## Loki
loki_access_key_id = "" # See the instructions in README.md on how to create this. Leave empty if you are not deploying Loki.
loki_secret_key    = "" # See the instructions in README.md on how to create this. Leave empty if you are not deploying Loki.
See the details below for more information on Grafana, Prometheus, Loki and NVIDIA DCGM.
To deploy Loki, you will need to create a service account. See the service account instructions below.
# Storage
## Filestore - recommended
enable_filestore     = true                       # Enable or disable Filestore integration with true or false
filestore_disk_size  = 100 * (1024 * 1024 * 1024) # Filestore disk size in bytes. The multiplication makes it easier to read the size in GB; this example gives 100 GB.
filestore_block_size = 4096                       # Filestore block size in bytes
## GlusterFS - legacy
enable_glusterfs            = false                      # Enable or disable GlusterFS integration with true or false
glusterfs_storage_nodes     = 3                          # Number of storage nodes in the GlusterFS cluster
glusterfs_disk_count_per_vm = 2                          # Number of disks per storage node in the GlusterFS cluster
glusterfs_disk_size         = 100 * (1024 * 1024 * 1024) # Disk size in bytes. The multiplication makes it easier to read the size in GB; this example gives 100 GB.
There are two ways to add external storage to K8s clusters:
- Filestore (recommended, enabled by default)
- GlusterFS (legacy)
Both options allow you to create ReadWriteMany HostPath PVCs in a K8s cluster. Use the following paths: `/mnt/filestore` for Filestore and `/mnt/glusterfs` for GlusterFS.
For more information on how to access storage in K8s, see the section on using mounted storage below.
- Install kubectl (instructions)
- Install the Nebius AI CLI (instructions)
- Install jq (instructions)
- Run the following command from the Terraform deployment folder:
  nebius mk8s v1 cluster get-credentials --id $(cat terraform.tfstate | jq -r '.resources[] | select(.type == "nebius_mk8s_v1_cluster") | .instances[].attributes.id') --external
- Verify the kubectl configuration after adding the credentials:
  kubectl config view
  The output should look like this:
  apiVersion: v1
  clusters:
  - cluster:
      certificate-authority-data: DATA+OMITTED
- Show cluster information:
  kubectl cluster-info
- Get pods:
  kubectl get pods -A
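To confirm that the CPU and GPU node groups joined the cluster and that the NVIDIA GPU Operator components came up, a quick check; the `gpu-operator` namespace name is an assumption and may differ in this deployment:

```bash
kubectl get nodes -o wide          # all CPU and GPU nodes should be Ready
kubectl get pods -n gpu-operator   # namespace name is an assumption; adjust if needed
```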
The observability stack is enabled by default. It includes the following components:
- Grafana
- Prometheus
- Loki
To disable it, set the `enable_grafana` variable to `false` in the `terraform.tfvars` file.
To access Grafana:
- Port-forward to the Grafana service:
  kubectl --namespace o11y port-forward service/grafana 8080:80
- Access the Grafana dashboard: open your browser and go to http://localhost:8080.
- Log in with the default credentials:
  - Username: admin
  - Password: admin
-
Create a SA
nebius iam service-account create --parent-id <parent-id> --name <name>
. -
Add an SA to editors group.
Get your tenant id usingnebius iam whoami
.
Get theeditors
group id usingnebius iam group list --parent-id <tenant-id> | grep -n5 "name: editors"
. \List all members of the
editors
group withnebius iam group-membership list-members --parent-id <group-id>
.
Add your SA to theeditors
group withnebius iam group-membership create --parent-id <group-id> --member-id <sa-id>
\ -
Create access key and get its credentials:
nebius iam access-key create --account-service-account-id <SA-ID> --description 'AWS CLI' --format json
nebius iam access-key get-by-aws-id --aws-access-key-id <AWS-KEY-ID-FROM-PREVIOUS-COMMAND> --view secret --format json
\ -
Update
loki_access_key_id
andloki_secret_key
interraform.tfvars
with the result of the previous command.
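Put together, the sequence looks like this sketch; every angle-bracketed value is a placeholder you fill in from the previous command's output, and the service account name is illustrative:

```bash
nebius iam service-account create --parent-id <parent-id> --name loki-sa    # SA name is illustrative
nebius iam whoami                                                           # note your tenant id
nebius iam group list --parent-id <tenant-id> | grep -n5 "name: editors"    # note the editors group id
nebius iam group-membership create --parent-id <group-id> --member-id <sa-id>
nebius iam access-key create --account-service-account-id <sa-id> --description 'AWS CLI' --format json
nebius iam access-key get-by-aws-id --aws-access-key-id <aws-key-id> --view secret --format json
```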
Log aggregation with Loki is enabled by default. To disable it, set the `enable_loki` variable to `false` in the `terraform.tfvars` file.
To access logs, go to the Loki dashboard at http://localhost:8080/d/o6-BGgnnk/loki-kubernetes-logs.
NB! You will have to manually clean the Loki bucket before running the `terraform destroy` command.
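The bucket is S3-compatible (the access key created above is used as AWS-style credentials), so one way to empty it is with the AWS CLI; the bucket name and endpoint URL are placeholders, and this is a sketch rather than the project's documented cleanup procedure:

```bash
# Empty the Loki bucket before `terraform destroy`; bucket name and endpoint are placeholders.
aws s3 rm s3://<loki-bucket-name> --recursive --endpoint-url <object-storage-endpoint-url>
```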
The Prometheus server is enabled by default. To disable it, set the `enable_prometheus` variable to `false` in the `terraform.tfvars` file.
Because the DCGM exporter uses Prometheus as a data source, it will also be disabled.
To access metrics, go to the Node Exporter folder at http://localhost:8080/f/e6acfbcb-6f13-4a58-8e02-f780811a2404/
The NVIDIA DCGM Exporter Dashboard and Alerting rules are enabled by default. To disable them, set the `enable_dcgm` variable to `false` in the `terraform.tfvars` file.
Alerting rules are created for node groups with GPUs by default.
To access the NVIDIA DCGM Exporter dashboard, go to http://localhost:8080/d/Oxed_c6Wz/nvidia-dcgm-exporter-dashboard
To enable alert messages for Slack, refer to this article.
- To use the csi-driver, you must set `enable_filestore = true` in the `terraform.tfvars` file.
- Deploy the Helm release that manages this csi-driver in the `helm.tf` file by applying the "csi-mounted-fs-path" module.
- Keep in mind that the "csi-mounted-fs-path" module can only be applied while instances are booting, using the following /nebius-solution-library/modules/cloud-init/k8s-cloud-init.tftpl commands:
  - sudo mkdir -p /mnt/data
  - sudo mount -t virtiofs data /mnt/data
  - echo data /mnt/data "virtiofs" "defaults" "0" "2" | sudo tee -a /etc/fstab
To use mounted storage, you need to manually create Persistent Volumes (PVs). Use the template below to create a PV and PVC.
Replace the `<SIZE>` and `<HOST-PATH>` variables with your specific values.
kind: PersistentVolume
apiVersion: v1
metadata:
  name: external-storage-persistent-volume
spec:
  storageClassName: csi-mounted-fs-path-sc
  capacity:
    storage: "<SIZE>"
  accessModes:
    - ReadWriteMany
  hostPath:
    path: "<HOST-PATH>" # "/mnt/data/<sub-directory>" or "/mnt/glusterfs/<sub-directory>"
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: external-storage-persistent-volumeclaim
spec:
  storageClassName: csi-mounted-fs-path-sc
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: "<SIZE>"
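A minimal sketch of a pod that consumes the claim above, to verify the mount; the pod name, image, and container mount path are illustrative assumptions:

```bash
# Apply a throwaway pod that mounts the PVC defined above.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: external-storage-test    # illustrative name
spec:
  containers:
    - name: shell
      image: busybox             # illustrative image
      command: ["sleep", "3600"]
      volumeMounts:
        - name: external-storage
          mountPath: /data       # illustrative mount point inside the container
  volumes:
    - name: external-storage
      persistentVolumeClaim:
        claimName: external-storage-persistent-volumeclaim
EOF
```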
- The FS should be mounted to all node groups, because PV attachment to a pod running on a node without the FS will fail.
- A single PV may fill up the entire shared FS.
- The FS size will not be updated automatically if a PV's size exceeds its spec size.
- The FS size currently can't be updated through the API, only through NEBOPS (thread).
- volumeMode: Block is not possible.
- ReadWriteMany PVs will work.
- MSP started testing this solution to enable early integration with mk8s.