Traces for Kubernetes Scheduler Simulator
This repo contains trace data for the Kubernetes Scheduler Simulator, which evaluates different scheduling policies in GPU-sharing clusters. The simulator implements the Fragmentation Gradient Descent (FGD) policy proposed in the USENIX ATC 2023 paper "Beware of Fragmentation: Scheduling GPU-Sharing Workloads with Fragmentation Gradient Descent", along with other baseline policies (e.g., Best-fit, Dot-product, GPU Packing, GPU Clustering, Random-fit).
Key data is in the csv folder, while yaml files can be generated from this data.
The full node list, openb_node_list_all_node.csv, contains 1523 nodes of a heterogeneous GPU cluster in production, listing each node's CPU, main memory, number of GPUs, and GPU model.
openb_node_list_gpu_node.csv is a subset excluding non-GPU nodes. openb_node_list_gpu_node.yaml contains the same data in YAML format.
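The node CSVs can be inspected with any dataframe library; below is a minimal pandas sketch (the path assumes the csv folder of this repo):

```python
import pandas as pd

# Load the full node list; each row describes one node of the production cluster.
nodes = pd.read_csv("csv/openb_node_list_all_node.csv")
print(nodes.head(3))
```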
Here's a sample output:
| | sn | cpu_milli | memory_mib | gpu | model |
|---|---|---|---|---|---|
| 0 | openb-node-0227 | 32000 | 262144 | 0 | nan |
| 1 | openb-node-0228 | 128000 | 786432 | 8 | G3 |
| 2 | openb-node-0229 | 96000 | 786432 | 8 | V100M32 |
cpu_milli
: Number of CPUs (in milli)

memory_mib
: Main memory (in MiB)

gpu
: Number of GPUs

model
: GPU type. G1, G2, G3 are undisclosed internal GPU codes.
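Using the columns above, one can, for example, summarize the cluster's capacity per GPU model. The snippet below is an illustrative sketch, not part of the repo:

```python
import pandas as pd

nodes = pd.read_csv("csv/openb_node_list_all_node.csv")

# Total number of GPUs per model; non-GPU nodes have gpu == 0 and model == NaN.
print(nodes.groupby("model")["gpu"].sum().sort_values(ascending=False))

# Cluster-wide totals: cpu_milli is in milli-cores, memory_mib is in MiB.
print("CPU cores :", nodes["cpu_milli"].sum() / 1000)
print("Memory GiB:", nodes["memory_mib"].sum() / 1024)
```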
The default pod trace, openb_pod_list_default.csv, contains 8152 tasks submitted to the GPU cluster, listing their resource specifications, QoS, phase, and creation/deletion/scheduled times.
The other openb_pod_list_*.csv files (excluding the gpuspec ones) are sampled from the default one, emphasizing certain types of workloads (e.g., CPU-only tasks, GPU-sharing tasks, multi-GPU tasks).
Trace files with gpuspec in their names augment tasks with GPU type requirements. About 33% of GPU tasks in our production cluster have GPU type constraints (see openb_pod_list_gpuspec33.csv and Sec. 6.5 of the "Beware of Fragmentation" paper).
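All pod traces share the same schema and load the same way; a minimal sketch, assuming the default trace is csv/openb_pod_list_default.csv:

```python
import pandas as pd

# Swap in any other openb_pod_list_*.csv variant to emphasize different workloads.
pods = pd.read_csv("csv/openb_pod_list_default.csv")
print(pods.head(3))
```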
Here's a sample output:
| | name | cpu_milli | memory_mib | num_gpu | gpu_milli | gpu_spec | qos | pod_phase | creation_time | deletion_time | scheduled_time |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | openb-pod-0017 | 88000 | 327680 | 8 | 1000 | nan | Burstable | Succeeded | 9437497 | 10769854 | 9437497 |
| 1 | openb-pod-0022 | 4000 | 15258 | 1 | 220 | nan | BE | Running | 9679175 | 9973826 | 9679175 |
| 2 | openb-pod-0035 | 16000 | 32768 | 1 | 1000 | V100M16\|V100M32 | LS | Running | 9967058 | 9968575 | 9967063 |
cpu_milli
: Number of CPUs requested (in milli)

memory_mib
: Main memory requested (in MiB)

num_gpu
: Number of GPUs requested (integers from 0 to 8)

gpu_milli
: Detailed GPU requested for GPU-sharing workloads (i.e., num_gpu == 1) (in milli)

gpu_spec
: Required GPU types. For example, nan means no GPU type constraints, while V100M16|V100M32 means the task can run on an NVIDIA V100 with either 16 GB or 32 GB of VRAM.

qos
: Quality of Service (e.g., Burstable, Best Effort (BE), Latency Sensitive (LS))

pod_phase
: Phase of the pod (Succeeded, Running, Pending, Failed)

creation_time
: Timestamp of creation (in seconds)

deletion_time
: Timestamp of deletion (in seconds)

scheduled_time
: Timestamp of being scheduled (in seconds)
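These columns combine into common derived quantities (scheduling delay, lifetime, GPU-sharing demand). The following is an illustrative sketch based only on the definitions above:

```python
import pandas as pd

pods = pd.read_csv("csv/openb_pod_list_default.csv")

# Timestamps are in seconds, so delays and lifetimes are simple differences.
pods["sched_delay"] = pods["scheduled_time"] - pods["creation_time"]
pods["lifetime"] = pods["deletion_time"] - pods["creation_time"]

# GPU-sharing tasks request one GPU with a fractional share (gpu_milli < 1000).
gpu_sharing = pods[(pods["num_gpu"] == 1) & (pods["gpu_milli"] < 1000)]
print(f"{len(gpu_sharing)} of {len(pods)} tasks are GPU-sharing")

# gpu_spec lists acceptable GPU models separated by '|'; NaN means unconstrained.
print(pods["gpu_spec"].dropna().str.split("|").head())
```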
To generate the YAML files needed for the simulation experiments from the original CSV files, run:
$ bash prepare_input.sh
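prepare_input.sh is the authoritative converter. Purely as an illustration of how one trace row maps onto a pod spec (not the simulator's exact schema; the image and GPU resource names below are placeholders), one might write:

```python
import pandas as pd
import yaml  # PyYAML

pods = pd.read_csv("csv/openb_pod_list_default.csv")
row = pods.iloc[0]

# Illustrative only: the YAML emitted by prepare_input.sh may use different resource
# names and extra annotations (e.g., to carry gpu_milli and gpu_spec constraints).
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": str(row["name"])},
    "spec": {
        "containers": [{
            "name": "main",
            "image": "placeholder",
            "resources": {"requests": {
                "cpu": f"{int(row['cpu_milli'])}m",
                "memory": f"{int(row['memory_mib'])}Mi",
                "nvidia.com/gpu": int(row["num_gpu"]),  # placeholder GPU resource name
            }},
        }],
    },
}
print(yaml.safe_dump(pod, sort_keys=False))
```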
For cluster scheduling simulation on Kubernetes, refer to the Kubernetes Scheduler Simulator repo: https://github.com/hkust-adsl/kubernetes-scheduler-simulator.
Please cite our paper if it is helpful to your research.
@inproceedings{FGD2023,
title = {Beware of Fragmentation: Scheduling GPU-Sharing Workloads with Fragmentation Gradient Descent},
author = {Qizhen Weng and Lingyun Yang and Yinghao Yu and Wei Wang and Xiaochuan Tang and Guodong Yang and Liping Zhang},
booktitle = {2023 {USENIX} Annual Technical Conference},
year = {2023},
series = {{USENIX} {ATC} '23},
url = {https://www.usenix.org/conference/atc23/presentation/weng},
publisher = {{USENIX} Association}
}