Merge branch 'main' into proposal/675-node-to-node-encryption
matofeder authored Oct 22, 2024
2 parents e6df9cb + 52fd69c commit b076bc2
Showing 15 changed files with 356 additions and 154 deletions.
28 changes: 28 additions & 0 deletions .github/workflows/lint-golang.yml
@@ -0,0 +1,28 @@
name: Check Go syntax

on:
  push:
    paths:
      - 'Tests/kaas/kaas-sonobuoy-tests/**/*.go'
      - .github/workflows/lint-go.yml

jobs:
  lint-go-syntax:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Go
        uses: actions/setup-go@v4
        with:
          go-version: '1.23'

      # Install golangci-lint
      - name: Install golangci-lint
        run: |
          curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s -- -b $(go env GOPATH)/bin v1.61.0
      # Run golangci-lint
      - name: Run golangci-lint
        working-directory: Tests/kaas/kaas-sonobuoy-tests
        run: golangci-lint run ./... -v
5 changes: 3 additions & 2 deletions .markdownlint-cli2.jsonc
@@ -43,9 +43,10 @@
{
"name": "double-spaces",
"message": "Avoid double spaces",
"searchPattern": "/([^\\s>]) ([^\\s|])/g",
"searchPattern": "/([^\\s>|]) ([^\\s|])/g",
"replace": "$1 $2",
"skipCode": true
"skipCode": true,
"tables": false
}
]
}
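
The effect of the changed character class can be illustrated with a small Python sketch. This is an illustration only: markdownlint evaluates the pattern as a JavaScript regular expression, and the sketch assumes that the two capture groups are separated by two spaces (as the rule name "double-spaces" suggests), which this rendering may have collapsed. The names `OLD_PATTERN`, `NEW_PATTERN` and `has_double_space` are made up for the example.

```python
import re

# Assumption: two spaces separate the capture groups in both patterns.
OLD_PATTERN = re.compile(r"([^\s>])  ([^\s|])")
NEW_PATTERN = re.compile(r"([^\s>|])  ([^\s|])")


def has_double_space(pattern: re.Pattern, line: str) -> bool:
    """Return True if the rule's pattern would flag this line."""
    return pattern.search(line) is not None


# A table row whose first cell is padded with two spaces after the pipe.
table_row = "|  padded cell | next |"
# Ordinary prose with an accidental double space.
prose = "Avoid  double spaces"

print(has_double_space(OLD_PATTERN, table_row))  # old class accepts "|" before the gap
print(has_double_space(NEW_PATTERN, table_row))  # new class rejects "|" before the gap
print(NEW_PATTERN.sub(r"\1 \2", prose))          # the fix applied to prose
```

Under these assumptions, the new pattern no longer flags padding that directly follows a table pipe, while double spaces in running text are still caught.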
15 changes: 1 addition & 14 deletions README.md
@@ -1,23 +1,10 @@
<!-- markdownlint-disable -->
# Sovereign Cloud Stack – Standards and Certification

SCS unifies the best of cloud computing in a certified standard. With a decentralized and federated cloud stack, SCS puts users in control of their data and fosters trust in clouds, backed by a global open-source community.

## SCS compatible clouds

This is a list of clouds that we test on a nightly basis against our `scs-compatible` certification level.

| Name | Description | Operator | _SCS-compatible IaaS_ Compliance | HealthMon |
| -------------------------------------------------------------------------------------------------------------- | ------------------------------------------------- | ----------------------------- | :---------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------: |
| [gx-scs](https://github.com/SovereignCloudStack/docs/blob/main/community/cloud-resources/plusserver-gx-scs.md) | Dev environment provided for SCS & GAIA-X context | plusserver GmbH | [![Compliance Status](https://img.shields.io/github/actions/workflow/status/SovereignCloudStack/standards/check-gx-scs-v4.yml?label=v4)](https://github.com/SovereignCloudStack/standards/actions/workflows/check-gx-scs-v4.yml) | [HM](https://health.gx-scs.sovereignit.cloud:3000/) |
| [pluscloud open](https://www.plusserver.com/en/products/pluscloud-open)<br />- prod1<br />- prod2<br />- prod3<br />- prod4 | Public cloud for customers (4 regions) | plusserver GmbH | &nbsp;<br />- prod1 [![Compliance Status](https://img.shields.io/github/actions/workflow/status/SovereignCloudStack/standards/check-pco-prod1-v4.yml?label=v4)](https://github.com/SovereignCloudStack/standards/actions/workflows/check-pco-prod1-v4.yml)<br />- prod2 [![Compliance Status](https://img.shields.io/github/actions/workflow/status/SovereignCloudStack/standards/check-pco-prod2-v4.yml?label=v4)](https://github.com/SovereignCloudStack/standards/actions/workflows/check-pco-prod2-v4.yml)<br />- prod3 [![Compliance Status](https://img.shields.io/github/actions/workflow/status/SovereignCloudStack/standards/check-pco-prod3-v4.yml?label=v4)](https://github.com/SovereignCloudStack/standards/actions/workflows/check-pco-prod3-v4.yml)<br />- prod4 [![Compliance Status](https://img.shields.io/github/actions/workflow/status/SovereignCloudStack/standards/check-pco-prod4-v4.yml?label=v4)](https://github.com/SovereignCloudStack/standards/actions/workflows/check-pco-prod4-v4.yml) | &nbsp;<br />[HM1](https://health.prod1.plusserver.sovereignit.cloud:3000/d/9ltTEmlnk/openstack-health-monitor2?orgId=1&var-mycloud=plus-pco)<br />[HM2](https://health.prod1.plusserver.sovereignit.cloud:3000/d/9ltTEmlnk/openstack-health-monitor2?orgId=1&var-mycloud=plus-prod2)<br />[HM3](https://health.prod1.plusserver.sovereignit.cloud:3000/d/9ltTEmlnk/openstack-health-monitor2?orgId=1&var-mycloud=plus-prod3)<br />[HM4](https://health.prod1.plusserver.sovereignit.cloud:3000/d/9ltTEmlnk/openstack-health-monitor2?orgId=1&var-mycloud=plus-prod4) |
| [Wavestack](https://www.noris.de/wavestack-cloud/) | Public cloud for customers | noris network AG/Wavecon GmbH | [![Compliance Status](https://img.shields.io/github/actions/workflow/status/SovereignCloudStack/standards/check-wavestack-v4.yml?label=v4)](https://github.com/SovereignCloudStack/standards/actions/workflows/check-wavestack-v4.yml) | [HM](https://health.wavestack1.sovereignit.cloud:3000/) |
| [REGIO.cloud](https://regio.digital) | Public cloud for customers | OSISM GmbH | [![Compliance Status](https://img.shields.io/github/actions/workflow/status/SovereignCloudStack/standards/check-regio-a-v4.yml?label=v4)](https://github.com/SovereignCloudStack/standards/actions/workflows/check-regio-a-v4.yml) | broken <!--[HM](https://apimon.services.regio.digital/public-dashboards/17cf094a47404398a5b8e35a4a3968d4?orgId=1&refresh=5m)--> |
| [CNDS](https://cnds.io/) | Public cloud for customers | artcodix GmbH | [![Compliance Status](https://img.shields.io/github/actions/workflow/status/SovereignCloudStack/standards/check-artcodix-v4.yml?label=v4)](https://github.com/SovereignCloudStack/standards/actions/workflows/check-artcodix-v4.yml) | [HM](https://ohm.muc.cloud.cnds.io/) |
| [aov.cloud](https://www.aov.de/) | Community cloud for customers | aov IT.Services GmbH | (soon) | [HM](https://health.aov.cloud/) |
| PoC WG-Cloud OSBA | Cloud PoC for FITKO (yaook-based) | Cloud&amp;Heat Technologies GmbH | [![Compliance Status](https://img.shields.io/github/actions/workflow/status/SovereignCloudStack/standards/check-poc-wgcloud-v4.yml?label=v4)](https://github.com/SovereignCloudStack/standards/actions/workflows/check-poc-wgcloud-v4.yml) | [HM](https://health.poc-wgcloud.osba.sovereignit.cloud:3000/d/9ltTEmlnk/openstack-health-monitor2?var-mycloud=poc-wgcloud&orgId=1) |
| PoC KDO | Cloud PoC for FITKO | KDO Service GmbH / OSISM GmbH | [![Compliance Status](https://img.shields.io/github/actions/workflow/status/SovereignCloudStack/standards/check-poc-kdo-v4.yml?label=v4)](https://github.com/SovereignCloudStack/standards/actions/workflows/check-poc-kdo-v4.yml) | (soon) |
| [syseleven](https://www.syseleven.de/en/products-services/openstack-cloud/)<br />- dus2<br />- ham1 | Public OpenStack Cloud (2 SCS regions) | SysEleven GmbH | &nbsp;<br />- dus2 [![Compliance Status](https://img.shields.io/github/actions/workflow/status/SovereignCloudStack/standards/check-syseleven-dus2-v4.yml?label=v4)](https://github.com/SovereignCloudStack/standards/actions/workflows/check-syseleven-dus2-v4.yml)<br />- ham1 [![Compliance Status](https://img.shields.io/github/actions/workflow/status/SovereignCloudStack/standards/check-syseleven-ham1-v4.yml?label=v4)](https://github.com/SovereignCloudStack/standards/actions/workflows/check-syseleven-ham1-v4.yml) | &nbsp;<br />(soon)<br />(soon) |
See [Compliant clouds overview](https://docs.scs.community/standards/certification/overview) on our docs page.

## SCS standards overview

52 changes: 35 additions & 17 deletions Standards/scs-0100-v3-flavor-naming.md
@@ -14,7 +14,7 @@ description: |

## Introduction

This is the standard v3.1 for SCS Release 5.
This is the standard v3.2 for SCS Release 8.
Note that we intend to only extend it (so it's always backwards compatible),
but try to avoid changing in incompatible ways.
(See at the end for the v1 to v2 transition where we have not met that
@@ -417,15 +417,17 @@ is more significant.

### [OPTIONAL] GPU support

Format: `_`\[`G/g`\]X\[N\]\[`-`M\]\[`h`\]
Format: `_`\[`G/g`\]X\[N\[`-`M\[`h`\]\[`-`V\[`h`\]\]\]\]

This extension provides more details on the specific GPU:

- pass-through (`G`) vs. virtual GPU (`g`)
- vendor (X)
- generation (N)
- number (M) of processing units that are exposed (for pass-through) or assigned; see table below for vendor-specific terminology
- high-performance indicator (`h`)
- high-frequency indicator (`h`) for compute units
- amount of video memory (V) in GiB
- an indicator for high-bandwidth memory

Note that the vendor letter X is mandatory, generation and processing units are optional.

@@ -440,13 +442,29 @@ for AMD GCN-x=0.x, RDNA1=1, C/RDNA2=2, C/RDNA3=3, C/RDNA3.5=3.5, C/RDNA4=4, ...
for Intel Gen9=0.9, Xe(12.1/DG1)=1, Xe(12.2)=2, Arc(12.7/DG2)=3 ...
(Note: This may need further work to properly reflect what's out there.)

The optional `h` suffix to the compute unit count indicates high-performance (e.g. high freq or special
high bandwidth gfx memory such as HBM);
`h` can be duplicated for even higher performance.
The optional `h` suffix to the compute unit count indicates high-frequency GPU compute units.
Its use is not normally recommended, except where a GPU generation has several card
variants with a similar number of SMs/CUs/EUs.
If there are more than two variants, the letter `h` can be duplicated for even
higher frequencies.

Example: `SCS-16V-64-500s_GNa-14h`
This flavor has a pass-through GPU nVidia Ampere with 14 SMs and either high-bandwidth memory or specially high frequencies.
Looking through GPU specs you could guess it's 1/4 of an A30.
Please note that GPUs from one generation and vendor can have vastly different sizes
(or different fractions may be passed to an instance with multi-instance GPUs). The number
M makes it possible to differentiate between them and indicates the compute capability and
parallelism. M cannot be compared between different generations, let alone different
vendors.

The amount of video memory dedicated to the instance can be indicated by V (in GiB).
This number needs to be an integer; fractional memory sizes must be rounded
down. An optional `h` can be used to indicate high-bandwidth memory (such as HBM2+) with
bandwidths well above 1GiB/s.

Example: `SCS-16V-64-500s_GNa-14-6h`
This flavor has a pass-through GPU nVidia Ampere with 14 SMs and 6 GiB of high-bandwidth video
memory. Looking through GPU specs you could guess it's 1/4 of an A30.
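
To make the format `_[G/g]X[N[-M[h][-V[h]]]]` concrete, here is a minimal parsing sketch in Python. It is not the parser used by the SCS tooling; `GPU_RE`, `GpuSpec` and `parse_gpu_extension` are hypothetical names, and the sketch assumes vendor letters `N`, `A` and `I`, a generation that is either one lowercase letter or a (possibly fractional) number, and a single optional `h` after V.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical regex for the GPU extension only (e.g. "_GNa-14-6h").
GPU_RE = re.compile(
    r"^_(?P<kind>[Gg])"                    # G = pass-through, g = virtual GPU
    r"(?P<vendor>[NAI])"                   # vendor letter X (assumed set)
    r"(?:(?P<gen>[a-z]|\d+(?:\.\d+)?)"     # generation N
    r"(?:-(?P<units>\d+)(?P<freq>h*)"      # number M of SMs/CUs/EUs, h = high frequency
    r"(?:-(?P<vram>\d+)(?P<hbm>h?))?)?)?$"  # video memory V in GiB, h = high bandwidth
)


class GpuSpec(NamedTuple):
    passthrough: bool
    vendor: str
    generation: Optional[str]
    units: Optional[int]
    high_freq: int            # how many times "h" follows M
    vram_gib: Optional[int]
    high_bandwidth: bool


def parse_gpu_extension(ext: str) -> GpuSpec:
    """Parse the GPU part of a flavor name; raise ValueError if it does not fit."""
    m = GPU_RE.match(ext)
    if m is None:
        raise ValueError(f"not a valid GPU extension: {ext!r}")
    return GpuSpec(
        passthrough=m["kind"] == "G",
        vendor=m["vendor"],
        generation=m["gen"],
        units=int(m["units"]) if m["units"] else None,
        high_freq=len(m["freq"] or ""),
        vram_gib=int(m["vram"]) if m["vram"] else None,
        high_bandwidth=bool(m["hbm"]),
    )


print(parse_gpu_extension("_GNa-14-6h"))
```

For the example flavor, the sketch reports a pass-through GPU from vendor `N`, generation `a`, with 14 units and 6 GiB of high-bandwidth memory.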

We have a table with common GPUs in the
[implementation hints for this standard](scs-0100-w1-flavor-naming-implementation-testing.md).

### [OPTIONAL] Infiniband

@@ -490,14 +508,14 @@ an image is considered broken by the SCS team.

## Proposal Examples

| Example | Decoding |
| ------------------------- | ---------------------------------------------------------------------------------------------- |
| SCS-2C-4-10n | 2 dedicated cores (x86-64), 4GiB RAM, 10GB network disk |
| SCS-8Ti-32-50p_i1 | 8 dedicated hyperthreads (insecure), Skylake, 32GiB RAM, 50GB local NVMe |
| SCS-1L-1u-5 | 1 vCPU (heavily oversubscribed), 1GiB Ram (no ECC), 5GB disk (unspecific) |
| SCS-16T-64-200s_GNa-64_ib | 16 dedicated threads, 64GiB RAM, 200GB local SSD, Infiniband, 64 Passthrough nVidia Ampere SMs |
| SCS-4C-16-2x200p_a1 | 4 dedicated Arm64 cores (A76 class), 16GiB RAM, 2x200GB local NVMe drives |
| SCS-1V-0.5 | 1 vCPU, 0.5GiB RAM, no disk (boot from cinder volume) |
| Example | Decoding |
| ------------------------------ | ---------------------------------------------------------------------------------------------- |
| `SCS-2C-4-10n` | 2 dedicated cores (x86-64), 4GiB RAM, 10GB network disk |
| `SCS-8Ti-32-50p_i1` | 8 dedicated hyperthreads (insecure), Skylake, 32GiB RAM, 50GB local NVMe |
| `SCS-1L-1u-5` | 1 vCPU (heavily oversubscribed), 1GiB Ram (no ECC), 5GB disk (unspecific) |
| `SCS-16T-64-200s_GNa-72-24_ib` | 16 dedicated threads, 64GiB RAM, 200GB local SSD, Infiniband, 72 Passthrough nVidia Ampere SMs |
| `SCS-4C-16-2x200p_a1` | 4 dedicated Arm64 cores (A76 class), 16GiB RAM, 2x200GB local NVMe drives |
| `SCS-1V-0.5` | 1 vCPU, 0.5GiB RAM, no disk (boot from cinder volume) |

## Previous standard versions

104 changes: 103 additions & 1 deletion Standards/scs-0100-w1-flavor-naming-implementation-testing.md
@@ -32,7 +32,8 @@ See the [README](https://github.com/SovereignCloudStack/standards/tree/main/Test
for more details.

The functionality of this script is also (partially) exposed via the web page
[https://flavors.scs.community/](https://flavors.scs.community/).
[https://flavors.scs.community/](https://flavors.scs.community/), which can both
parse SCS flavor names and generate them.

With the OpenStack tooling (`python3-openstackclient`, `OS_CLOUD`) in place, you can call
`cli.py -v parse v3 $(openstack flavor list -f value -c Name)` to get a report
@@ -45,6 +46,107 @@ will create a whole set of flavors in one go.
To that end, it provides different options: either the standard mandatory and
possibly recommended flavors can be created, or the user can set a file containing his flavors.

### GPU table

The most commonly used datacenter GPUs are listed here, showing which GPUs (or partitions
of a GPU) result in which GPU part of the flavor name.

#### Nvidia (`N`)

We show the most popular recent generations here. Older ones are of course possible as well.

##### Ampere (`a`)

One Streaming Multiprocessor on Ampere has 64 (A30, A100) or 128 Cuda Cores (A10, A40).

GPUs without MIG (one SM has 128 Cuda Cores and 4 Tensor Cores):

| Nvidia GPU | Tensor C | Cuda Cores | SMs | VRAM | SCS name piece |
|------------|----------|------------|-----|-----------|----------------|
| A10 | 288 | 9216 | 72 | 24G GDDR6 | `GNa-72-24` |
| A40 | 336 | 10752 | 84 | 48G GDDR6 | `GNa-84-48` |

GPUs with Multi-Instance-GPU (MIG), where GPUs can be partitioned and the partitions handed
out as pass-through PCIe devices to instances. One SM corresponds to 64 Cuda Cores and
4 Tensor Cores.

| Nvidia GPU | Fraction | Tensor C | Cuda Cores | SMs | VRAM | SCS GPU name |
|------------|----------|----------|------------|-----|-----------|----------------|
| A30 | 1/1 | 224 | 3584 | 56 | 24G HBM2 | `GNa-56-24` |
| A30 | 1/2 | 112 | 1792 | 28 | 12G HBM2 | `GNa-28-12` |
| A30 | 1/4 | 56 | 896 | 14 | 6G HBM2 | `GNa-14-6` |
| A30X | 1/1 | 224 | 3584 | 56 | 24G HBM2e | `GNa-56h-24h` |
| A100 | 1/1 | 432 | 6912 | 108 | 80G HBM2e | `GNa-108h-80h` |
| A100 | 1/2 | 216 | 3456 | 54 | 40G HBM2e | `GNa-54h-40h` |
| A100 | 1/4 | 108 | 1728 | 27 | 20G HBM2e | `GNa-27h-20h` |
| A100 | 1/7 | 60+ | 960+ | 15+ | 10G HBM2e | `GNa-15h-10h`+ |
| A100X | 1/1 | 432 | 6912 | 108 | 80G HBM2e | `GNa-108-80h` |

[+] The precise numbers for the 1/7 MIG configurations are not known by the author of
this document and need validation.
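
The arithmetic behind the MIG rows above can be sketched as follows. This is only an illustration under stated assumptions: the helper name `mig_name_piece` and its parameters are hypothetical, it assumes that a partition receives exactly the given fraction of the full card's SMs and video memory, and it follows the standard's rule of rounding video memory down to an integer. Fractions that do not divide the SM count evenly (such as 1/7) are rejected, because the table marks those numbers as unverified.

```python
import math
from fractions import Fraction


def mig_name_piece(generation, full_sms, full_vram_gib, fraction,
                   cores_per_sm, unit_suffix="", vram_suffix=""):
    """Return (name piece, CUDA core count) for one partition of an Nvidia GPU.

    Hypothetical helper: the suffix arguments carry the optional "h" markers.
    """
    sms = full_sms * Fraction(fraction)
    if sms.denominator != 1:
        raise ValueError("fraction does not divide the SM count evenly")
    vram = math.floor(full_vram_gib * Fraction(fraction))  # round down to whole GiB
    name = f"GN{generation}-{int(sms)}{unit_suffix}-{vram}{vram_suffix}"
    return name, int(sms) * cores_per_sm


# A30: 56 SMs, 24 GiB, 64 CUDA cores per SM (values from the table above).
print(mig_name_piece("a", 56, 24, Fraction(1, 4), 64))
# A100: 108 SMs, 80 GiB, with the "h" suffixes used in the table.
print(mig_name_piece("a", 108, 80, Fraction(1, 4), 64, "h", "h"))
```

With the table's values, a quarter of an A30 comes out as `GNa-14-6` with 896 CUDA cores, and a quarter of an A100 as `GNa-27h-20h` with 1728 CUDA cores.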

##### Ada Lovelace (`l`)

No MIG support, 128 Cuda Cores and 4 Tensor Cores per SM.

| Nvidia GPU | Tensor C | Cuda Cores | SMs | VRAM | SCS name piece |
|------------|----------|------------|-----|-----------|----------------|
| L4 | 232 | 7424 | 58 | 24G GDDR6 | `GNl-58-24` |
| L40 | 568 | 18176 | 142 | 48G GDDR6 | `GNl-142-48` |
| L40G | 568 | 18176 | 142 | 48G GDDR6 | `GNl-142h-48` |
| L40S | 568 | 18176 | 142 | 48G GDDR6 | `GNl-142hh-48` |

##### Grace Hopper (`g`)

These have MIG support and 128 Cuda Cores and 4 Tensor Cores per SM.

| Nvidia GPU | Fraction | Tensor C | Cuda Cores | SMs | VRAM | SCS GPU name |
|------------|----------|----------|------------|-----|------------|----------------|
| H100 | 1/1 | 528 | 16896 | 132 | 80G HBM3 | `GNg-132-80h` |
| H100 | 1/2 | 264 | 8448 | 66 | 40G HBM3 | `GNg-66-40h` |
| H100 | 1/4 | 132 | 4224 | 33 | 20G HBM3 | `GNg-33-20h` |
| H100 | 1/7 | 72+ | 2304+ | 18+ | 10G HBM3 | `GNg-18-10h`+ |
| H200 | 1/1 | 528 | 16896 | 132 | 141G HBM3e | `GNg-132-141h` |
| H200 | 1/2 | 264 | 8448 | 66 | 70G HBM3e | `GNg-66-70h` |
| ... |

[+] The precise numbers for the 1/7 MIG configurations are not known by the author of
this document and need validation.

#### AMD Radeon (`A`)

##### CDNA 2 (`2`)

One CU contains 64 Stream Processors.

| AMD Instinct| Stream Proc | CUs | VRAM | SCS name piece |
|-------------|-------------|-----|------------|----------------|
| Inst MI210 | 6656 | 104 | 64G HBM2e | `GA2-104-64h` |
| Inst MI250 | 13312 | 208 | 128G HBM2e | `GA2-208-128h` |
| Inst MI250X | 14080 | 220 | 128G HBM2e | `GA2-220-128h` |

##### CDNA 3 (`3`)

SRIOV partitioning is possible, resulting in pass-through for
up to 8 partitions, somewhat similar to Nvidia MIG. 4 Tensor
Cores and 64 Stream Processors per CU.

| AMD GPU | Tensor C | Stream Proc | CUs | VRAM | SCS name piece |
|-------------|----------|-------------|-----|------------|----------------|
| Inst MI300X | 1216 | 19456 | 304 | 192G HBM3 | `GA3-304-192h` |
| Inst MI325X | 1216 | 19456 | 304 | 288G HBM3 | `GA3-304-288h` |

#### Intel Xe (`I`)

##### Xe-HPC (Ponte Vecchio) (`3`)

1 EU corresponds to one Tensor Core and contains 128 Shading Units.

| Intel DC GPU | Tensor C | Shading U | EUs | VRAM | SCS name part |
|--------------|----------|-----------|-----|------------|----------------|
| Max 1100 | 56 | 7168 | 56 | 48G HBM2e | `GI3-56-48h` |
| Max 1550 | 128 | 16384 | 128 | 128G HBM2e | `GI3-128-128h` |

## Automated tests

### Errors