
Feat/add certification process2 #781

Open
wants to merge 9 commits into
base: main
from
22 changes: 11 additions & 11 deletions Standards/scs-0004-v1-achieving-certification.md
@@ -17,33 +17,33 @@ As operator, I want to obtain a certificate with the scope SCS-compatible IaaS o

## Regulations

1. Each certificate issued pertains to a given cloud, a given scope, and a given version of that scope with a fixed expiry date. The certificate is only valid for that cloud and for the time frame that ends on that expiry date.
0. Certificates are issued by the SCS certification assessment body (initially the SCS project in the OSB Alliance e.V., to be succeeded by the Forum SCS-Standards in the very same OSB Alliance). An interested party has to apply for certification with this body, which in turn determines the rules that govern what parties are eligible for application (fees may apply).

2. The operator MUST include the official [SCS compliance test suite](https://github.com/SovereignCloudStack/standards/tree/main/Tests) (which does not require admin privileges) in their continuous test infrastructure (e.g., Zuul). The tests MUST be run at given intervals, depending on their resource-usage classification:
1. Each certificate issued pertains to a given combination of subject (i.e., cloud environment), scope (such as _SCS-compatible IaaS_), and version of that scope. The certificate is only valid for that combination and for the time frame that ends when the scope expires, or for six months if the expiration date for the scope is not yet fixed.

- _light_: at least nightly,
- _medium_: at least weekly,
- _heavy_: at least monthly.
2. The operator MUST ensure that the official [SCS compliance test suite](https://github.com/SovereignCloudStack/standards/tree/main/Tests) (which does not require admin privileges) is run at regular intervals and the resulting reports transmitted to the [SCS compliance monitor](https://github.com/SovereignCloudStack/standards/tree/main/compliance-monitor).

For public clouds, it is recommended to offer the SCS project access to the infrastructure so the test suite runs can be triggered continuously by the SCS team.
For public clouds, the SCS certification assessment body can take on this task provided that suitable access to test subject is supplied.

Alternatively, and for non-public clouds, the results (log files) MUST be submitted to SCS (by a mechanism of SCS' choice) at least weekly, and they need to be reproduced again on request by SCS.
The test suite is partitioned according to resource usage; the required test intervals depend on this classification:

<!-- Initially this will probably be eMail -->
- _light_: at least nightly,
- _medium_: at least weekly,
- _heavy_: at least monthly.

3. If the desired certificate requires manual checks, then the operator MUST offer the SCS project suitable access. Manual checks MUST be repeated once every quarter.
3. If the desired certificate requires manual checks, then the operator MUST offer the SCS project suitable documentation. Manual checks MUST be repeated once every quarter. In addition, the SCS certification assessment body reserves the right to occasionally verify documentation on premises.

4. Details on the standards achieved, as well as the current state and the history of all test and check results of the past 18 months will be displayed on a public webpage (henceforth, _certificate status page_) owned by SCS.

The page will be kept online for the duration of the certificate's validity, plus at least 3 months; afterwards, it can be taken offline, either upon request or in the course of maintenance cleanup. However, the page's content won't be deleted until 12 months after the certificate's expiration, for the page will be reanimated and reused if, within this timeframe, a new certificate is issued for the same scope and the same cloud.

5. The SCS certification assessment body (initially the SCS project in the OSB Alliance e.V., possibly further entities empowered to do so by the SCS trademark owner, currently the OSB Alliance e.V.) WILL review the certification application and either grant the certification, reject it or ask for further measures or information.
5. The SCS certification assessment body WILL review the certification application and either grant the certification, reject it or ask for further measures or information.

6. Once the certificate is granted by the SCS certification assessment body, the operator SHOULD use the corresponding logo and publicly state the certified "SCS compatibility" on the respective layer for the time of the validity of the certification. In case of a public cloud, this public display is even REQUIRED. In any case, the logo MUST be accompanied by a hyperlink (a QR code for printed assets) to the respective certificate status page.

7. If the certificate is to be revoked for any reason, it will be included in a publicly available Certificate Revocation List (CRL). This fact will also be reflected in the certificate status page.

8. If any of the automated tests or manual checks fail after the certificate has been issued, the certificate is not immediately revoked. Rather, the automated tests MUST pass 99.x % of the runs, and the operator SHALL be notified at the second failed attempt in a row at the latest. In case a manual check fails, it has to be repeated at a date to be negotiated with SCS. It MAY NOT fail more than two times in a row.
8. If any of the automated tests or manual checks fail after the certificate has been issued, the certificate is not immediately revoked. Rather, the automated tests MUST pass 99.x % of the runs, and the operator SHALL be notified at the second failed attempt in a row at the latest. In case a manual check fails, it has to be repeated at a date to be negotiated with the SCS certification assessment body. It MAY NOT fail more than two times in a row.

## Design Considerations

133 changes: 133 additions & 0 deletions Standards/scs-0004-w1-achieving-certification-implementation.md
@@ -0,0 +1,133 @@
---
title: "Implementation hints for achieving SCS-compatible certification"
type: Supplement
track: Global
status: Draft
supplements:
- scs-0004-v1-achieving-certification.md
---

## Process overview

The *SCS-compatible* Certification for Operators is a technical certification:
The operator needs to fulfill technical requirements, such as providing certain
APIs and guaranteeing certain platform behavior in order to be certifiable.

These requirements are meant to provide guarantees to their customers, allowing
them to rely on certain features to be available and on certain system behavior
that lets their applications run in a reliable way.

The SCS certification process typically consists of a few simple steps:

1. Running the SCS compliance test suite and adjusting the infrastructure until it passes.
2. Any additional declarations (for non-testable aspects) are written and passed to the SCS certification body.
3. The operator must be a member ("shaper" or "advisor" level) of the Forum SCS-Standards in the
OSB Alliance (a non-profit) and pay the respective membership fees. Alternatively, a certification
fee can be paid without becoming a member.
Comment on lines +24 to +26 (Contributor):

Shouldn't this be included in the main document?

4. The cloud can be listed on the SCS pages as *SCS-compatible*, with a compatibility status that is
updated daily, as SCS tests the infrastructure every day.

The precise rules that govern how certificates are issued or withdrawn are defined in the
[SCS standard 0004](scs-0004-v1-achieving-certification.md).
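The date arithmetic behind regulations 1 and 4 of that standard can be illustrated with a small Python sketch. It assumes that the six-month fallback period is counted from the issue date; the function names are hypothetical helpers for illustration, not part of any SCS tooling.

```python
from __future__ import annotations

import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping to the target month's last day."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def certificate_valid_until(issued: date, scope_expiry: date | None) -> date:
    """Regulation 1: the scope's expiry date, or six months if none is fixed yet.

    Assumption: the six-month period is counted from the issue date.
    """
    if scope_expiry is not None:
        return scope_expiry
    return add_months(issued, 6)


def status_page_retention(valid_until: date) -> tuple[date, date]:
    """Regulation 4: earliest dates the status page may go offline / be deleted.

    The page stays online for the validity period plus at least 3 months,
    and its content is kept until at least 12 months after expiration.
    """
    return add_months(valid_until, 3), add_months(valid_until, 12)


if __name__ == "__main__":
    until = certificate_valid_until(date(2024, 8, 31), None)
    print(until)  # 2025-02-28
    print(status_page_retention(until))
```

Clamping to the last day of the month keeps, for example, an issue date of August 31 from producing an invalid February date.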

## Self-testing and technical adjustments

In order for a cloud service offering to obtain a certificate, it has to
conform to all standards of the respective scope, which will be tested at
regular intervals, and the results of these tests will be made available
publicly.
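A minimal sketch of how an operator might check that each part of the test suite has run often enough, following the resource-usage classes from regulation 2 of the standard (light: at least nightly, medium: at least weekly, heavy: at least monthly). Treating "monthly" as 31 days and the name `overdue_classes` are assumptions for illustration.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

# Maximum gap between two runs per resource-usage class (regulation 2).
# Treating "monthly" as 31 days is an assumption for this sketch.
MAX_INTERVAL = {
    "light": timedelta(days=1),    # at least nightly
    "medium": timedelta(days=7),   # at least weekly
    "heavy": timedelta(days=31),   # at least monthly
}


def overdue_classes(last_runs: dict[str, datetime], now: datetime) -> list[str]:
    """Return the classes whose latest run is missing or older than allowed."""
    overdue = []
    for cls, limit in MAX_INTERVAL.items():
        last = last_runs.get(cls)
        if last is None or now - last > limit:
            overdue.append(cls)
    return overdue


if __name__ == "__main__":
    now = datetime(2024, 11, 20, 6, 0, tzinfo=timezone.utc)
    last_runs = {
        "light": now - timedelta(hours=20),
        "medium": now - timedelta(days=8),
    }
    print(overdue_classes(last_runs, now))  # ['medium', 'heavy']
```

A class with no recorded run at all is reported as overdue, so a forgotten schedule shows up instead of passing silently.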

The best approach to getting your cloud into compliance is to install the
test suite locally. For a worked example, see the
[blog article](https://scs.community/2024/10/14/cert-adapt-example/).

A description of how *SCS-compatible IaaS* compliance can be achieved on OpenStack environments that
do not use the SCS reference implementation is written up in the blog article
[Cost of making an OpenStack Cluster SCS compliant](https://scs.community/2024/05/13/cost-of-making-an-openstack-cluster-scs-compliant/).

## Declarations

For the SCS-compatible IaaS v5 standard, providers that implement availability zones (which is optional)
must guarantee certain levels of independence between them. This cannot
be fully verified by an automated test. The process therefore envisions that providers create some
documentation on the physical infrastructure and how it maps to availability zones and declare that
this documentation reflects the truth. SCS will review the docs and judge whether they meet the
criteria. In case of doubt, audits can be performed.
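One way to structure such documentation is a machine-readable mapping from availability zones to the physical failure domains they depend on, which also allows a quick self-check for shared domains before handing it in. The categories (`power`, `fire_compartment`), the layout, and the helper `shared_domains` are hypothetical; the actual independence criteria are those defined by the respective SCS standard.

```python
from __future__ import annotations

# Hypothetical machine-readable AZ documentation: each availability zone lists
# the physical failure domains it depends on, grouped by category.
AZ_DOCUMENTATION: dict[str, dict[str, set[str]]] = {
    "az1": {"power": {"feed-A"}, "fire_compartment": {"room-1"}},
    "az2": {"power": {"feed-B"}, "fire_compartment": {"room-2"}},
    "az3": {"power": {"feed-B"}, "fire_compartment": {"room-3"}},
}


def shared_domains(doc: dict[str, dict[str, set[str]]]) -> list[tuple[str, str, str, str]]:
    """List (az_a, az_b, category, domain) for every failure domain two AZs share."""
    conflicts = []
    zones = sorted(doc)
    for i, az_a in enumerate(zones):
        for az_b in zones[i + 1:]:
            # Only compare categories documented for both zones.
            for category in sorted(doc[az_a].keys() & doc[az_b].keys()):
                for domain in sorted(doc[az_a][category] & doc[az_b][category]):
                    conflicts.append((az_a, az_b, category, domain))
    return conflicts


if __name__ == "__main__":
    for conflict in shared_domains(AZ_DOCUMENTATION):
        print(conflict)  # ('az2', 'az3', 'power', 'feed-B')
```

Such a check only flags overlaps within what was documented; whether the documentation reflects reality is still up to the provider's declaration and the review by SCS.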
Comment on lines +50 to +55 (Contributor):

This should be dropped, because it is redundant and it ages too quickly. Suggestion:

- make the respective test output a note that documentation is required (mind you that it will show up as MISSING anyway, because the automated tests won't produce anything for the test cases in question)

On second thought, this is probably what is going to happen anyway. This paragraph is just a bit ahead of its time. Still, it should be replaced by a general remark that some test cases will turn up missing because documentation must be handed in.


## Forum SCS-Standards @ OSBA

The SCS brand belongs to the Open Source Business Alliance e.V. (OSBA), a non-profit organization and
association for the Open Source industry in Germany. After the completion of the funded SCS project
in the OSBA on 2024-12-31, the OSBA sets up the Forum SCS-Standards,
which evolves the SCS standards, develops the tests, and performs the certification
process, thus becoming the SCS certification body.

Members of the OSBA can also become members of the Forum SCS-Standards for an additional membership
fee, providing the financial resources for the Forum SCS-Standards to do its work. Membership in the
OSBA is open to any organization that supports the goals of the OSBA.
Alternatively, a certification fee can be paid without any membership.
Comment on lines +57 to +68 (Contributor):

This is also ahead of its time. It sneakily uses present tense for something that is in the future:

> After the completion of the funded SCS project in the OSBA on 2024-12-31, the OSBA sets up the Forum SCS-Standards

Maybe this should be a blog post before we add it to the official docs.


## Getting listed and tested

Once all tests pass, all required declarations have been submitted, and the fee for the certification
or for membership in the Forum SCS-Standards at the OSBA has been paid, the infrastructure service
can become officially certified.

The SCS team will add the cloud to the [list of certified clouds](https://docs.scs.community/standards/certification/overview)
on the SCS docs page. This can be used to prove to customers that the cloud is SCS compliant.
Note that for public clouds, there will be a nightly job that tests the cloud for compliance, which will be
triggered by SCS infrastructure (Zuul). For this, access to a tenant on the cloud needs
to be provided free of charge. (This requires only a very low quota: one of the tests creates a single VM
for about a minute.)
Comment on lines +78 to +81 (Contributor):

This language is too strong. We offer to orchestrate the tests for the partners as a service, but (from my POV) they should be free to choose whether they want to run the tests themselves. It's a rather simple process that we are testing and refining right now with AOV.


For clouds not being accessible from the outside, a VPN tunnel or a local monitoring
job (with result upload) can be used.

Please let us know if you want us to create an official SCS-certified badge that
can be used in your marketing material beyond pointing to our list.

### Optional Health Monitor

Note that for almost all certified clouds in the list of certified clouds, we also
have a health monitor running (currently still
[openstack-health-monitor](https://docs.scs.community/docs/operating-scs/guides/openstack-health-monitor/Debian12-Install)
but soon the new [health-monitor](https://scs.community/tech/2024/09/06/vp12-scs-health-monitor-tech-preview/)),
which exposes information on the performance and error rate of each cloud.
This provides some transparency on the state of the clouds by constantly running
scenario tests against them and is tremendously helpful for both the cloud operations
teams and their customers. Strictly speaking, it is *not* a requirement for the
*SCS-compatible* certification, just best practice. It will be part of an
*SCS-sovereign* certification though, where transparency on operational aspects
will be required.

## Staying compliant

Once your cloud is listed in the
[list of certified clouds](https://docs.scs.community/standards/certification/overview)
which is fed by the
[compliance manager](https://compliance.sovereignit.cloud/page/table), it
will be tested nightly. These tests might fail for a number of reasons:

* There is a new version of the SCS standards in effect and you need to adjust things.
* Your cloud was unreachable or otherwise had intermittent issues.
* You have made changes to your cloud that break *SCS-compatible* compliance.
* The test automation engine (Zuul) is in trouble.
* The tests have a bug.

In any case, the failure needs proper analysis to determine what should be done.
<!--In the list of certified clouds, the tests are performed by github actions.
These are executed from the
[github SCS standards repository](https://github.com/SovereignCloudStack/standards).
By looking at the logs from the github actions, you can typically see why the failure
happened. You could of course also do a local test again to see if the issue can
be reproduced.-->
In the compliance manager (which executes the tests via Zuul), we will add links to the log
files directly in the table, making it even easier to find the relevant log files.
Comment on lines +124 to +125 (Contributor):

There is a complication here. Special privileges are required to

- see unfiltered test results (we currently hide fails for a period of 7 days if they aren't explicitly approved by us)
- see log files

We have to hand out "API keys" to our partners so they can do these things.

It is a good idea to reproduce the failures by running the test suite locally,
as it may be easier to focus on just the one failing aspect of your infrastructure.

Your cloud will show up as failing in the compliance manager after tests start
failing; this is not the same as a revoked certification, though. For clouds that have been
compliant before, it is highly recommended to work with the SCS certification body
upon such failures to determine a way back into compliance that avoids certification
revocation.
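The tolerance rule from regulation 8 of the standard (notify the operator at the second consecutive failure at the latest, and require a minimum pass rate for the automated tests) could be tracked as in the following sketch. The standard states the threshold only as "99.x %", so `required_pass_rate` is a caller-supplied placeholder; computing the rate over all recorded runs and the class name `ComplianceHistory` are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ComplianceHistory:
    """Automated test outcomes for one cloud (hypothetical helper)."""

    # Placeholder: the standard states the threshold only as "99.x %".
    required_pass_rate: float
    results: list[bool] = field(default_factory=list)

    def record(self, passed: bool) -> bool:
        """Store one run's outcome; return True when the last two runs both
        failed, i.e. when the operator must be notified at the latest."""
        self.results.append(passed)
        return len(self.results) >= 2 and not any(self.results[-2:])

    def pass_rate(self) -> float:
        """Share of passing runs over all recorded runs (the window is an assumption)."""
        return sum(self.results) / len(self.results) if self.results else 1.0

    def meets_threshold(self) -> bool:
        return self.pass_rate() >= self.required_pass_rate


if __name__ == "__main__":
    history = ComplianceHistory(required_pass_rate=0.99)  # placeholder value
    for outcome in [True, True, False, False]:
        if history.record(outcome):
            print("notify operator")
    print(f"{history.pass_rate():.2f}")  # 0.50
```

Keeping notification and the pass-rate check separate mirrors the point above: a failing run triggers contact with the operator, while revocation depends on the longer-term record.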