Expand Magnum Cluster API docs #972

Merged
merged 10 commits into from
Mar 11, 2024
124 changes: 103 additions & 21 deletions doc/source/configuration/magnum-capi.rst
@@ -1,45 +1,127 @@
=========================
Magnum Cluster API Driver
=========================
A new driver for magnum has been written. It is an alternative to heat (as heat gets phased out due to maintenance burden) that allows the definition of clusters as Kubernetes CRDs as opposed to heat templates. The two are compatible and can both be active on the same deployment, and the decision of which driver is used for a given template depends on certain parameters inferred from the template. For the new driver, these are `{'server_type' : 'vm', 'os' : 'ubuntu', 'coe': kubernetes'}`.
Drivers can be enabled and disabled via the `disabled_drivers` parameter of `[drivers]` under `magnum.conf`.

Prerequisites for deploying the CAPI driver in magnum:
A new driver for Magnum has been written which is an alternative to Heat (as Heat gets phased out due to maintenance burden) and instead uses the Kubernetes `Cluster API project <https://cluster-api.sigs.k8s.io>`_ to manage the OpenStack infrastructure required by Magnum clusters. The idea behind the Cluster API (CAPI) project is that infrastructure is managed using Kubernetes-style declarative APIs, which in practice means a set of Custom Resource Definitions (CRDs) and Kubernetes `operators <https://kubernetes.io/docs/concepts/extend-kubernetes/operator/>`_ to translate instances of those custom Kubernetes resources into the required OpenStack API resources. These same operators also handle resource reconciliation (i.e. when the Kubernetes custom resource is modified, the operator will make the required OpenStack API calls to reflect those changes).

Management Cluster
===================
The CAPI driver relies on a management Kubernetes cluster, installed inside the cloud, to manage tenant Kubernetes clusters.
The easiest way to get one is by deploying `this <https://github.com/stackhpc/azimuth-config/tree/feature/capi-mgmt-config>`__ branch of azimuth-config, and look at the `capi-mgmt-example` environment. Refer to the `azimuth-config wiki <https://stackhpc.github.io/azimuth-config/>`__ for detailed steps on how to deploy.
The new CAPI driver and the old Heat driver are compatible and can both be active on the same deployment. Which driver is used for a given cluster depends on certain parameters inferred from the Magnum cluster template. For the new driver, these parameters are ``{'server_type': 'vm', 'os': 'ubuntu', 'coe': 'kubernetes'}``. Drivers can be enabled and disabled using the ``disabled_drivers`` parameter in the ``[drivers]`` section of ``magnum.conf``.

Ensure that you have set `capi_cluster_apiserver_floating_ip: true`, as the management cluster will need an externally accessible IP. The external network this corresponds to is whatever you have set `azimuth_capi_operator_external_network_id` to. This network needs to be reachable from wherever the magnum container is running.
Deployment Prerequisites
========================

The Cluster API architecture relies on a CAPI management cluster to run the aforementioned Kubernetes operators, which interact directly with the OpenStack APIs. The two requirements for this management cluster are:

1. It must be capable of reaching the public OpenStack APIs.

2. It must be reachable from the control plane nodes (either controllers or dedicated network hosts) on which the Magnum container is running (so that Magnum can reach the IP listed in the management cluster's ``kubeconfig`` file).
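
The second requirement can be checked from a Magnum host before any configuration is applied. The following is a minimal sketch rather than part of the documented procedure: ``kubeconfig_server`` is a hypothetical helper name, and it assumes the kubeconfig contains a plain ``server: https://...`` line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first API server URL found in a kubeconfig file.
# Assumes the file contains a line of the form "server: https://<ip>:<port>".
kubeconfig_server() {
    sed -n 's/^[[:space:]]*server:[[:space:]]*//p' "$1" | head -n 1
}

# Example usage, run on the host where the Magnum container runs
# (the kubeconfig path is an illustrative assumption):
#   curl --insecure --max-time 5 "$(kubeconfig_server ./kubeconfig)/version"
# Any HTTP response, including 401 or 403, shows that the address is reachable.
```

A timeout from ``curl`` would suggest that the network hosting the management cluster's API address is not routable from the Magnum host.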

For testing purposes, a simple `k3s <https://k3s.io>`_ cluster would suffice. For production deployments, the recommended solution is to instead set up a separate HA management cluster in an isolated OpenStack project by leveraging the CAPI management cluster configuration used in Azimuth. This approach will provide a resilient HA management cluster with a standard set of component versions which are regularly tested in Azimuth CI.
The general process for setting up this CAPI management cluster using Azimuth tooling is described here, but the `Azimuth operator documentation <https://stackhpc.github.io/azimuth-config/#deploying-azimuth>`_ should be consulted for additional information if required.

The diagram below shows the general architecture of the CAPI management cluster provisioned using Azimuth tooling. It consists of a seed VM running a small k3s cluster (which is itself a CAPI management cluster, but only for the purpose of managing the HA cluster) as well as an HA management cluster made up of (by default) 3 control plane VMs and 3 worker VMs. This HA cluster runs the various Kubernetes components responsible for managing Magnum tenant clusters.


.. image:: /_static/images/capi-architecture-diagram.png
   :width: 100%

The setup and configuration of a CAPI management cluster using Azimuth tooling follows a pattern which should be familiar to Kayobe operators. There is an 'upstream' `azimuth-config <https://github.com/stackhpc/azimuth-config>`_ repository which contains recommended defaults for various configuration options (equivalent to stackhpc-kayobe-config), and then each client site will maintain an independent copy of this repository which will contain site-specific configuration. Together, these upstream and site-specific configuration repositories can set or override Ansible variables for the `azimuth-ops <https://github.com/stackhpc/ansible-collection-azimuth-ops>`_ Ansible collection, which contains the playbooks required to deploy or update a CAPI management cluster (or a full Azimuth deployment).

In order to deploy a CAPI management cluster for use with Magnum, first create a copy of the upstream Azimuth config repository in the client's GitHub/GitLab. To do so, follow the instructions found in the `initial repository setup <https://stackhpc.github.io/azimuth-config/repository/#initial-repository-setup>`_ section of the Azimuth operator docs. The site-specific repository should then be encrypted following `these instructions <https://stackhpc.github.io/azimuth-config/repository/secrets/>`_ to avoid leaking any secrets (such as cloud credentials) which will be added to the configuration later on.

Next, rather than copying the ``example`` environment as recommended in the Azimuth docs, instead copy the ``capi-mgmt-example`` environment and give it a suitable site-specific name:

.. code-block:: bash

   cp -r ./environments/capi-mgmt-example ./environments/<site-specific-name>

By default, both the seed VM name and the CAPI cluster VM names will be derived by prefixing the environment name with ``capi-mgmt-``, so naming the environment after the cloud (e.g. ``sms-lab-prod``) is recommended.

Having created this concrete environment to hold site-specific configuration, next open ``environments/<site-specific-name>/inventory/group-vars/all/variables.yml`` and, at a minimum, set the following options to the desired values for the target cloud:

.. code-block:: yaml

   infra_external_network_id: <cloud-external-network-id>
   infra_flavor_id: <seed-vm-flavor>
   capi_cluster_control_plane_flavor: <ha-cluster-control-plane-vm-flavor>
   capi_cluster_worker_flavor: <ha-cluster-worker-vm-flavor>

The comments surrounding each option in the ``variables.yml`` provide some tips on choosing sensible values (e.g. resource requirements for each flavor). In most cases, other configuration options can be left blank since they will fall back to the upstream defaults; however, if the default configuration is not suitable, the roles in `ansible-collection-azimuth-ops <https://github.com/stackhpc/ansible-collection-azimuth-ops>`_ contain a range of config variables which can be overridden in ``variables.yml`` as required. In particular, the `infra role variables <https://github.com/stackhpc/ansible-collection-azimuth-ops/blob/main/roles/infra/defaults/main.yml>`_ are mostly relevant to the seed VM configuration, and the `capi_cluster role variables <https://github.com/stackhpc/ansible-collection-azimuth-ops/blob/main/roles/capi_cluster/defaults/main.yml>`_ are relevant for HA cluster config.

*Note* - One important distinction between azimuth-config and stackhpc-kayobe-config is that the environments in azimuth-config are `layered`. This can be seen in the ``ansible.cfg`` file for each environment, which will contain a line such as ``inventory = ../base/inventory,../ha/inventory,../capi-mgmt/inventory,./inventory`` showing the inheritance chain for variables defined in each environment. See `these docs <https://stackhpc.github.io/azimuth-config/environments/>`_ for more details.
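
To see the layering for a given environment, the ``inventory`` line can be split into one layer per line. This is a small illustrative sketch; ``inventory_layers`` is a hypothetical helper name, and it assumes the setting is written on a single line starting with ``inventory`` as in the example above. In Ansible, when the same variable is defined in several inventory sources, the source listed last generally takes precedence, so the environment's own ``./inventory`` would be expected to override the shared layers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the inventory layers named in an environment's
# ansible.cfg, one per line, in the order in which they are listed.
inventory_layers() {
    sed -n 's/^inventory[[:space:]]*=[[:space:]]*//p' "$1" | tr ',' '\n'
}

# Example usage (the path is illustrative):
#   inventory_layers ./environments/<site-specific-name>/ansible.cfg
```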

In addition to setting the required infrastructure variables, Terraform must also be configured to use a remote state store (either GitLab or S3) for the seed VM state. To do so, follow the instructions found `here <https://stackhpc.github.io/azimuth-config/repository/terraform/>`_.

The HA cluster also contains a deployment of `kube-prometheus-stack <https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack>`_ for monitoring and alerting. To send the cluster alerts to Slack, the ``alertmanager_config_slack_webhook_url`` variable should be set in ``environments/<site-specific-name>/inventory/group-vars/all/secrets.yml``. If the repository was encrypted correctly above, this file will automatically be encrypted before a git push. Run ``git-crypt status -e`` to verify that this file is included in the encrypted list before git-committing the webhook URL.
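
The ``git-crypt status -e`` check can be scripted so that it fails loudly when the secrets file is missing from the encrypted list. The sketch below is an assumption-laden example: ``is_listed_encrypted`` is a hypothetical helper name, and it assumes the status output consists of lines of the form ``encrypted: <path>`` with paths that contain no spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `git-crypt status` output on stdin and succeed only
# if the given path is reported as encrypted. Lines starting with
# "not encrypted:" do not match because their first field is "not".
is_listed_encrypted() {
    awk -v p="$1" '$1 == "encrypted:" && $2 == p { found = 1 } END { exit !found }'
}

# Example usage from the repository root (the path is illustrative):
#   git-crypt status -e | is_listed_encrypted \
#       environments/<site-specific-name>/inventory/group-vars/all/secrets.yml \
#       || echo "secrets.yml is NOT encrypted, do not commit"
```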

The final step before beginning deployment of the CAPI management cluster is to provide some cloud credentials. It is recommended that the CAPI management cluster is deployed in an isolated OpenStack project. After creating the target project (preferably using openstack-config), generate an application credential for the project using the Identity tab in Horizon and then download the corresponding ``clouds.yaml`` file and place it in ``environments/<site-specific-name>/clouds.yaml``.

To deploy the CAPI management cluster using this site-specific environment, run:

.. code-block:: bash

   # Activate the environment
   ./bin/activate <site-specific-name>

   # Install or update the local Ansible Python venv
   ./bin/ensure-venv

   # Install or update Ansible dependencies
   ansible-galaxy install -f -r ./requirements.yml

   # Run the provision playbook from the azimuth-ops collection
   # NOTE: THIS COMMAND RUNS A DIFFERENT PLAYBOOK FROM
   # THE STANDARD AZIMUTH DEPLOYMENT INSTRUCTIONS
   ansible-playbook stackhpc.azimuth_ops.provision_capi_mgmt

The general running order of the provisioning playbook is the following:

- Ensure Terraform is installed locally

- Use Terraform to provision the seed VM (and create any required internal networks, volumes etc.)

- Install k3s on the seed (with all k3s data stored on the attached Cinder volume)

- Install the required components on the k3s cluster to provision the HA cluster

- Provision the HA cluster

- Install the required components on the HA cluster to manage Magnum user clusters

Once the seed VM has been provisioned, it can be accessed via SSH by running ``./bin/seed-ssh`` from the root of the azimuth-config repository. Within the seed VM, the k3s cluster and the HA cluster can both be accessed using the pre-installed ``kubectl`` and ``helm`` command line tools. Both of these tools will target the k3s cluster by default; however the ``kubeconfig`` file for the HA cluster can be found in the seed's home directory (named e.g. ``kubeconfig-capi-mgmt-<site-specific-name>.yaml``).

*Note* - The provision playbook is responsible for copying the HA ``kubeconfig`` to this location *after* the HA cluster is up and running. If you need to access the HA cluster while it is still deploying, the ``kubeconfig`` file can be found stored as a Kubernetes secret on the k3s cluster.
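
As an illustration of targeting the HA cluster from the seed, the sketch below builds the kubeconfig path from the environment name, following the naming pattern described above. ``ha_kubeconfig_path`` is a hypothetical helper name, and ``sms-lab-prod`` is only an example environment name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the path of the HA cluster kubeconfig in the
# seed's home directory, assuming the naming pattern
# kubeconfig-capi-mgmt-<site-specific-name>.yaml.
ha_kubeconfig_path() {
    printf '%s/kubeconfig-capi-mgmt-%s.yaml\n' "$HOME" "$1"
}

# Example usage on the seed VM:
#   kubectl --kubeconfig "$(ha_kubeconfig_path sms-lab-prod)" get nodes
# Without --kubeconfig, kubectl targets the seed's k3s cluster instead.
```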

It is possible to reconfigure or upgrade the management cluster after initial deployment by simply re-running the ``provision_capi_mgmt`` playbook. However, it's preferable that most Day 2 ops (i.e. reconfigures and upgrades) be done via a CD Pipeline. See `these Azimuth docs <https://stackhpc.github.io/azimuth-config/deployment/automation/>`_ for more information.

It's preferable that most Day 2 ops be done via a `CD Pipeline <https://stackhpc.github.io/azimuth-config/deployment/automation/>`__.

Kayobe Config
==============
Ensure that your kayobe-config branch is up to date on |current_release_git_branch_name|.

Copy the kubeconfig found at `kubeconfig-capi-mgmt-<your-az-environment>.yaml` to your kayobe environment (e.g. `<your-skc-environment>/kolla/config/magnum/kubeconfig`. It is highly likely you'll want to add this file to ansible vault.
To configure the Magnum service with the Cluster API driver enabled, first ensure that your kayobe-config branch is up to date with |current_release_git_branch_name|.

Ensure that your magnum.conf has the following set:
Next, copy the CAPI management cluster's kubeconfig file into your stackhpc-kayobe-config environment (e.g. ``<your-skc-environment>/kolla/config/magnum/kubeconfig``). This file must be encrypted with Ansible Vault.
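
Before committing, it may be worth confirming that the copied kubeconfig really is vault-encrypted. The following is a minimal sketch: ``is_vault_encrypted`` is a hypothetical helper name, and the check relies on the ``$ANSIBLE_VAULT;`` header that Ansible Vault writes as the first line of an encrypted file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the first line of the file starts with
# the Ansible Vault header, e.g. "$ANSIBLE_VAULT;1.1;AES256".
is_vault_encrypted() {
    head -n 1 "$1" | grep -q '^\$ANSIBLE_VAULT;'
}

# Example usage (the path is illustrative; vault password handling is site-specific):
#   f=<your-skc-environment>/kolla/config/magnum/kubeconfig
#   is_vault_encrypted "$f" || ansible-vault encrypt "$f"
```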

.. code-block:: yaml
The following config should also be set in your stackhpc-kayobe-config environment:

.. code-block:: ini
   :caption: magnum.conf

   [nova_client]
   endpoint_type = publicURL

.. code-block:: yaml
   :caption: kolla/globals.yml

This is used to generate the application credential config injected into the tenant Kubernetes clusters, such that it is usable from within an OpenStack project, so you can't use the "internal API" end point here.
   magnum_cluster_api_driver_enabled: true

Control Plane
==============
Ensure that the nodes (either controllers or dedicated network hosts) that you are running the magnum containers on have connectivity to the network on which your management cluster has a floating IP (so that the magnum containers can reach the IP listed in the kubeconfig).
To apply the configuration, run ``kayobe overcloud service reconfigure -kt magnum``.

Magnum Templates
================

`azimuth-images <https://github.com/stackhpc/azimuth-images>`__ builds the required Ubuntu Kubernetes images, and `capi-helm-charts <https://github.com/stackhpc/capi-helm-charts/blob/main/.github/workflows/test.yaml>`__ CI runs conformance tests on each image built.
Magnum Cluster Templates
========================

Magnum templates can be deployed using `openstack-config <https://github.com/stackhpc/openstack-config>`__. Typically, you would create a fork `<environment>-config` of this repository, move the resources defined in `examples/capi-templates-images.yml` into `etc/openstack-config/openstack-config.yml`, and then follow the instructions in the readme to deploy these.
The clusters deployed by the Cluster API driver make use of the Ubuntu Kubernetes images built in the `azimuth-images <https://github.com/stackhpc/azimuth-images>`_ repository and then use `capi-helm-charts <https://github.com/stackhpc/capi-helm-charts>`_ to provide the Helm charts which define the clusters based on these images. Between them, these two repositories have CI jobs which regularly build and test images and Helm charts for the latest Kubernetes versions. It is therefore important to update the cluster templates on each cloud regularly to make use of these new releases.

Magnum templates should be defined within an existing client-specific `openstack-config <https://github.com/stackhpc/openstack-config>`_ repository.

TODO: Add more info here once we decide how to manage template updates in openstack-config.