diff --git a/README.md b/README.md index d9586654..81790433 100644 --- a/README.md +++ b/README.md @@ -11,6 +11,7 @@ running RHEL for computes (and maybe swift). Some links of interest: +- [Persistent Storage Guide](docs/user-guide/index.md) - [OpenStack Kubernetes Operators](https://github.com/openstack-k8s-operators/) - [Developer Docs](https://github.com/openstack-k8s-operators/dev-docs) - [User Docs](https://openstack-k8s-operators.github.io/openstack-operator/) diff --git a/docs/user-guide/cinder.md b/docs/user-guide/cinder.md new file mode 100644 index 00000000..6e958bc5 --- /dev/null +++ b/docs/user-guide/cinder.md @@ -0,0 +1,1409 @@ +# Cinder configuration guide + +The OpenStack Block Storage service (cinder) allows users to access block +storage devices through *volumes* to provide persistent storage in the Compute +instances and can also be used by the Image Service (glance) as a back-end to +store its images. + +This guide focuses on general concepts covered before but from the cinder +perspective as well as the configuration of the different cinder components as +day-0 and day-1 operations; that is, creating the manifest content with the +cinder configuration and configuring it once the API component is running. + +The Block Storage service (cinder) has three mandatory services (api, scheduler, +and volume), and one optional service (backup). + +All the cinder services are configured using the `cinder` section of the +`OpenStackControlPlane` manifest: + +``` +apiVersion: core.openstack.org/v1beta1 +kind: OpenStackControlPlane +metadata: + name: openstack +spec: + cinder: +``` + +Within this `cinder` section there are a couple of options for the +cinder-operator that have effects on all the services -such as `enabled` and +`uniquePodNames`-, but most of the global configuration options are in the +`template` section within this `cinder` section, which includes sections -such +as `nodeSelector`, `preserveJobs`, `dbPurge`, etc. + +Service specific sections all hang right below the `template` section. We have +the `cinderAPI` section for the API, `cinderScheduler` for the scheduler, +`cinderBackup` for backups, and `cinderVolumes` for the volume backends. + +A rough view of this structure would be: + +``` +apiVersion: core.openstack.org/v1beta1 +kind: OpenStackControlPlane +metadata: + name: openstack +spec: + cinder: + + template: + + cinderAPI: + + cinderScheduler: + + cinderVolumes: + : + : + cinderBackup: + +``` + +Table of contents: +- [1. Terminology](#1-terminology) +- [2. Changes from older releases](#2-changes-from-older-releases) +- [3. Prepare transport protocol requirements](#3-prepare-transport-protocol-requirements) +- [4. Setting initial defaults](#4-setting-initial-defaults) +- [5. Configuring the API service](#5-configuring-the-api-service) +- [6. Configuring the scheduler service](#6-configuring-the-scheduler-service) +- [7. Configuring the volume service](#7-configuring-the-volume-service) +- [8. Configuring the backup service](#8-configuring-the-backup-service) +- [9. Automatic database cleanup](#9-automatic-database-cleanup) +- [10. Preserving jobs](#10-preserving-jobs) +- [11. Resolving hostname conflicts](#11-resolving-hostname-conflicts) + + +## 1. Terminology + +There are some terminology that is important to clarify when discussing cinder: + +* Storage back-end: Refers to a physical Storage System where the data for the + *volumes* is actually stored. +* Cinder driver: Is the code in cinder that knows how to talk to a Storage + back-end. 
Configured using the `volume_driver` option. +* Cinder back-end: Refers to a logical representation of the grouping of a + cinder driver with its configuration. This grouping is used to manage and + address the volumes present in a specific Storage back-end. The name of this + logical construct is defined using the `volume_backend_name` option. +* Storage pool: Logical grouping of volumes in a given Storage back-end. +* Cinder pool: The representation in Cinder of a Storage pool. +* Volume host: This usually refers to the way cinder uses to address volumes. + There are two different representations, short `@` and + full `@#`. + +## 2. Changes from older releases + +If you are familiar with previous Red Hat OpenStack Platform releases that were +deployed using TripleO, you will notice enhancements and improvements as well as +some differences such as: + +* Multiple volume back-ends: + * Very easy to deploy. + * Adding back-ends is fast and doesn’t affect other running back-ends. + * Removing back-ends is fast and doesn’t affect other running back-ends. + * Making configuration changes to one back-end doesn’t affect other back-ends. + * Each back-end can use its own vendor specific container image. No more + building custom images to hold dependencies from two drivers. +* Recommended deployment avoids inconsistent scheduling issues. +* No more pacemaker. OpenShift functionality has been leveraged to not need it. +* Easier way to debug service code when faced with difficult to resolve issues. + + +## 3. Prepare transport protocol requirements + +The OpenStack control plane services that use volumes, such as cinder volume and +cinder backup, require the OpenShift cluster’s support for them, as it needs to +run some daemons and kernel modules, such as `iscsid` and `multipathd`. + +These daemon and kernel modules must be available in all the OpenShift nodes +where the cinder volume and backup services could run, not only on the ones +where they are currently running. More information on where services run is +available in section 1.5 Restricting where services run. + +To ensure those daemons and kernel modules are running on the OCP hosts some +changes may be required on the OpenShift side using a `MachineConfig`. For +additional information on `MachineConfig` please refer to the [OpenShift +documentation](https://docs.openshift.com/container-platform/4.12/post_installation_configuration/machine-configuration-tasks.html). + +It is recommended to make these changes during the installation of the OpenShift +cluster or before deploying any of the OpenStack control plane services with the +`OpenStackControlPlane` manifest. + +There are complete examples of backend and transport protocols available, as +described in the [back-end examples section](#76-back-end-examples). + +--- + +> **🛈 Note:** This section’s only purpose is to serve as a guide to the topic +of transport protocol requirements for illustration purposes, but each vendor +may have specific recommendations on how to configure the storage transport +protocol to use against their storage system, so it is recommended to always +check with the vendor. + +--- + +> **⚠ Attention:** The node will reboot any time a `MachineConfig` is used to +make changes to OpenShift nodes\!\! OpenShift and OpenStack administrators may +be different, so please consult your OpenShift administrators before applying +any `MachineConfig` to ensure safety of the OpenShift workloads. 
+
+---
+
+> **⚠ Attention:** When using the `nodeSelector` described later, remember to
+also use a `MachineConfigPool` and adapt the `MachineConfig` to use it, as
+described in the [OpenShift
+documentation](https://docs.openshift.com/container-platform/4.12/post_installation_configuration/machine-configuration-tasks.html).
+
+> **⚠ Attention:** The `MachineConfig` examples in the following sections use
+the label `machineconfiguration.openshift.io/role: worker`, instructing
+OpenShift to apply those changes on `worker` nodes (`worker` is an automatically
+created `MachineConfigPool`). This assumes we have an OpenShift cluster with 3
+master nodes and 3 `worker` nodes. If we are deploying in a 3 node OpenShift
+cluster where all nodes are `master` and `worker`, we need to change it to use
+`master` instead (`master` is another automatically created
+`MachineConfigPool`).
+
+---
+
+### 3.1. iSCSI
+
+Connecting to iSCSI volumes from the OpenShift nodes requires the iSCSI
+initiator to be running. Because the Linux Open-iSCSI initiator doesn't
+currently support network namespaces, we must run a single instance of the
+`iscsid` service, shared between the normal OpenShift usage, the OpenShift CSI
+plugins, and the OpenStack services.
+
+If we are not already running `iscsid` on the OpenShift nodes then we'll need to
+apply a `MachineConfig` to the nodes that could be running cinder volume and
+backup services. Here is an example that starts the `iscsid` service with the
+default configuration options on **all** the OpenShift worker nodes:
+
+```
+apiVersion: machineconfiguration.openshift.io/v1
+kind: MachineConfig
+metadata:
+  labels:
+    machineconfiguration.openshift.io/role: worker
+    service: cinder
+  name: 99-worker-cinder-enable-iscsid
+spec:
+  config:
+    ignition:
+      version: 3.2.0
+    systemd:
+      units:
+      - enabled: true
+        name: iscsid.service
+```
+
+For production deployments using iSCSI volumes we encourage setting up
+multipathing; please look at the [multipathing section below](#multipathing) to
+see how to configure it.
+
+The iSCSI initiator is automatically loaded on provisioned Data Plane nodes
+where the Compute service is running, so no additional steps are required for
+instances to access the volumes.
+
+### 3.2. FC
+
+Connecting to FC volumes from the OpenShift nodes does not require any
+`MachineConfig` to work, but production deployments using FC volumes should
+always be set up to use multipathing. Please look at the [multipathing section
+below](#multipathing) to see how to configure it.
+
+It is of utmost importance that all OpenShift nodes that are going to run cinder
+volume and cinder backup services have an HBA card.
+
+Node selectors can be used to determine which nodes can run cinder volume and
+backup pods if our deployment doesn't have HBA cards in all the nodes. Please
+refer to the [Service placement
+section](commonalities.md#5-restricting-where-services-run) for detailed
+information on how to use node selectors.
+
+No additional steps are necessary on Data Plane nodes where the Compute service
+is running for instances to have access to FC volumes.
+
+### 3.3. NVMe-oF
+
+Connecting to NVMe-oF volumes from the OpenShift nodes requires that the nvme
+kernel modules are loaded on the OpenShift hosts.
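+Before applying a `MachineConfig` it can be useful to check whether a node
+already loads the module. One possible way to do it, assuming debug pods can be
+run on the nodes (`<node-name>` is a placeholder for an actual node name), is:
+
+```bash
+# Run lsmod on the host through a debug pod; no output means the module is not loaded
+oc debug node/<node-name> -- chroot /host sh -c 'lsmod | grep nvme_fabrics'
+```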
+ +If we are not already loading the nvme modules on the OpenShift nodes where +volume and backup services are going to run, then we'll need to apply a +`MachineConfig` similar to this one that applies the change to **all** OpenShift +nodes: + +``` +apiVersion: machineconfiguration.openshift.io/v1 +kind: MachineConfig +metadata: + labels: + machineconfiguration.openshift.io/role: worker + service: cinder + name: 99-master-cinder-load-nvme-fabrics +spec: + config: + ignition: + version: 3.2.0 + storage: + files: + - path: /etc/modules-load.d/nvme_fabrics.conf + overwrite: false + mode: 420 + user: + name: root + group: + name: root + contents: + source: data:,nvme-fabrics%0Anvme-tcp +``` + +We are only loading the `nvme-fabrics` module because it takes care of loading +the transport specific modules (tcp, rdma, fc) as needed. + +For production deployments using NVMe-oF volumes we encourage using +multipathing. For NVMe-oF volumes OpenStack uses native multipathing, called +[ANA](https://nvmexpress.org/faq-items/what-is-ana-nvme-multipathing/). + +Once the OpenShift nodes have rebooted and are loading the `nvme-fabrics` module +we can confirm that the Operating System is configured and supports ANA by +checking on the host: + +``` +cat /sys/module/nvme_core/parameters/multipath +``` + +The nvme kernel modules are automatically loaded on provisioned Data Plane nodes +where the Compute service is running, so no additional steps are required for +instances to access the volumes. + +--- + +> **⚠ Attention:** Even though ANA doesn't use the Linux Multipathing Device +Mapper the current OpenStack code requires `multipathd` on compute nodes to be +running for the Compute nodes to be able to use multipathing when connecting +volumes to instances, so please remember to follow the [multipathing +section](#multipathing) to be able to use it in the control plane nodes. + +#### OCP BUG #32629 + +In some deployments the OpenShift image has the NVMe `hostid` and `hostnqn` +hardcoded, so it ends up being the same in all the OpenShift nodes used for the +OpenStack control plane, which is problematic for Cinder when connecting +volumes. + +Bugs: +- [Red Hat bug #34629](https://issues.redhat.com/browse/OCPBUGS-34629) +- [Upstream issue](https://github.com/openshift/os/issues/1519) +- [PR](https://github.com/openshift/os/pull/1520) + +To resolve this issue we can use the following `MachineConfig` that fixes the +issue by recreating both files when the `hostid` doesn't match the system-uuid +of the machine it is running on. + +``` +apiVersion: machineconfiguration.openshift.io/v1 +kind: MachineConfig +metadata: + labels: + component: fix-nvme-ids + machineconfiguration.openshift.io/role: worker + service: cinder + name: 99-worker-cinder-fix-nvme-ids +spec: + config: + Systemd: + Units: + - name: cinder-nvme-fix.service + enabled: true + Contents: | + [Unit] + Description=Cinder fix nvme ids + + [Service] + Type=oneshot + RemainAfterExit=yes + Restart=on-failure + RestartSec=5 + ExecStart=bash -c "if ! grep $(/usr/sbin/dmidecode -s system-uuid) /etc/nvme/hostid; then /usr/sbin/dmidecode -s system-uuid > /etc/nvme/hostid; /usr/sbin/nvme gen-hostnqn > /etc/nvme/hostnqn; fi" + + [Install] + WantedBy=multi-user.target + ignition: + version: 3.2.0 +``` + +--- + +### 3.4. Multipathing + +Cinder back-ends using iSCSI and FC transport protocols (and NVMe-oF due to an +OpenStack limitation) in production should always be configured using +multipathing to provide additional resilience and optionally additional +throughput. 
+ +Setting up multipathing on OpenShift nodes requires a `MachineConfig` that +creates the `multipath.conf` file and starts the service. + +A basic `multipath.conf` file would be: + +``` +defaults { + user_friendly_names no + recheck_wwid yes + skip_kpartx yes + find_multipaths yes +} + +blacklist { +} +``` + +Here is an example of how that same configuration file could be written in +**all** OpenShift worker nodes and then make the `multipathd` service start on +*boot: + +``` +apiVersion: machineconfiguration.openshift.io/v1 +kind: MachineConfig +metadata: + labels: + machineconfiguration.openshift.io/role: worker + service: cinder + name: 99-master-cinder-enable-multipathd +spec: + config: + ignition: + version: 3.2.0 + storage: + files: + - path: /etc/multipath.conf + overwrite: false + mode: 384 + user: + name: root + group: + name: root + contents: + source: data:,defaults%20%7B%0A%20%20user_friendly_names%20no%0A%20%20recheck_wwid%20yes%0A%20%20skip_kpartx%20yes%0A%20%20find_multipaths%20yes%0A%7D%0A%0Ablacklist%20%7B%0A%7D + systemd: + units: + - enabled: true + name: multipathd.service +``` + +Configuring the cinder services to use multipathing is usually done using the +`use_multipath_for_image_xfer` configuration option in all the backend sections +(and in the `[DEFAULT]` section for the backup service), but in OpenStack on +OpenShift deployments there’s no need to worry about it, because that's the +default. So as long as we don't override it by setting +`use_multipath_for_image_xfer = false`. + +The multipathing daemon is automatically loaded on provisioned Data Plane nodes. + +## 4. Setting initial defaults + +There are some configuration options in the service that are only used once +during the database creation process, so they should be configured the first +time we enable the cinder service, and they have to be defined at the top +`customServiceConfig` or they won’t have any effect and defaults will be used. + +The configuration options that are used during the database creations are the following: + +* `quota_volumes`: Number of volumes allowed per project. + Integer value. + Defaults to `10` +* `quota_snapshots`: Number of volume snapshots allowed per project. + Integer value. + Defaults to `10` +* `quota_consistencygroups`: Number of consistency groups allowed per project. + Integer value. + Defaults to `10` +* `quota_groups`: Number of groups allowed per project. + Integer value. + Defaults to `10` +* `quota_gigabytes` : Total amount of storage, in gigabytes, allowed for volumes and snapshots per project. + Integer value. + Defaults to `1000` +* `no_snapshot_gb_quota`: Whether snapshots count against gigabyte quota. + Bboolean value. + Defaults to `false` +* `quota_backups`: Number of volume backups allowed per project. + Integer value. + Defaults to `10` +* `quota_backup_gigabytes`: Total amount of storage, in gigabytes, allowed for backups per project. + Integer value. + Defaults to `1000` +* `per_volume_size_limit`: Max size allowed per volume, in gigabytes. + Integer value. + Defaults to `-1` (no limit). +* `default_volume_type`: Default volume type to use. More details on the CUSTOMIZING PERSISTENT STORAGE guide. + String value. + Defaults to `__DEFAULT__` (automatically created on installation). + +Changes are still possible once the database has been created, but the +`openstack` client needs to be used to modify any of these values, and changing +these configuration options in a snippet will have no impact on the deployment. 
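+For illustration, a sketch of what adjusting these values with the `openstack`
+client could look like once the control plane is running (`myproject` is just a
+placeholder project name):
+
+```bash
+# Change the default quotas (the values seeded by the options above)
+openstack quota set --class default --volumes 20 --snapshots 15
+# Or change the quotas of a single project
+openstack quota set --volumes 20 myproject
+```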
+Please refer to setting the quota on the customizing persistent storage. + +Here’s an example with some of these options set in the manifest: + +``` +apiVersion: core.openstack.org/v1beta1 +kind: OpenStackControlPlane +metadata: + name: openstack +spec: + cinder: + template: + customServiceConfig: | + [DEFAULT] + quota_volumes = 20 + quota_snapshots = 15 +``` + +## 5. Configuring the API service + +The cinder API offers a REST API interface for all external interaction with the +service for both users and other OpenStack services. + +The API component is relatively simple to configure, because it doesn’t need +additional networks, doesn’t need sensitive information to function, can run on +any OCP node, and doesn’t need any configuration snippet to operate. + +From a configuration point of view the only thing that needs to be configured is +the internal OpenShift service’s load balancer as previously described in +Creating the control plane from the Deploying OpenStack Services on OpenShift. + +``` +apiVersion: core.openstack.org/v1beta1 +kind: OpenStackControlPlane +metadata: + name: openstack +spec: + cinder: + template: + cinderAPI: + override: + service: + internal: + metadata: + annotations: + metallb.universe.tf/address-pool: internalapi + metallb.universe.tf/allow-shared-ip: internalapi + metallb.universe.tf/loadBalancerIPs: 172.17.0.80 + spec: + type: LoadBalancer +``` + +The default REST API inactivity timeout is 60 seconds, changing this value is +documented in [Setting API timeouts](commonalities.md#8-setting-api-timeouts). + +### 5.1. Setting the number of replicas + +For the cinder API the default number of replicas is `1`, but the recommendation +for production is to run multiple instances simultaneously in Active-Active +mode. + +This can be easily achieved by setting `replicas: 3` in the `cinderAPI` section +of the configuration: + +``` +apiVersion: core.openstack.org/v1beta1 +kind: OpenStackControlPlane +metadata: + name: openstack +spec: + cinder: + template: + cinderAPI: + replicas: 3 +``` + +### 5.2. Setting cinder API options + +Like all the cinder components, cinder API inherits the configuration snippet +from the top `customServiceConfig` section defined under `cinder`, but it also +has its own `customServiceConfig` section under the `cinderAPI` section. + +There are multiple configuration options that can be set in the snippets under +the `[DEFAULT]` group, but the most relevant ones are: + +* `debug`: When set to `true` the logging level will be set to `DEBUG` instead + of the default `INFO` level. Debug log levels for the API can also be + dynamically set without restart using the dynamic log level API functionality. + Boolean value. + Defaults to `false` +* `api_rate_limit`: Enables or disables rate limit of the API. + Boolean value. + Defaults to `true` +* `osapi_volume_workers`: Number of workers for the cinder API Component. + Integer value. + Defaults to the number of CPUs available. +* `osapi_max_limit`: Maximum number of items that a collection resource returns + in a single response. + Integer value. + Defaults to 1,000. + +As an example, here’s how we would enable debug logs globally for all the cinder +components, API included, and set the number of API workers to `3` specifically +for the API component. 
+
+```
+apiVersion: core.openstack.org/v1beta1
+kind: OpenStackControlPlane
+metadata:
+  name: openstack
+spec:
+  cinder:
+    template:
+      customServiceConfig: |
+        [DEFAULT]
+        debug = true
+      cinderAPI:
+        customServiceConfig: |
+          [DEFAULT]
+          osapi_volume_workers = 3
+```
+
+### 5.3. Creating custom access policies
+
+Cinder, like other OpenStack services, has its own sensible default access
+policies for its API; this is necessary to restrict the operations users can
+perform.
+
+To override the default values of these policies a YAML formatted file can be
+provided to the API component. This file only needs to contain the specific
+policies that need to be changed from their default values; there's no need to
+provide all the policies in the file for it to be valid.
+
+The default location where cinder API will look for the policy file is
+`/etc/cinder/policy.yaml`; if a different location is used, then the
+`policy_file` option of the `oslo_policy` section of the cinder configuration
+must be set to match.
+
+A complete list of available policies in cinder as well as their default values
+can be found in [the project
+documentation](https://docs.openstack.org/cinder/2023.1/configuration/block-storage/policy.html).
+
+Here's an example of how to change the policy to allow any user to force delete
+snapshots. For illustration purposes we'll use a non-default location for the
+policy file.
+
+First let's see what the `ConfigMap` with the custom cinder policy would look
+like:
+
+```
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: my-cinder-policy
+  namespace: openstack
+data:
+  policy: |
+    "volume_extension:snapshot_admin_actions:force_delete": "rule:xena_system_admin_or_project_member"
+```
+
+And now we leverage the `extraMounts` feature, using a non-standard location for
+the file:
+
+```
+apiVersion: core.openstack.org/v1beta1
+kind: OpenStackControlPlane
+spec:
+  cinder:
+    template:
+      cinderAPI:
+        customServiceConfig: |
+          [oslo_policy]
+          policy_file=/etc/cinder/api/policy.yaml
+      extraMounts:
+      - extraVol:
+        - volumes:
+          - name: policy
+            projected:
+              sources:
+              - configMap:
+                  name: my-cinder-policy
+                  items:
+                  - key: policy
+                    path: policy.yaml
+          mounts:
+          - mountPath: /etc/cinder/api
+            name: policy
+            readOnly: true
+          propagation:
+          - CinderAPI
+```
+
+For additional information on `extraMounts`, including an example of how to
+mount the policy in the default location, please refer to section [Using
+external data in the
+services](commonalities.md#6-using-external-data-in-the-services).
+
+## 6. Configuring the scheduler service
+
+The cinder Scheduler is responsible for making decisions such as selecting
+where (as in what cinder back-end) new volumes should be created, whether
+there's enough free space to perform an operation (e.g. creating a snapshot) or
+not, or deciding where an existing volume should be moved to on some specific
+operations.
+
+The Scheduler component is the simplest component to configure, because it
+doesn't need any changes to its defaults to run.
+
+### 6.1. Setting the number of replicas
+
+The cinder Scheduler component can run multiple instances in Active-Active mode,
+but since it follows an eventual consistency model, running multiple instances
+makes it considerably harder to understand its behavior when doing operations.
+
+The recommendation is to use a single instance for the Scheduler unless it
+becomes a bottleneck in your specific deployment. Increasing the number of
+instances can be done at any time without disrupting the existing instances.
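+If the Scheduler does become a bottleneck, increasing its count follows the same
+pattern used for the API component; a minimal sketch, where the value `2` is
+just an illustration:
+
+```
+apiVersion: core.openstack.org/v1beta1
+kind: OpenStackControlPlane
+metadata:
+  name: openstack
+spec:
+  cinder:
+    template:
+      cinderScheduler:
+        replicas: 2
+```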
+ +### 6.2. Setting service down detection timeout + +On scheduling operations only services that are up and running will be taken +into account, and the rest will be ignored. + +To detect services that are up a database heartbeat is used, and then the time +of this heartbeat is used to see if it has been too long since the service did +the heartbeat. + +The options used to configure the service down detection timeout are: + +* `report_interval`: Interval, in seconds, between components (scheduler, + volume, and backup) reporting `up` state in the form of a heartbeat through + the database. + Integer value. + Defaults to `10` +* `service_down_time`: Maximum time since last check-in (heartbeat) for a + component to be considered up. + Integer value. + Defaults to `60` + +It is recommended to define these options at the top cinder +`customServiceConfig` so all services have a consistent heartbeat interval and +because the service down time is used by multiple services, not only the +scheduler. + +Here’s an example of doubling the default reporting intervals: + +``` +apiVersion: core.openstack.org/v1beta1 +kind: OpenStackControlPlane +metadata: + name: openstack +spec: + cinder: + template: + customServiceConfig: | + [DEFAULT] + report_interval = 20 + service_down_time = 120 +``` + +### 6.3. Setting the stats reporting interval + +Many scheduling operations require that the cinder schedulers know details about +the storage back-end, so cinder volumes and backups periodically report stats to +the schedulers that include, but are not limited to, used space, available +space, and capabilities. + +Having frequent stats updates is most important when a storage back-end is close +to its full capacity, as we may get failures on operations due to lack of space +that theoretically would fit. + +The configuration option for volumes is called `backend_stats_polling_interval` +and dictates the time in seconds between the cinder volume requests for usage +statistics from the Storage back-end. The default value is `60`. This option +must be set in the `[DEFAULT]` section. + +The equivalent configuration option for backups is called +`backup_driver_stats_polling_interval` and has the same default of `60`. + +--- + +> **⚠ Attention:** Generating these usage statistics is expensive for some +backends, so setting this value too low may adversely affect performance. + +--- + +Because these stats are generated by the cinder volume and cinder backup, the +configuration option must be present in their respective snippet and not the +scheduler. This can be achieved by setting it globally in the top +`customServiceConfig` or individually on each of the `cinderVolumes` or in the +`cinderBackup` depending on the granularity we want. + +Example of setting it globally to double the default value for backups and +volumes: + +``` +apiVersion: core.openstack.org/v1beta1 +kind: OpenStackControlPlane +metadata: + name: openstack +spec: + cinder: + template: + customServiceConfig: | + [DEFAULT] + backend_stats_polling_interval = 120 + backup_driver_stats_polling_interval = 120 +``` + +Example of setting it on a per back-end basis: + +``` +apiVersion: core.openstack.org/v1beta1 +kind: OpenStackControlPlane +metadata: + name: openstack +spec: + cinder: + template: + cinderBackup: + customServiceConfig: | + [DEFAULT] + backup_driver_stats_polling_interval = 120 + < rest of the config > + cinderVolumes: + nfs: + customServiceConfig: | + [DEFAULT] + backend_stats_polling_interval = 120 + < rest of the config > +``` + +### 6.4. 
Setting other cinder Scheduler options
+
+Like all the cinder components, cinder Scheduler inherits the configuration
+snippet from the top `customServiceConfig` section defined under `cinder`, but
+it also has its own `customServiceConfig` section under the `cinderScheduler`
+section.
+
+There are multiple configuration options that can be set in the snippets under
+the `[DEFAULT]` group, but the most relevant ones are:
+
+* `debug`: When set to `true` the logging level will be set to `DEBUG` instead
+  of the default `INFO` level. Debug log levels for the Scheduler can also be
+  dynamically set without restart using the dynamic log level API functionality.
+  Boolean value.
+  Defaults to `false`
+* `scheduler_max_attempts`: Maximum number of attempts to schedule a volume.
+  Integer value.
+  Defaults to `3`
+* `scheduler_default_filters`: Filter class names to use for filtering hosts
+  when not specified in the request. List of available filters can be found in
+  the [upstream project
+  documentation](https://docs.openstack.org/cinder/2023.1/configuration/block-storage/scheduler-filters.html).
+  Comma separated list value.
+  Defaults to `AvailabilityZoneFilter,CapacityFilter,CapabilitiesFilter`
+* `scheduler_default_weighers`: Weigher class names to use for weighing hosts.
+  List of available weighers can be found in the [upstream project
+  documentation](https://docs.openstack.org/cinder/2023.1/configuration/block-storage/scheduler-weights.html).
+  Comma separated list value.
+  Defaults to `CapacityWeigher`.
+* `scheduler_weight_handler`: Handler to use for selecting the host/pool after
+  weighing. Available values are:
+  `cinder.scheduler.weights.OrderedHostWeightHandler`, which selects the first
+  host from the list of hosts that passed the filters, and
+  `cinder.scheduler.weights.stochastic.StochasticHostWeightHandler`, which gives
+  every pool a chance to be chosen, where the probability is proportional to
+  each pool's weight.
+  String value.
+  Defaults to `cinder.scheduler.weights.OrderedHostWeightHandler`.
+
+As an example, here's how we would enable debug logs globally for all the cinder
+components, Scheduler included, and set the number of scheduling attempts to 2
+specifically in the Scheduler component:
+
+```
+apiVersion: core.openstack.org/v1beta1
+kind: OpenStackControlPlane
+metadata:
+  name: openstack
+spec:
+  cinder:
+    template:
+      customServiceConfig: |
+        [DEFAULT]
+        debug = true
+      cinderScheduler:
+        customServiceConfig: |
+          [DEFAULT]
+          scheduler_max_attempts = 2
+```
+
+## 7. Configuring the volume service
+
+Cinder Volume is responsible for managing operations related to Volumes,
+Snapshots, and Groups (previously known as Consistency Groups), such as
+creating, deleting, cloning, snapshotting, etc.
+
+The component requires access to the Storage back-end I/O network (`storage`) as
+well as its management network (`storageMgmt`) in the `networkAttachments`. Some
+operations, such as creating an empty volume or a snapshot, do not require any
+data I/O between the cinder Volume pod and the Storage back-end, but there are
+other operations, such as migrating data from one storage back-end to a
+different one, that require the data to pass through the Volume pod.
+
+Configuring volume services is done under the `cinderVolumes` section, and most
+of the time requires using `customServiceConfig`, `customServiceConfigSecrets`,
+`networkAttachments`, and `replicas`, and in some cases even the `nodeSelector`.
+
+Make sure you've reviewed section 1.4. 
Setting service configuration options to +understand what configuration options should go in the `customServiceConfig` and +which ones in the `customServiceConfigSecrets`. + +### 7.1. Back-ends + +In OpenStack on OpenShift each cinder back-end should have its own entry in the +`cinderVolumes` section, that way each cinder back-end will run in its own +pod. This not only removes a good number of limitations but also brings in a lot +of benefits: + +* Increased isolation +* Adding back-ends is fast and doesn’t affect other running back-ends. +* Removing back-ends is fast and doesn’t affect other running back-ends. +* Making configuration changes to a back-end doesn’t affect other back-ends. +* Automatically spreads the Volume pods into different nodes. + +Each cinder back-end uses a storage transport protocol to access the data in the +volumes, and each of these protocols have their own requirements, as described +in section [prepare transport protocol +requirements](#3-prepare-transport-protocol-requirements), but this is usually +also documented by the vendor. + +--- + +> **🛈 Note:** In older OpenStack releases the only deployment option was to run +all the back-ends together in the same container, but this is no longer +recommended for OpenStack on OpenShift, and using an independent pod for each +configured back-end. + +--- + +> **⚠ Attention:** In OpenStack on OpenShift no back-end is deployed by +default, so there will be no cinder Volume services running unless a back-end is +manually configured in the deployment. + +--- + +### 7.2. Setting the number of replicas + +The cinder volume component cannot currently run multiple instances in +Active-Active mode; all deployments must use `replicas: 1`, which is the default +value, so there’s no need to explicitly specify it. + +Cinder volume leverages the OpenShift functionality to always maintain one pod +running to serve volumes. + +### 7.3. Setting cinder Volume options + +A volume back-end configuration snippet must have its own configuration group +and cannot be configured in the `[DEFAULT]` group. + +As mentioned before, each back-end has its own configuration options that are +documented by the storage vendor, but there are some common options for all +cinder back-ends. + +* `backend_availability_zone`: Availability zone of this cinder back-end. Can be + set in the `[DEFAULT]` section using the `storage_availability_zone` option. + String value. + Defaults to the value of `storage_availability_zone` which in turn defaults to `nova`. +* `volume_backend_name`: Cinder back-end name for a given driver implementation. + String value. + No default value. +* `volume_driver`: Driver to use for volume creation in the form of python + namespace for the specific class. + String value. +* `enabled_backends`: A list of backend names to use. These backend names should + be backed by a unique `[CONFIG]` group with its options. + Comma separated list of string values. + Defaults to the name of the section with a `volume_backend_name` option. +* `image_conversion_dir`: Directory used for temporary storage during image + conversion. This option is useful when replacing image conversion location + with a remote NFS location. + String value. + Defaults to `/var/lib/cinder/conversion` +* `backend_stats_polling_interval:` Time in seconds between the cinder volume + requests for usage statistics from the Storage back-end. 
Be aware that + generating usage statistics is expensive for some backends, so setting this + value too low may adversely affect performance. + Integer value. + Defaults to `60` + +In OpenStack on OpenShift there is no need to configure `enabled_backends` when +running a single pod per back-end as it will be automatically configured for us. +So this snippet in the `customServiceConfig` section: + +``` +[iscsi] +volume_backend_name = myiscsi +volume_driver = cinder.volume.drivers... +``` + +Is equivalent to its more verbose counterpart: + +``` +[DEFAULT] +enabled_backends = iscsi +[iscsi] +volume_backend_name = myiscsi +volume_driver = cinder.volume.drivers... +``` + +--- + +> **⚠ Attention:** You may be aware of the `host` and `backend_host` +configuration options. We recommend not using them unless strictly necessary, +such as when adopting an existing deployment. + +--- + +As an example, here’s how we would enable debug logs globally for all the cinder +components, Volume included, and set the backend name and volume driver for a +Ceph back-end: + +``` +apiVersion: core.openstack.org/v1beta1 +kind: OpenStackControlPlane +metadata: + name: openstack +spec: + cinder: + template: + customServiceConfig: | + [DEFAULT] + debug = true + cinderVolumes: + ceph: + customServiceConfig: | + [DEFAULT] + volume_backend_name = ceph + volume_driver = cinder.volume.drivers.rbd.RBDDriver +``` + +### 7.4. Configuring multiple back-ends + +You can deploy multiple back-ends for the Block Storage service (cinder), each +will use its own pod, so they are independent regarding updating, changing +configuration, node placement, container image to be used, etc. + +Configuring multiple back-ends is as easy as adding another entry to the +`cinderVolumes` section. + +For example we could have two independent back-ends, one for iSCSI and another +for NFS:: + +``` +kind: OpenStackControlPlane +metadata: + name: openstack +spec: + cinder: + template: + cinderVolumes: + nfs: + networkAttachments: + - storage + - storageMgmt + customServiceConfigSecrets: + - cinder-volume-nfs-secrets + customServiceConfig: | + [nfs] + volume_backend_name=nfs + iSCSI: + networkAttachments: + - storage + - storageMgmt + customServiceConfig: | + [iscsi] + volume_backend_name=iscsi +``` + +### 7.5. Configuring an NFS back-end + +The Block Storage service (cinder) can be configured with a generic NFS back-end +to provide an alternative storage solution for volumes as well as backups. + +As with most back-ends the snippet with sensitive information would be stored in +a secret (that needs to be created in OpenShift on its own with `oc create` or +`oc apply`) and referenced in the `customServiceConfigSecrets` and the rest of +the configuration in the `customServiceConfig`. + +The single pod per back-end should still be the norm for NFS shares, so to +configure multiple shares a separate back-end should be added with its own +`Secret` referenced in `customServiceConfigSecrets`. + +--- + +> **⚠ Attention:** Use a certified third-party NFS driver when using OpenStack +Services on OpenShift in a production environment. The generic NFS driver is not +recommended for a production environment. + +--- + +> **🛈 Note:** *This section discussed using NFS as a cinder volume back-end, +not using an external NFS share for image conversion. For that purpose the +`extraMounts` feature should be used, as described in section [using external +data](commonalities.md#6-using-external-data-in-the-services). 
+ +--- + +#### Supported NFS storage + +* For production deployments we recommends that you use a vendor specific storage back-end and driver. We +don't recommend using NFS storage that comes from the generic NFS back end for +production, because its capabilities are limited compared to a vendor NFS +system. For example, the generic NFS back-end does not support features such as +volume encryption and volume multi-attach. + +* For Block Storage (cinder) and Compute (nova) services, you must use NFS +version 4.0 or later. OpenStack on OpenShift does not support earlier versions +of NFS. + +#### Unsupported NFS configuration + +* OpenStack on OpenShift does not support the NetApp NAS secure feature, because +it interferes with normal volume operations, so these must be disabled in the +`customServiceConfig` in the specific backend configuration, as seen in the +later example, by setting: + + * `nas_secure_file_operation=false` + * `nas_secure_file_permissions=false` + +* Do not configure the `nfs_mount_options` option as the default value is the +most suitable NFS options for OpenStack on OpenShift environments. If you +experience issues when you configure multiple services to share the same NFS +server, contact Red Hat Support. + +#### Limitations when using NFS shares + +* Instances that have a swap disk cannot be resized or rebuilt when the back-end +is an NFS share. + +--- + +> **Attention:** Use a vendor specific NFS driver when using OpenStack on +OpenShift in a production environment. The generic NFS driver is not recommended +for a production environment. + +--- + +#### NFS Sample configuration + +In this example we are just going to configure cinder volume to use the same NFS +system. + +Volume secret: + +``` +apiVersion: v1 +kind: Secret +metadata: + name: cinder-volume-nfs-secrets [1] +type: Opaque +stringData: + cinder-volume-nfs-secrets: | + [nfs] + nas_host=192.168.130.1 + nas_share_path=/var/nfs/cinder +``` + +The `OpenStackControlPlane` configuration: + +``` +apiVersion: core.openstack.org/v1beta1 +kind: OpenStackControlPlane +metadata: + name: openstack +spec: + cinder: + template: + cinderVolumes: + nfs: + networkAttachments: [2] + - storage + customServiceConfig: | + [nfs] + volume_backend_name=nfs + volume_driver=cinder.volume.drivers.nfs.NfsDriver + nfs_snapshot_support=true + nas_secure_file_operations=false + nas_secure_file_permissions=false + customServiceConfigSecrets: + - cinder-volume-nfs-secrets [1] +``` + +\[1\]: The name in the `Secret` and the `OpenStackControlPlane` must match. + +\[2\]: Note how the `storageMgmt` network is not present, that’s because in +vanilla NFS there is no management interface, as all operations are done +directly through the data path. + +### 7.6. Back-end examples + +To make things easier the `cinder-operator` code repository includes a good +number of samples to illustrate different configurations. + +Provided back-end samples use the `kustomize` tool which can be executed +directly using `oc kustomize` to get a complete `OpenStackControlPlane` sample +file. + +For example to get the iSCSI backend sample configuration into a +`~/openstack-deployment.yaml` file we can run: + +```bash +oc kustomize \ + https://github.com/openstack-k8s-operators/cinder-operator.git/config/samples/backends/lvm/iscsi?ref=main \ + > ~/openstack-deployment.yaml +``` + +That will generate not only the `OpenStackControlPlane` object but also the +required `MachineConfig` objects for iSCSI, Multipathing, and even the creation +of an LVM VG. 
+ +Some of the samples may require additional steps, like adding a label to a node +for the LVM samples or deploying a Ceph cluster for the Ceph example, or Pure +that needs a custom container. Please look at the storage link as well as the +specific samples for more information. + +All examples using Swift as the backup back-end, with the exception of the +Ceph sample that uses Ceph. + +Currently [available samples are](https://github.com/openstack-k8s-operators/cinder-operator/tree/main/config/samples/backends): + +- [Ceph](https://github.com/openstack-k8s-operators/cinder-operator/tree/main/config/samples/backends/ceph) +- [NFS](https://github.com/openstack-k8s-operators/cinder-operator/tree/main/config/samples/backends/nfs) +- [LVM](https://github.com/openstack-k8s-operators/cinder-operator/tree/main/config/samples/backends/lvm) + - [Using iSCSI](https://github.com/openstack-k8s-operators/cinder-operator/tree/main/config/samples/backends/lvm/iscsi) + - [Using NVMe-TCP](https://github.com/openstack-k8s-operators/cinder-operator/tree/main/config/samples/backends/lvm/nvme-tcp) +- [HPE](https://github.com/openstack-k8s-operators/cinder-operator/tree/main/config/samples/backends/hpe) + - [3PAR using iSCSI](https://github.com/openstack-k8s-operators/cinder-operator/tree/main/config/samples/backends/hpe/3par/iscsi) + - [3PAR using FC](https://github.com/openstack-k8s-operators/cinder-operator/tree/main/config/samples/backends/hpe/3par/fc) +- [NetApp](https://github.com/openstack-k8s-operators/cinder-operator/tree/main/config/samples/backends/netapp/ontap) + - [ONTAP using iSCSI](https://github.com/openstack-k8s-operators/cinder-operator/tree/main/config/samples/backends/netapp/ontap/iscsi) + - [ONTAP using FC](https://github.com/openstack-k8s-operators/cinder-operator/tree/main/config/samples/backends/netapp/ontap/fc) + - [ONTAP using NFS](https://github.com/openstack-k8s-operators/cinder-operator/tree/main/config/samples/backends/netapp/ontap/nfs) +- [Pure Storage](https://github.com/openstack-k8s-operators/cinder-operator/tree/main/config/samples/backends/pure) + - [Using iSCSI](https://github.com/openstack-k8s-operators/cinder-operator/tree/main/config/samples/backends/pure/iscsi) + - [Using FC](https://github.com/openstack-k8s-operators/cinder-operator/tree/main/config/samples/backends/pure/fc) + - [Using NVMe-RoCE](https://github.com/openstack-k8s-operators/cinder-operator/tree/main/config/samples/backends/pure/nvme-roce) +- [Dell PowerMax iSCSI](https://github.com/openstack-k8s-operators/cinder-operator/tree/main/config/samples/backends/dell/powermax/iscsi) + +## 8. Configuring the backup service + +The Block Storage service (cinder) provides an optional backup service that you +can deploy in your OpenStack on OpenShift environment. + +You can use the Block Storage backup service to create and restore full or +incremental backups of your Block Storage volumes. + +A volume backup is a persistent copy of the contents of a Block Storage volume +that is saved to a backup repository. + +Configuring cinder backup is done under the `cinderBackup` section, and most of +the time requires using `customServiceConfig`, `customServiceConfigSecrets`, +`networkAttachments`, `replicas`, although in some cases even the +`nodeSelector`. + +### 8.1. Back-ends + +You can use Ceph Storage RBD, the Object Storage service (swift), NFS, or S3 as +your backup back end, and there are no vendor containers necessary for them. 
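+As a reference, here is a sketch of what a Swift based backup configuration
+could look like, assuming the Object Storage service (swift) is deployed in the
+same control plane; the exact options for each back-end are documented with the
+corresponding driver:
+
+```
+apiVersion: core.openstack.org/v1beta1
+kind: OpenStackControlPlane
+spec:
+  cinder:
+    template:
+      cinderBackup:
+        replicas: 1
+        networkAttachments:
+        - storage
+        customServiceConfig: |
+          [DEFAULT]
+          backup_driver = cinder.backup.drivers.swift.SwiftBackupDriver
+```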
+
+The Block Storage backup service can back up volumes on any back end that the
+Block Storage service (cinder) supports, regardless of which back end you choose
+to use for your backup repository.
+
+Only one back-end can be configured for backups, unlike in cinder volume where
+multiple back-ends can be configured and used.
+
+Even though the backup back-ends don't have transport protocol requirements on
+the OpenShift nodes like the volume back-ends do, the pods are still affected by
+those requirements, because the backup pods need to connect to the volumes.
+Refer to the [prepare transport protocol
+requirements](#3-prepare-transport-protocol-requirements) section to ensure the
+nodes are properly configured.
+
+### 8.2. Setting the number of replicas
+
+The cinder backup component can run multiple instances in Active-Active mode,
+and that's the recommendation.
+
+As explained earlier, this is achieved by setting `replicas` to a value greater
+than `1`.
+
+The default value is `0`, so it always needs to be explicitly set for the backup
+service to be deployed.
+
+Example:
+
+```
+apiVersion: core.openstack.org/v1beta1
+kind: OpenStackControlPlane
+spec:
+  cinder:
+    template:
+      cinderBackup:
+        replicas: 3
+```
+
+### 8.3. Setting cinder Backup options
+
+Like all the cinder components, cinder Backup inherits the configuration snippet
+from the top `customServiceConfig` section defined under `cinder`, but it also
+has its own `customServiceConfig` section under the `cinderBackup` section.
+
+The backup back-end configuration snippet must be configured in the `[DEFAULT]`
+group.
+
+Each back-end has its own configuration options that are documented in their
+respective sections, but here are some common options for all the drivers:
+
+* `debug`: When set to `true` the logging level will be set to `DEBUG` instead
+  of the default `INFO` level. Debug log levels for the Backup service can also
+  be dynamically set without restart using the dynamic log level API
+  functionality.
+  Boolean value.
+  Defaults to `false`
+
+* `backup_service_inithost_offload`: Offload pending backup delete during backup
+  service startup. If set to `false`, the backup service will remain down until
+  all pending backups are deleted.
+  Boolean value.
+  Defaults to `true`
+
+* `backup_workers`: Number of processes to launch in the backup pod. Improves
+  performance with concurrent backups.
+  Integer value.
+  Defaults to `1`
+
+* `backup_max_operations`: Maximum number of concurrent memory, and possibly
+  CPU, heavy operations (backup and restore) that can be executed on each pod.
+  The number limits all workers within a pod but not across pods. Value of 0
+  means unlimited.
+  Integer value.
+  Defaults to `15`
+
+* `backup_native_threads_pool_size`: Size of the native threads pool used for
+  the backup data related operations. Most backup drivers rely heavily on this;
+  it can be increased for specific drivers that don't.
+  Integer value.
+  Defaults to `60`
+
+As an example, here's how we could enable debug logs, double the number of
+processes, and increase the maximum number of operations per pod to `20`:
+
+```
+apiVersion: core.openstack.org/v1beta1
+kind: OpenStackControlPlane
+spec:
+  cinder:
+    template:
+      customServiceConfig: |
+        [DEFAULT]
+        debug = true
+      cinderBackup:
+        customServiceConfig: |
+          [DEFAULT]
+          backup_workers = 2
+          backup_max_operations = 20
+```
+
+## 9. Automatic database cleanup
+
+The Block Storage (cinder) service does what's called a soft-deletion of
+database entries; this allows some level of auditing of the deleted resources.
+
+These soft-deleted database rows will grow endlessly if they are not purged.
+OpenStack on OpenShift automatically purges database entries marked for deletion
+for a set number of days. By default, records are kept for 30 days after being
+marked for deletion. You can configure a different record age and schedule for
+the purge jobs.
+
+Automatic DB purging is configured using the `dbPurge` section under the cinder
+`template` section, which has two fields: `age` and `schedule`.
+
+Here is an example that purges records that have been marked for deletion for
+more than `20` days, running the job just after midnight on Sundays:
+
+```
+apiVersion: core.openstack.org/v1beta1
+kind: OpenStackControlPlane
+metadata:
+  name: openstack
+spec:
+  cinder:
+    template:
+      dbPurge:
+        age: 20 [1]
+        schedule: 1 0 * * 0 [2]
+```
+
+\[1\]: The number of days a record has been marked for deletion before it is
+purged. The default value is 30 days. The minimum value is 1 day.
+
+\[2\]: The schedule of when to run the job in a `crontab` format. The default
+value is `1 0 * * *`.
+
+
+## 10. Preserving jobs
+
+The Block Storage service requires maintenance operations that are run
+automatically; some are one-off and some periodic. These operations are run
+using OpenShift Jobs.
+
+Administrators may want to check the logs of these operations, which won't be
+possible if the Jobs and their pods are automatically removed on completion; for
+this reason there is a mechanism to stop the automatic removal of Jobs.
+
+To preserve these Jobs and their pods for cinder we use the `preserveJobs` field
+like this:
+
+```
+apiVersion: core.openstack.org/v1beta1
+kind: OpenStackControlPlane
+metadata:
+  name: openstack
+spec:
+  cinder:
+    template:
+      preserveJobs: true
+```
+
+## 11. Resolving hostname conflicts
+
+Most storage back-ends in cinder require the hosts that connect to them to have
+unique hostnames, as these hostnames are used to identify the permissions and
+the addresses (iSCSI initiator name, HBA WWN and WWPN, etc).
+
+Because we are deploying on OpenShift, the hostnames that the cinder volume and
+cinder backup services report are not the OpenShift hostnames but the pod names
+instead.
+
+These pod names are formed using a predetermined template:
+
+* For volumes: `cinder-volume-<backend-name>-0`
+* For backups: `cinder-backup-<replica-number>`
+
+If we use the same storage back-end in multiple deployments we may end up not
+honoring this unique hostname requirement, resulting in many operational
+problems.
+
+To resolve this, we can request the installer to have unique pod names, and
+hence unique hostnames, using a field called `uniquePodNames`.
+
+When `uniquePodNames` is set to `true` a short hash is added to these pod names,
+which resolves the hostname conflicts.
+
+Here is an example requesting unique pod names/hostnames:
+
+```
+apiVersion: core.openstack.org/v1beta1
+kind: OpenStackControlPlane
+metadata:
+  name: openstack
+spec:
+  cinder:
+    uniquePodNames: true
+```
diff --git a/docs/user-guide/commonalities.md b/docs/user-guide/commonalities.md
new file mode 100644
index 00000000..7e7cbb74
--- /dev/null
+++ b/docs/user-guide/commonalities.md
@@ -0,0 +1,875 @@
+# General concepts
+
+This section covers some aspects common to the different storage services; they
+are the basis for most of the topics covered in the `cinder` specific guide,
+which assumes these are well understood.
+
+Table of Contents:
+
+- [1. Storage Back-end](#1-storage-back-end)
+- [2. 
Enabling a service](#2-enabling-a-service) +- [3. Scaling services](#3-scaling-services) +- [4. Setting service configuration options](#4-setting-service-configuration-options) +- [5 Restricting where services run](#5-restricting-where-services-run) +- [6 Using external data in the services](#6-using-external-data-in-the-services) +- [7 Restricting resources used by a service](#7-restricting-resources-used-by-a-service) +- [8 Setting API timeouts](#8-setting-api-timeouts) +- [9 Storage networking](#9-storage-networking) +- [10 Using other container images](#10-using-other-container-images) + +## 1. Storage Back-end + +All *persistent* storage services need to store data somewhere, and that is +usually in a remote storage solution or back-end. OpenStack services support the +different types of back-ends using their own specific drivers. + +These storage back-end solutions are usually vendor specific and are only +supported by one or two of the *persistent* storage services, with the exception +of Ceph Storage, that can serve as a back-end for all four types of +*persistent* storage available in OpenStack. + +It is beyond the scope of this guide to go into all the benefits that a +clustered storage solution like Ceph provides (please refer to the [Red Hat Ceph +Storage 7 Architecture +Guide](https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/7/html/architecture_guide/index) +for additional information), but it is worth mentioning that when using Ceph in +OpenStack on OpenShift we get some additional benefits beyond its inherent +scaling and redundancy, because OpenStack services such as Block Storage, Image, +and Compute have specific optimizations and behaviors to make the most out of +having Ceph as the common back-end. + +When using Ceph for Object Storage the swift service will not be deployed and +Ceph will serve as a direct stand-in, and OpenStack services will access it +using a common interface. + +Please refer to the [Configure Cinder with Ceph backend](../ceph_backend.md) +guide for additional information on how to use Ceph in OpenStack on OpenShift. + +### Certification + +To promote the use of best practices and provide a path for improved +supportability and interoperability, Red Hat has a certification process for +OpenStack back-ends, but this project in itself doesn't have them, so +additional work may be necessary to deploy specific back-ends. + +### Transport protocols + +Different back-ends available in the storage services may use different storage +transport protocols, and these protocols may or may not be enabled by default +at the Operating System level on deployments. Additional steps are necessary +for transport protocols that are not enabled by default. + +The recommendation is to install all the transport protocol requirements as a +day-0 operation during the installation of OpenShift or once it is +installed and we haven’t started deploying additional operators, as these +requirements usually require rebooting all the OpenShift nodes, which can be +a slow process if there are already additional pods running. + +## 2. Enabling a service + +There are two parts to enabling a service, first is indicating to the installer +that we want the service to be deployed and the second is to have a `replicas` +value greater than `0`. + +To tell the installer that the service must be deployed, which will create a +database, keystone entries and other resources necessary for the deployment, we +use the field called `enabled: true`. 
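+For illustration, a minimal sketch of what this looks like in the manifest, here
+enabling the Shared File Systems service (manila), which, as explained below, is
+not enabled by default:
+
+```
+apiVersion: core.openstack.org/v1beta1
+kind: OpenStackControlPlane
+metadata:
+  name: openstack
+spec:
+  manila:
+    enabled: true
+```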
+ +Most storage services (cinder, glance, swift) are enabled by default, so we +don’t need to specify it, but the Shared File Systems service (manila) defaults +to `false`, so it needs to be explicitly set for it to be enabled. + +--- + +> **⚠ Attention:** Changing `enabled` to `false` will delete the service +database. More on this topic on [Scrubbing a Service of the Performing Cinder +operations guide](operations.md). + +--- + +Most components within a service have a `replicas` value set to `1`, so they +will be automatically deployed once `enabled` is set to `true`. For some other +services the default value for `replicas` is `0`, so we need to change them +explicitly; for example for `cinderBackup` and elements in the `cinderVolumes` +field the default is `0`. + + +## 3. Scaling services + +Most OpenStack components support being deployed in Active-Active High +Availability mode, and the way to do this is to set `replicas` to a number +greater than `1`. + +There is no predefined value for the `replicas` field, and we find different +defaults based on the component specifics: + +* Components that have the number of replicas set to a positive number by + default. These are services that don’t require additional configuration and + can run with their defaults. E.g. cinder api and cinder scheduler have + `replicas: 1` by default. + +* Components that have the number of replicas set to `0`. These are optional + components or components that need a specific configuration to work (e.g. + cinder backup). + +* Components that have a default value greater than `1`. + +When we configure a component to work as Active-Active, by setting its +`replicas` to a number greater than `1`, the pods will be distributed according +to the [OpenShift affinity +rules](https://docs.openshift.com/container-platform/4.16/nodes/scheduling/nodes-scheduler-pod-affinity.html) +defined by the installer. + +The standard affinity rule is to distribute the pods of the same component to +different OpenShift (OCP) hosts whenever possible. If at a given time there are +more pods that need to be running than nodes then some of the nodes will be +running multiple pods for the same component. + +Each OpenStack has its own restrictions and recommendations for their individual +components, so please refer to their respective configuration sections to find +out specific recommendations and restrictions. + +## 4. Setting service configuration options + +The installation configuration for OpenStack on OpenShift is defined in an +`OpenStackControlPlane` manifest that has general sections as well as +per-service specific sections. Each storage service has its own section in the +`OpenStackControlPlane` spec for its configuration. The naming will match the +service codename `cinder`, `glance`, `manila`, and `swift`: + +``` +apiVersion: core.openstack.org/v1beta1 +kind: OpenStackControlPlane +metadata: + name: openstack +spec: + cinder: + + glance: + + manila: + + swfit: +``` + +This guide assumes that the contents of this manifest exists in a file called +`openstack_control_plane.yaml`, and that we have either deployed it already in +our OpenShift cluster or we are editing it and creating its content, so when the +guide asks to make sure something is defined, it is saying that it should be +present in that file. It is possible to edit the contents directly on the +OpenShift cluster using `oc edit`, but it is not recommended, and using a file +and then applying it is preferred. 
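+For example, assuming the control plane is being deployed in the `openstack`
+namespace, applying the file would look like this:
+
+```bash
+# Create or update the control plane from the manifest file
+oc apply -n openstack -f openstack_control_plane.yaml
+```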
---

> **⚠ Attention:** Any change applied to a service configuration will
> immediately trigger a restart of the service to use the new configuration.

---

The most important thing to understand when configuring storage services is the
concept of a *snippet*. A snippet is a fragment of a service configuration file
that has meaning in itself, could be a valid configuration file on its own, and
is used “as is” to construct the final service configuration provided to the
service.

The following example is not a valid *snippet* because it doesn’t have a
section, so it wouldn’t be a valid service configuration file on its own.

```
db_max_retries = 45
```

This other *snippet* would be valid, as it has a section:

```
[database]
db_max_retries = 45
```

Except on very few occasions, there is no interpretation of the contents of a
*snippet*, so a specific configuration option will not trigger a series of
configuration changes elsewhere, in the same service or in other OpenStack
services, to make things consistent or reasonable. It is the system
administrator’s responsibility to configure things correctly. If a feature
requires changes in multiple OpenStack services to work, then the system
administrator will have to make the appropriate changes in the configuration
sections of the different services.

This has the downside of being a bit manual, but it has the benefit of reducing
the installer-specific knowledge needed and leveraging existing OpenShift and
OpenStack service knowledge, while giving full control and transparency over
all existing configuration options.

To this effect, OpenStack on OpenShift services come with sensible defaults and
only need *snippets* for deployment-specific configuration, such as the Block
Storage service back-end configuration.

There are two types of snippets based on their privacy, public and sensitive,
and services can have one or both of these, depending on our needs and
preferences.

### Public *snippets*:

These are *snippets* that contain basic service configuration, for example the
value of the `debug` configuration option on a service. This information can be
present directly as plain text in the manifest file, as it contains nothing
sensitive. They are set in the `customServiceConfig` of a service section
within the `OpenStackControlPlane` spec.

Here is an extract of an `OpenStackControlPlane` file to show a *snippet* that
enables debug level logs in the cinder scheduler:

```
apiVersion: core.openstack.org/v1beta1
kind: OpenStackControlPlane
metadata:
  name: openstack
spec:
  cinder:
    template:
      cinderScheduler:
        customServiceConfig: |
          [DEFAULT]
          debug = True
```

Some of the services have a top level `customServiceConfig` to facilitate
setting configuration options, like the above mentioned `debug` mode, in all
their components. So, for example, enabling debug log levels in all the Block
Storage services can be done like this:

```
apiVersion: core.openstack.org/v1beta1
kind: OpenStackControlPlane
metadata:
  name: openstack
spec:
  cinder:
    template:
      customServiceConfig: |
        [DEFAULT]
        debug = True
```

### Sensitive *snippets*:

These are *snippets* that contain sensitive configuration options that, if
exposed, pose a security risk. For example, the credentials (username and
password) to access the management interface of a Block Storage service
back-end.
In OpenShift it is not recommended to store sensitive information in the CRs
themselves, so most OpenStack operators have a mechanism to use OpenShift
`Secrets` for sensitive configuration parameters of the services and then
reference them in the `customServiceConfigSecrets` section of the service. The
`customServiceConfigSecrets` field is similar to `customServiceConfig`, except
that it doesn’t contain the *snippet* itself but a list of references to
secrets holding the *snippets*.

Another difference is that there is no global `customServiceConfigSecrets`
section for the whole service like there was for the `customServiceConfig`.

Let’s see an example of how we could use a secret to set NFS configuration
options in cinder. First we create a `Secret` with the sensitive *snippet*:

```
apiVersion: v1
kind: Secret
metadata:
  labels:
    service: cinder
    component: cinder-volume
  name: cinder-volume-nfs-secrets
type: Opaque
stringData:
  cinder-volume-nfs-secrets: |
    [nfs]
    nas_host=192.168.130.1
    nas_share_path=/var/nfs/cinder
```

Then we use this secret, named `cinder-volume-nfs-secrets`, in the `cinder`
section:

```
apiVersion: core.openstack.org/v1beta1
kind: OpenStackControlPlane
metadata:
  name: openstack
spec:
  cinder:
    template:
      cinderVolumes:
        nfs:
          customServiceConfigSecrets:
            - cinder-volume-nfs-secrets
          customServiceConfig: |
            [nfs]
            volume_backend_name=nfs
            volume_driver=cinder.volume.drivers.nfs.NfsDriver
```

---

> **⚠ Attention:** Remember that all snippets must be valid, so the
> configuration section (`[nfs]` in the above example) must be present in all
> of them.

---

It is possible, and common, to combine multiple `customServiceConfig` and
`customServiceConfigSecrets` to build the configuration for a specific service.
For example, we can have a global *snippet* in the top level
`customServiceConfig` with the `debug` level, a *snippet* with generic back-end
configuration options in the back-end’s `customServiceConfig`, and a
`customServiceConfigSecrets` with the sensitive information. The following
example combines the examples above:

```
apiVersion: core.openstack.org/v1beta1
kind: OpenStackControlPlane
metadata:
  name: openstack
spec:
  cinder:
    template:
      customServiceConfig: |
        [DEFAULT]
        debug = True
      cinderVolumes:
        nfs:
          customServiceConfigSecrets:
            - cinder-volume-nfs-secrets
          customServiceConfig: |
            [nfs]
            volume_backend_name=nfs
            volume_driver=cinder.volume.drivers.nfs.NfsDriver
```

When there is sensitive information in the service configuration, it becomes a
matter of personal preference whether to store all the configuration in the
`Secret` or only the sensitive parts.

---

> **⚠ Attention:** There is no validation of the snippets during the
> installation; they are passed to the service as-is, so any configuration
> option added outside a section could have unexpected consequences.

---

## 5. Restricting where services run

In OpenStack on OpenShift deployments, whether it’s a 3-node combined
master/worker deployment or a deployment with 3 masters and 3 workers, the
installer will run the services on any available worker node, based on affinity
rules that prevent multiple identical pods from running on the same node.
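To check where the control plane pods have actually been scheduled, assuming
the default `openstack` namespace, the wide pod listing shows the node for each
pod:

```
$ oc get pods -n openstack -o wide
```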
Storage services are very diverse, and so are their individual components. To
illustrate this we can look at the Block Storage service (cinder), where the
different requirements of its individual components are clearly visible: the
cinder scheduler is a very light service with low memory, disk, network, and
CPU usage; cinder api has a higher network usage due to resource listing
requests; cinder volume has high disk and network usage, since many of its
operations are in the data path (offline volume migration, create volume from
image, etc.); and cinder backup has high memory, network, and CPU (to compress
data) requirements.

Given these requirements, it may be preferable not to let these services wander
all over your OCP worker nodes with the possibility of impacting other
services; or maybe you don’t mind the light services wandering around, but you
want to pin the heavy ones down to a set of infrastructure nodes.

There are also hardware restrictions to take into consideration: when using a
Fibre Channel (FC) Block Storage service back-end, the cinder volume, cinder
backup, and maybe even the Image Service (glance, if it uses the Block Storage
service as a back-end) need to run on an OpenShift host that has an HBA card.
If all the OpenShift worker nodes meet the hardware requirements of the
services (e.g. have HBA cards), then there is no problem letting the installer
freely place the pods, but if only a subset of the nodes meets the
requirements, we need to indicate this restriction to the installer.

The OpenStack on OpenShift installer allows a great deal of flexibility on
where to run the OSP services, as it leverages existing OpenShift
functionality: node labels are used to identify the OpenShift nodes that are
eligible to run the different services, and those labels are then referenced in
the `nodeSelector` field.

The `nodeSelector` field follows the standard OpenShift `nodeSelector` field
and behaves in exactly the same way. For more information, see [About node
selectors](https://docs.openshift.com/container-platform/4.16/nodes/scheduling/nodes-scheduler-node-selectors.html)
in OpenShift Container Platform Documentation.

This field is present at all the different levels of the deployment manifest:

* Deployment: At the `OpenStackControlPlane` level.

* Service: For example the `cinder` component section within the
  `OpenStackControlPlane`.

* Component: For example the cinder backup (`cinderBackup`) in the `cinder`
  section.

Values of the `nodeSelector` are propagated to the next levels unless they are
overwritten. This means that a `nodeSelector` value at the deployment level
will affect all the OpenStack services.

This allows fine-grained control of the placement of the OpenStack services
with minimal repetition.

---

> **🛈 Note:** The Block Storage service does not currently have the
> possibility of defining the `nodeSelector` globally for all the cinder
> volumes (inside the `cinderVolumes` section), so it is necessary to specify
> it on each of the individual back-ends.

---

Labels are used to filter eligible nodes with the `nodeSelector`. Existing node
labels can be used or new labels can be added to selected nodes.

It is possible to leverage labels added by the Node Feature Discovery (NFD)
Operator to place OSP services. For more information, see [Node Feature
Discovery
Operator](https://docs.openshift.com/container-platform/4.16/hardware_enablement/psap-node-feature-discovery-operator.html)
in OpenShift Container Platform Documentation.
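Before adding new labels, it can be useful to review the labels that are
already present on the worker nodes, for example:

```
$ oc get nodes --show-labels
```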
### Example #1: Deployment level

In this scenario we have an OpenShift deployment that has more than three
worker nodes, but we want all the OpenStack services to run on three of these
nodes: `worker0`, `worker1`, and `worker2`.

We’ll use a new label named `type` and we’ll set its value to `openstack` to
mark the nodes that should have the OpenStack services.

First we add this new label to the three selected nodes:

```
$ oc label nodes worker0 type=openstack
$ oc label nodes worker1 type=openstack
$ oc label nodes worker2 type=openstack
```

And then we’ll use this label in our `OpenStackControlPlane` in the top level
`nodeSelector` field:

```
apiVersion: core.openstack.org/v1beta1
kind: OpenStackControlPlane
metadata:
  name: openstack
spec:
  nodeSelector:
    type: openstack
```

### Example #2: Component level

In this scenario we have an OpenShift deployment that has 3 worker nodes, but
only two of those (`worker0` and `worker1`) have an HBA card. We want to
configure the Block Storage service with our FC back-end, so cinder volume must
run on one of the two nodes that have an HBA.

We’ll use a new label named `rhosotype` and we’ll set it to `storage` to mark
the nodes that have access to the Block Storage data network, which requires
the presence of an HBA card.

First we add this new label to the two selected nodes:

```
$ oc label nodes worker0 rhosotype=storage
$ oc label nodes worker1 rhosotype=storage
```

Now we ensure that the `OpenStackControlPlane` is configured to select those
nodes at the cinder volume back-end level:

```
apiVersion: core.openstack.org/v1beta1
kind: OpenStackControlPlane
metadata:
  name: openstack
spec:
  cinder:
    template:
      cinderVolumes:
        fibre_channel:
          nodeSelector:
            rhosotype: storage
```

## 6. Using external data in the services

There are many scenarios in real life deployments where access to external data
is necessary, for example:

* A component needs deployment specific configuration and credential files for
  the storage back-end to exist in a specific location in the filesystem. The
  Ceph cluster configuration and keyring files are one such case; those files
  are needed by the Block Storage service (cinder volume and cinder backup),
  the Image service, and the Compute nodes. So it’s necessary for a system
  administrator to provide arbitrary files to the pods.

* It’s very difficult to estimate the disk space needs of the nodes, since we
  can’t anticipate the number of parallel operations or the sizes involved, so
  a node can run out of disk space because of the local image conversion that
  happens during the “create volume from image” operation. For those scenarios,
  being able to access an external NFS share and use it as the temporary image
  location is an appealing alternative.

* Some back-end drivers expect to run on a persistent filesystem, as they need
  to preserve data they store during runtime between reboots.

To resolve these and other scenarios, OpenStack on OpenShift has the concept of
`extraMounts`, which can use all the [types of volumes available in
OpenShift](https://kubernetes.io/docs/concepts/storage/volumes/): `ConfigMaps`,
`Secrets`, `NFS` shares, etc.
These `extraMounts` can be defined at different levels: deployment, service,
and component. For the higher levels we can define where they should be applied
using the propagation functionality; otherwise they will propagate to all
available resources below. So at the deployment level they would go to all the
services (cinder, glance, manila, neutron, horizon…), at the service level they
would go to all the individual components (e.g. cinder api, cinder scheduler,
cinder volume, cinder backup), and at the component level they can only go to
the component itself.

This is the general structure of `extraMounts`, which contains a list of
“mounts”:

```
extraMounts:
  - name: [1]
    region: [2]
    extraVol:
      - propagation: [3]
          -
        extraVolType: [4]
        volumes: [5]
          -
        mounts: [6]
          -
```

- \[1\]: Optional name of the extra mount, not of much significance since it’s
not referenced from other parts of the manifest.

- \[2\]: Optional name of the OpenStack region where this `extraMount` should
be applied.

- \[3\]: Location to propagate this specific `extraMount`.

  There are multiple names that can be used, each with a different granularity:

  * Service names: `Glance`, `Cinder`, `Manila`
  * Component names: `CinderAPI`, `CinderScheduler`, `CinderVolume`,
    `CinderBackup`, `GlanceAPI`, `ManilaAPI`, `ManilaScheduler`, `ManilaShare`
  * Back-end names: For cinder it would be any name in the `cinderVolumes` map.
    In the examples we’ve shown before it could have been `fibre_channel` or
    `iscsi` or `nfs`.
  * Step name: `DBSync`

  **⚠ Attention:** Defining a broad propagation at a high level can result in
  the mount happening in many different components. For example, using `DBSync`
  propagation at the deployment level will propagate to the cinder, glance, and
  manila DB sync pods.

- \[4\]: Optional field declaring the type of extra volume. Its possible values
are `Ceph` and `Undefined`, so in practice it only makes sense to set it when
propagating Ceph configuration files.

- \[5\]: List of volume sources, where volumes refers to the broader term of a
“volume” as understood by OpenShift.

  This has the exact same structure as the `volumes` section in a `Pod` and is
  dependent on the type of volume that is being defined, as it is not the same
  to reference an NFS location as it is to reference a secret.

  The name used here is very important, as it will be used as a reference in
  the `mounts` section.

  An example excerpt of a `volumes` element that defines the volume as a secret
  called `ceph-conf-files`:

  ```
  - name: ceph
    secret:
      name: ceph-conf-files
  ```

- \[6\]: List of instructions on where the different volumes should be mounted
within the Pods. Here the name of a volume from the `volumes` section is used
as a reference as well as the path where it should be mounted.

  This has the exact same structure as the `volumeMounts` in a `Pod`.
  An example of how we would use the `ceph-conf-files` secret that was named
  `ceph` in the earlier `volumes` example:

  ```
  - name: ceph
    mountPath: "/etc/ceph"
    readOnly: true
  ```

Now that we know the `extraMounts` structure and possible values, let’s have a
look at a couple of examples to tie this all together:

### Example #1: At the deployment level

A very good example is the propagation of the Ceph configuration that is stored
in a secret called `ceph-conf-files` to the different storage services (glance,
manila, and cinder components in the data path):

```
apiVersion: core.openstack.org/v1beta1
kind: OpenStackControlPlane
spec:
  extraMounts:
    - name: v1
      region: r1
      extraVol:
        - propagation:
          - CinderVolume
          - CinderBackup
          - GlanceAPI
          - ManilaShare
          extraVolType: Ceph
          volumes:
            - name: ceph
              secret:
                name: ceph-conf-files
          mounts:
            - name: ceph
              mountPath: "/etc/ceph"
              readOnly: true
```

### Example #2: At the component level

Let’s add a custom policy to the cinder API component. We have previously
created this policy as a `configMap` with the name `my-cinder-policy`, and the
contents are stored in the `policy.yaml` key:

```
apiVersion: core.openstack.org/v1beta1
kind: OpenStackControlPlane
spec:
  cinder:
    extraMounts:
      - extraVol:
          - volumes:
              - name: policy
                configMap:
                  name: my-cinder-policy
            mounts:
              - mountPath: /etc/cinder/policy.yaml
                subPath: policy.yaml
                name: policy
                readOnly: true
            propagation:
              - CinderAPI
```

## 7. Restricting resources used by a service

Many storage services allow you to specify, for each component, how much of
each resource its containers need. The most common resources to specify are CPU
and memory (RAM).

The requested resources information is used by the OpenShift scheduler to
decide which node to place the Pod on, while the *limit* is enforced so that
the container is not allowed to go beyond it.

A system administrator can do this using the `resources` field, which has both
`requests` and `limits`, as [defined in the official
documentation](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/).

As an example (not necessarily a good one), here we set a limit for the memory
used by the cinder scheduler, so that if it goes beyond it the container gets
killed:

```
apiVersion: core.openstack.org/v1beta1
kind: OpenStackControlPlane
metadata:
  name: openstack
spec:
  cinder:
    template:
      cinderScheduler:
        resources:
          limits:
            memory: "500Mi"
```

## 8. Setting API timeouts

All storage services have a REST API interface to interact with users and other
OpenStack components. When a connection is opened but no data is transmitted
for a period of time, the connection is considered stale: a connection timeout
occurs and the connection is closed.

There are cases where the default API timeout is insufficient and the timeout
needs to be increased for proper service operation.

The `cinder`, `glance`, and `manila` components have a field called `apiTimeout`
that accepts an integer, with seconds as the implicit unit and a default value
of `60` seconds.

This value is used to set all the appropriate configuration options in the
different areas to make it effective. In other words, by setting this single
value the installer will set the HAProxy timeout, the Apache timeout, and even
internal RPC timeouts accordingly.
These three timeouts can still be individually configured using the `apiTimeout`
and `apiOverride` sections of the manifest and the `rpc_response_timeout`
configuration option in the snippet.

Here’s an example of setting the HAProxy timeout to 121 seconds, the Apache
timeout to 120 seconds, and the RPC timeout to 119 seconds for cinder:

```
apiVersion: core.openstack.org/v1beta1
kind: OpenStackControlPlane
metadata:
  name: openstack
spec:
  cinder:
    apiOverride:
      route:
        public:
          "haproxy.router.openshift.io/timeout": 121
    apiTimeout: 120
    customServiceConfig: |
      [DEFAULT]
      rpc_response_timeout = 119
```

---

> **🛈 Note:** The reason why it is possible to individually configure each
> timeout (HAProxy, Apache, and RPC) is that they may return different errors
> to REST API clients, so admins may prefer one over the others.

---

## 9. Storage networking

Storage best practices recommend using two different networks: one for the data
I/O and another for storage management.

There is no technical impediment (as in, it will work) to having a single
storage network, or even no dedicated storage network at all with storage
traffic mixed with the rest of your network traffic, but we recommend deploying
OpenStack on OpenShift with a network architecture that follows best practices
as closely as possible.

In this and other guides and their examples these networks will be referred to
as `storage` and `storageMgmt`. If your deployment diverges from the
two-network reference architecture, please adapt the examples as necessary. For
example, if the storage system’s management interface is available on the
`storage` network, replace any mention of `storageMgmt` with `storage` when it
is the only network listed, and remove `storageMgmt` altogether when the
`storage` network is already present.

Most storage services, with the possible exception of the Object Storage
service, require access to the `storage` and `storageMgmt` networks. They are
configured in the `networkAttachments` field, which accepts a list of strings
with all the networks the component should have access to for proper operation.
Different components can have different network requirements; for example, the
cinder API component doesn’t need access to any of the storage networks.

This is an example for the cinder volume:

```
apiVersion: core.openstack.org/v1beta1
kind: OpenStackControlPlane
metadata:
  name: openstack
spec:
  cinder:
    template:
      cinderVolumes:
        iscsi:
          networkAttachments:
            - storage
            - storageMgmt
```

## 10. Using other container images

OpenStack services are deployed using their respective container images for the
specific OpenStack release and version. There are times when a deployment may
require using different container images. The most common cases of using a
non-default container image are deploying a hotfix and using a certified,
vendor-provided container image.

The container images used by the installer for the OpenStack services are
controlled via the `OpenStackVersion` CR. An `OpenStackVersion` CR is
automatically created by the openstack operator during the deployment of the
OpenStack services, or we can create it manually before we apply the
`OpenStackControlPlane` but after the openstack operator has been installed.

Using the `OpenStackVersion` we can change the container image for any service
and component individually. The granularity of what can have a different image
depends on the service: for the Block Storage service (cinder), all the cinder
API pods will have the same image, and the same is true for the scheduler and
backup components, but for the volume service the container image is defined
for each of the `cinderVolumes`.
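To inspect the `OpenStackVersion` CR that the openstack operator created, and
the container images currently in effect, something like the following can be
used (assuming the control plane, and therefore the CR, is named `openstack`):

```
$ oc get openstackversion openstack -o yaml
```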
For example, let’s assume we have the following cinder volume configuration
with two volume back-ends: Ceph, and one called `custom-fc` that requires a
certified vendor-provided container image; we also want to change the other
component images. An excerpt of the `OpenStackControlPlane` could look like
this:

```
apiVersion: core.openstack.org/v1beta1
kind: OpenStackControlPlane
metadata:
  name: openstack
spec:
  cinder:
    template:
      cinderVolumes:
        ceph:
          networkAttachments:
            - storage
< . . . >
        custom-fc:
          networkAttachments:
            - storage
```

---

> **⚠ Attention:** The name of the `OpenStackVersion` must match the name of
> your `OpenStackControlPlane`, so in your case it may be something other than
> `openstack`.

---

Then the `OpenStackVersion` that would change the container images would look
something like this:

```
apiVersion: core.openstack.org/v1beta1
kind: OpenStackVersion
metadata:
  name: openstack
spec:
  customContainerImages:
    cinderAPIImages:
    cinderBackupImages:
    cinderSchedulerImages:
    cinderVolumeImages:
      custom-fc:
```

In this scenario only the Ceph volume back-end pod would use the default cinder
volume image.
diff --git a/docs/user-guide/index.md b/docs/user-guide/index.md
new file mode 100644
index 00000000..c84b1e2a
--- /dev/null
+++ b/docs/user-guide/index.md
@@ -0,0 +1,36 @@

# PERSISTENT STORAGE GUIDE

The OpenStack on OpenShift Operators support multiple persistent storage types:
Volumes, Images, Shares, and Objects. Each of these types is handled by a
different OpenStack service, and each OpenStack service has its own operator.
These operators have some commonalities in their behavior and their data
structures in their CRDs, but due to their specific needs they also have some
differences.

This guide is intended as a source of contextual information on configuring and
deploying the OpenStack Block Storage service (`cinder`), though it may also
include some procedures. The reader is expected to be familiar with basic
OpenShift or Kubernetes concepts, as they won’t be covered in this guide. The
reader is also expected to refer to the [user documentation](
https://openstack-k8s-operators.github.io/openstack-operator/) to be able to
configure and install a full OpenStack deployment.

---

> **🛈 NOTE:** For users familiar with the previous OpenStack installer
> (TripleO), it is important to know that the new installation no longer uses
> TripleO or TripleO-Heat-Templates (THT); it uses a completely new mechanism,
> so the abstraction layer between the installer configuration options and the
> individual services’ configuration options has been removed.

---

As mentioned before, there are some commonalities among the persistent storage
services as well as some differences, and we cover these in different
documents. The recommendation is to go over the commonalities documentation
first and then the specific `cinder` guide.

- [General storage concepts](commonalities.md)
- [Cinder configuration guide](cinder.md)