Ironic Standalone Operator #461
Skipping CI for Draft Pull Request.
This design proposal discusses ironic-standalone-operator: provides a motivation for its creation, describes its goals and outlines the current design and features, as well as the plans for the nearest future. Signed-off-by: Dmitry Tantsur <[email protected]>
## Motivation

Ironic is not trivial to install and operate. Also we provide
"Even if we provide" ?
Also-vs-Although 🤦🏽 Will fix if I need to update.
- `credentialsRef` - a reference to a secret with credentials (generated if
  missing)
- `databaseRef` - a reference to an `IronicDatabase` object (if needed)
- `distributed` - a boolean flag that enables the HA architecture
Why not just call this field `highlyAvailable` or `highAvailability`?
That's a good question. I guess `distributed` is more precise technically, while `highAvailability` may be more user-friendly. I'd like to hear more opinions on the topic.
I would vote for highAvailability or haEnabled
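For readers following the naming thread, here is a minimal Go sketch of how the three fields quoted above could look as an API type. Only the JSON field names (`credentialsRef`, `databaseRef`, `distributed`) come from the proposal; the Go type names, the `LocalRef` shape and the `renderSpec` helper are hypothetical and only illustrate what users would end up typing.

```go
package main

import "encoding/json"

// LocalRef is a hypothetical by-name reference to an object in the same
// namespace. The proposal does not define the reference type.
type LocalRef struct {
	Name string `json:"name"`
}

// IronicSpec is an illustrative subset of the proposed fields. The JSON
// names follow the proposal; everything else is an assumption.
type IronicSpec struct {
	// CredentialsRef points to a secret with credentials.
	CredentialsRef LocalRef `json:"credentialsRef"`
	// DatabaseRef points to an IronicDatabase object, if one is needed.
	DatabaseRef *LocalRef `json:"databaseRef,omitempty"`
	// Distributed enables the HA architecture. Renaming it (for example to
	// highAvailability) would only change the JSON tag below.
	Distributed bool `json:"distributed,omitempty"`
}

// renderSpec shows the serialized form a user would write in a manifest.
func renderSpec(spec IronicSpec) (string, error) {
	out, err := json.Marshal(spec)
	return string(out), err
}
```

With a type like this, the rename under discussion is a change to one struct tag, so the decision is mostly about what reads best in user manifests.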
- `apiPort`, `imageServerPort`, `imageServerTLSPort` allow overriding listening
  ports for the services
- `bindInterface` - a boolean flag that makes Ironic listen on only the
It may sound too simplistic, but can we call this `provisioningInterfaceOnly`? (`bindInterface` can sound like another kind of interface.)
I concur but see below about the word "provisioning".
- `dhcp` - another nested structure with DHCP parameters (see below)
- `externalIP` - IP through which nodes deployed over virtual media access
  Ironic and HTTPD
- `interface`, `ipAddress`, `macAddresses` - various ways to specify the
Similarly, how do `provisioningInterface`, `provisioningIPAddress` and `provisioningMACAddress` sound?
There is a potential for confusion here. We call "provisioning network" a dedicated network for the PXE and other provisioning traffic. Ironic can work without it, and the variables work regardless of whether you have a "provisioning network".
Maybe I'm overthinking it? Would like more opinions here as well.
If we do decide to change this, I'd rather rename the whole "networking" sub-field into something like "provisioningNetwork" to avoid duplicate prefixes.
But the rest of the sub-fields seem unrelated to the provisioning network (like dhcp and external ip)?
How would the interface, ipAddress and macAddresses fields work if there is no provisioning interface?
(I am not too familiar with ironic's networking, so just trying to understand the expected behaviour)
DHCP is very-very much related to the provisioning network: it's DHCP on the provisioning network. ExternalIP is not directly related indeed, maybe we'd need to move it out then.
> How would the interface, ipAddress and macAddresses fields work if there is no provisioning interface?
There must be a provisioning interface, but it can be on some existing network rather than an isolated provisioning network. Confusing, I know. This is why I'm pondering our usage of "provisioning" here.
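To make the thread above more concrete, here is a small Go sketch of the `networking` sub-structure with the current field names. The JSON names are taken from the proposal; the Go types and the validation rule are assumptions made purely for illustration, since the proposal text quoted here does not spell out how the fields interact.

```go
package main

import "errors"

// Networking mirrors some of the fields listed in the proposal. Types are
// guesses; the proposal only gives names and short descriptions.
type Networking struct {
	Interface     string   `json:"interface,omitempty"`
	IPAddress     string   `json:"ipAddress,omitempty"`
	MACAddresses  []string `json:"macAddresses,omitempty"`
	BindInterface bool     `json:"bindInterface,omitempty"`
	ExternalIP    string   `json:"externalIP,omitempty"`
}

// hasSelector reports whether any of the three ways to identify the
// interface is set.
func (n Networking) hasSelector() bool {
	return n.Interface != "" || n.IPAddress != "" || len(n.MACAddresses) > 0
}

// validate applies one plausible rule (an assumption, not part of the
// proposal): restricting listening to one interface requires some way to
// identify that interface.
func (n Networking) validate() error {
	if n.BindInterface && !n.hasSelector() {
		return errors.New("bindInterface requires interface, ipAddress or macAddresses")
	}
	return nil
}
```

Note that nothing in this sketch depends on a dedicated provisioning network existing, which matches the point that the fields work either way.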
After each full reconciliation, the operator will store the version of Ironic
it has just installed in the `status`. In the future, this will allow to apply
any logic on upgrade. However, we will try to keep ironic-image self-upgrading
'However, we will try to keep ironic-image self-upgrading for the sake of users that do not use ironic-standalone-operator.' What do you mean?
I mean, if you just use ironic-image, and you upgrade it by replacing the container, we will try to support this case still. Any ideas on how to phrase it better?
You mean that for example, if someone was running ironic-standalone-operator v1.0 (with ironic-image v1.0), and just replaced their ironic-image with v1.1, ironic-standalone-operator will continue to work?
If that's what you mean, then 'for the sake of users that do not use ironic-standalone-operator.' can be confusing..
Ah, I see the source of confusion. What I wanted to say was: users of ironic-image without the operator will be able to do it still. I'll work on a different way to phrase it.
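As a rough illustration of the "store the installed version in `status`" idea from the quoted paragraph, the Go sketch below shows how a recorded version could later gate upgrade-specific logic. The `InstalledVersion` field name and both helper functions are hypothetical; the proposal only says that the version is stored after each full reconciliation.

```go
package main

// IronicStatus holds the version recorded by the operator. The field name
// is an assumption made for this sketch.
type IronicStatus struct {
	InstalledVersion string
}

// upgradeNeeded reports whether upgrade-specific logic should run: a
// version was recorded earlier and it differs from the requested one.
// A fresh installation (nothing recorded yet) is not treated as an upgrade.
func upgradeNeeded(status IronicStatus, target string) bool {
	return status.InstalledVersion != "" && status.InstalledVersion != target
}

// recordInstalled stores the version after a full reconciliation.
func recordInstalled(status *IronicStatus, target string) {
	status.InstalledVersion = target
}
```

Users who run ironic-image without the operator never reach this code path, which is why the image itself is expected to keep handling a plain container replacement.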
avoid running it 3+ times in parallel. Maybe it will take a form of a *Job*.
- BMO will need to be updated to handle more cases of unexpected provision
  state. E.g. what needs to be done when the BMH is *inspecting* but the node
  is found in a completely wrong state like `active` or `clean wait`.
There is an additional complexity to consider if we're going to support mariadb combined with persistent volumes: when the mariadb version changes, you either have to ensure a clean shutdown of the container or enable some kind of backup/restore workflow, due to the mariadb checks around versioning for the redo log - see here for example.
This can likely be resolved with pod lifecycle hooks, or perhaps some kind of pre/post-upgrade job, but it's additional complexity which I don't think we currently consider in the community deployment examples.
Very good point, thank you!
Q: Why using *DaemonSets* if *StatefulSets* provide us an easier way to address
separate Ironic instances? A: While we're relying on host networking, making
several Ironic instances co-exist on the same Kubernetes node is too complex.
From a Kubernetes HA perspective, it is also more reliable to have the Ironic pods present on separate control plane nodes. In other words, clustering all Ironic pods on e.g. one control plane node wouldn't be HA anyway, as that node could be lost.
/approve
/lgtm
From my perspective it is fine given that you will address the comments.
/hold
Thanks a lot @dtantsur, a few questions/nits inline.
#### HA architecture

The *non-HA* architecture is the architecture that Metal3 uses now. All Ironic
components are run in a *deployment*.
Suggested change:
- components are run in a *deployment*.
+ components are run in a single *deployment*.
When the HA architecture is enabled via a flag on the `Ironic` resource, all
Ironic components (except for MariaDB and dnsmasq) will be installed in a
*DaemonSet* instead of a *Deployment*.
Can we explain the reasoning here?
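The rule in the quoted paragraph can be written down as a tiny Go helper, which may make the discussion easier to follow. The function name and the lowercase component names are hypothetical; the sketch encodes only what the paragraph states (the flag switches components to a *DaemonSet*, with MariaDB and dnsmasq excluded) and says nothing about where the excluded components run.

```go
package main

// runsInDaemonSet reports whether a component is installed in the
// DaemonSet. Component names are illustrative, not taken from the operator.
func runsInDaemonSet(component string, distributed bool) bool {
	if !distributed {
		// Non-HA architecture: everything stays in a Deployment.
		return false
	}
	switch component {
	case "mariadb", "dnsmasq":
		// Explicitly excluded from the DaemonSet by the proposal.
		return false
	default:
		return true
	}
}
```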
## Design Details

This proposal adds a new project under the Metal3 umbrella:
Is it a project or a repository?
its private key certificate and sign the certificate with this CA. The CA will
be trusted **only** for the RPC purpose, removing the possibility of abuse.
To reduce the number of code paths, this process will happen unconditionally,
even when TLS for Ironic itself is not enabled.
Just for my understanding, what prevents us from enabling TLS unconditionally for Ironic itself?
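For context on the RPC-only CA described in the quoted paragraph, here is a minimal Go sketch of the general technique using the standard `crypto/x509` package: create a private CA, sign a per-instance certificate with it, and verify peers against a pool that contains nothing but that CA. The function names, key type, validity periods and host names are all assumptions; the proposal does not say which component runs these steps or which parameters it uses.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"math/big"
	"time"
)

// newCA creates a self-signed CA certificate and its private key.
func newCA() (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "ironic-rpc-ca"},
		NotBefore:             time.Now().Add(-time.Minute),
		NotAfter:              time.Now().Add(24 * time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// signLeaf issues a server certificate for one instance, signed by the CA.
func signLeaf(ca *x509.Certificate, caKey *ecdsa.PrivateKey, host string) (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(2),
		Subject:      pkix.Name{CommonName: host},
		DNSNames:     []string{host},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(24 * time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, ca, &key.PublicKey, caKey)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// verifyRPC checks a peer certificate against a pool holding only the RPC
// CA, so the CA is not trusted for anything outside this check.
func verifyRPC(ca, leaf *x509.Certificate, host string) error {
	pool := x509.NewCertPool()
	pool.AddCert(ca)
	_, err := leaf.Verify(x509.VerifyOptions{Roots: pool, DNSName: host})
	return err
}
```

Because the trust pool is built explicitly and never added to the system store, a certificate signed by this CA is useless outside the RPC path, which is the property the paragraph relies on.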
I can approve it already and hold can be taken back when the reviews are addressed.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: kashifest, Rozzii