Declarative Server Maintenance #76
Comments
To kick off some discussion, here is a short sketch, which is more or less a trimmed-down version of the NodeMaintenance proposal.
// +enum
type ServerMaintenanceStage string

const (
	// Idle announces a maintenance.
	Idle ServerMaintenanceStage = "Idle"
	// InMaintenance shuts down servers, potentially releasing server claims.
	InMaintenance ServerMaintenanceStage = "InMaintenance"
	// Complete makes servers ready again, as long as no other maintenance is applied to them.
	Complete ServerMaintenanceStage = "Complete"
)

type ServerMaintenance struct {
	...
	Spec   ServerMaintenanceSpec
	Status ServerMaintenanceStatus
}

type ServerMaintenanceSpec struct {
	// ServerSelector selects servers for this maintenance.
	ServerSelector *v1.ServerSelector
	// The order of the stages is Idle -> InMaintenance -> Complete.
	// The default value is Idle.
	Stage ServerMaintenanceStage
	// Reason for the maintenance.
	Reason string
}
As soon as the Stage is set to The cloud-provider-manager-metal can have a watch on
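The stage order stated in the sketch (Idle -> InMaintenance -> Complete) could be enforced with a small validation helper. The following is only a rough sketch: `ValidateStageTransition` and `stageOrder` are hypothetical names that do not exist in the proposal, and allowing a stage to be skipped (Idle -> Complete) is an assumption, not something the thread decides.

```go
package main

import "fmt"

// ServerMaintenanceStage mirrors the enum from the sketch in this thread.
type ServerMaintenanceStage string

const (
	Idle          ServerMaintenanceStage = "Idle"
	InMaintenance ServerMaintenanceStage = "InMaintenance"
	Complete      ServerMaintenanceStage = "Complete"
)

// stageOrder (hypothetical) gives each stage its position in
// Idle -> InMaintenance -> Complete.
var stageOrder = map[ServerMaintenanceStage]int{
	Idle:          0,
	InMaintenance: 1,
	Complete:      2,
}

// ValidateStageTransition (hypothetical) rejects unknown stages and any move
// backwards in the order. An empty old stage is treated as the default, Idle.
// Assumption: skipping a stage (Idle -> Complete) is allowed.
func ValidateStageTransition(oldStage, newStage ServerMaintenanceStage) error {
	if oldStage == "" {
		oldStage = Idle
	}
	from, ok := stageOrder[oldStage]
	if !ok {
		return fmt.Errorf("unknown stage %q", oldStage)
	}
	to, ok := stageOrder[newStage]
	if !ok {
		return fmt.Errorf("unknown stage %q", newStage)
	}
	if to < from {
		return fmt.Errorf("stage must not move backwards: %s -> %s", oldStage, newStage)
	}
	return nil
}

func main() {
	// A forward move is accepted, a backward move yields an error.
	fmt.Println(ValidateStageTransition(Idle, InMaintenance))
	fmt.Println(ValidateStageTransition(Complete, InMaintenance))
}
```

Such a check would most naturally live in an admission webhook or in the reconciler, but where it belongs is left open here.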
@Nuckal777 thanks for the input! I'm pretty sure that we must not power off servers in such an implicit manner. For instance, firmware updates, especially those which require a server reboot, should also be considered maintenance. Hence, it might not be reasonable to force a server power-off. Apart from that, any of the controllers which work with From my perspective, the maintenance state should be represented in
There is an issue with that design: When 2 controllers powered off a
Why couldn't the firmware upgrade handling create a
I agree. This would be a disadvantage.
I like the idea of having a dedicated resource initiating the
As @afritzler mentioned, it should be done in such a way that a particular controller may manage a server's power state only if the server is in a particular state. For example, if the server is in the Reserved state, then only the ServerClaim controller could power it on/off; if the server is in the Maintenance state, then only the Maintenance controller could power it on/off, and so on.
It definitely could, but my point was that it is not necessarily required to power off the server in the maintenance state.
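The state-based ownership of power control described above can be sketched as a simple lookup. This is only an illustration: the `Controller` type, the `powerOwner` table and `MayManagePower` are hypothetical names, and only the two states named in the comment (Reserved and Maintenance) are modelled.

```go
package main

import "fmt"

// ServerState holds only the two states named in the discussion.
type ServerState string

const (
	StateReserved    ServerState = "Reserved"
	StateMaintenance ServerState = "Maintenance"
)

// Controller (hypothetical) identifies a controller that may want to
// change a server's power state.
type Controller string

const (
	ServerClaimController Controller = "ServerClaim"
	MaintenanceController Controller = "Maintenance"
)

// powerOwner (hypothetical) maps each server state to the single controller
// that is allowed to power the server on or off while it is in that state.
var powerOwner = map[ServerState]Controller{
	StateReserved:    ServerClaimController,
	StateMaintenance: MaintenanceController,
}

// MayManagePower reports whether controller c owns the power state of a
// server in state s. States without a registered owner permit nobody.
func MayManagePower(c Controller, s ServerState) bool {
	owner, ok := powerOwner[s]
	return ok && owner == c
}

func main() {
	fmt.Println(MayManagePower(ServerClaimController, StateReserved))
	fmt.Println(MayManagePower(ServerClaimController, StateMaintenance))
}
```

With a table like this, a maintenance does not have to imply a power-off: it only hands the permission to change the power state to the Maintenance controller.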
After reading the node in-place update discussion of Gardener, I had some time to look into the topic again. It was never explicitly mentioned in this ticket, but I think there are two main use cases for the maintenance of a
Am I missing something from the metal-operator perspective? As with the BIOS updates, I think we should focus on managing maintenances on a single So, if I understand both of you correctly, you'd like the declaration of a maintenance to be a means of moving the permission of power-cycling a An alternative
// +enum
type ServerMaintenanceStage string

const (
	// Idle announces a maintenance.
	Idle ServerMaintenanceStage = "Idle"
	// InMaintenance shuts down servers, potentially releasing server claims.
	InMaintenance ServerMaintenanceStage = "InMaintenance"
	// Complete makes servers ready again, as long as no other maintenance is applied to them.
	Complete ServerMaintenanceStage = "Complete"
)

type ServerMaintenance struct {
	...
	Spec   ServerMaintenanceSpec
	Status ServerMaintenanceStatus
}

type ServerMaintenanceSpec struct {
	// ServerRef selects a server for this maintenance.
	ServerRef *v1.ObjectReference
	// The order of the stages is Idle -> InMaintenance -> Complete.
	// The default value is Idle.
	Stage ServerMaintenanceStage
	// TargetState specifies the state the server should be moved to
	// when the stage changes to InMaintenance.
	TargetState ServerState
	// Reason for the maintenance.
	Reason string
}
Potentially, a list of maintenances can also be inlined into the An exemplary upgrade flow for a Server could then look like this:
The hardware failure case would be similar, but would require a different controller, which just turns the power off and another
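The comment on the Complete stage ("as long as no other maintenance is applied to a server") implies a check across all maintenances that reference the same server. A minimal sketch of that check follows. The `Maintenance` struct and `CanMakeReady` are hypothetical, a plain server name stands in for `ServerRef`, and the rule that only maintenances in stage InMaintenance block the server (an announced, Idle one does not) is an assumption.

```go
package main

import "fmt"

// ServerMaintenanceStage mirrors the enum from the sketch in this thread.
type ServerMaintenanceStage string

const (
	Idle          ServerMaintenanceStage = "Idle"
	InMaintenance ServerMaintenanceStage = "InMaintenance"
	Complete      ServerMaintenanceStage = "Complete"
)

// Maintenance (hypothetical) is a flattened stand-in for ServerMaintenance.
// ServerName replaces Spec.ServerRef to keep the example self-contained.
type Maintenance struct {
	Name       string
	ServerName string
	Stage      ServerMaintenanceStage
}

// CanMakeReady reports whether the named server may be made ready again,
// i.e. no maintenance referencing it is currently in stage InMaintenance.
// Assumption: Idle (merely announced) maintenances do not block readiness.
func CanMakeReady(serverName string, all []Maintenance) bool {
	for _, m := range all {
		if m.ServerName == serverName && m.Stage == InMaintenance {
			return false
		}
	}
	return true
}

func main() {
	all := []Maintenance{
		{Name: "bios-update", ServerName: "server-a", Stage: Complete},
		{Name: "disk-swap", ServerName: "server-a", Stage: InMaintenance},
		{Name: "bios-update-b", ServerName: "server-b", Stage: Complete},
	}
	fmt.Println(CanMakeReady("server-a", all))
	fmt.Println(CanMakeReady("server-b", all))
}
```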
How about adding a maintenance window as well, so one can decide when a BIOS or firmware upgrade should take place? Or should that be done in another service? Maybe adding another stage:
Summary
Describe a concept on how we can put a Server into Maintenance mode. Also evaluate what it would mean to move it back into an operational state.
The Kubernetes community has a KEP extending the Node API by a declarative approach in handling Node maintenance: kubernetes/enhancements#4212
We should see how those concepts can be applied for our Server objects.
Implications to consider:
How does the maintenance state influence the behaviour of controllers sitting on top of the metal API, e.g.:
Expected outcome:
metal types