PoC: BIOS version & settings management #138

Draft
wants to merge 6 commits into
base: main

aobort
Contributor

@aobort aobort commented Oct 3, 2024

Proposed Changes

This PR contains a PoC implementation for managing a server's BIOS version and settings:

  • API types
  • Controller
  • Service managing tasks (scan, apply settings, update version)

This approach fully separates the concrete job implementation from the reconciliation flow.
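
To illustrate the separation, here is a minimal sketch of a reconcile step that only decides which task is needed and hands the work to a runner. The `TaskRunner` interface, `BIOSState`, `reconcile`, and `runOnce` are hypothetical names used for illustration, not the types from this PR.

```go
package main

import (
	"context"
	"fmt"
)

// TaskRunner is a hypothetical abstraction over the task service
// (scan, apply settings, update version); the PR's real API may differ.
type TaskRunner interface {
	Scan(ctx context.Context, server string) error
	ApplySettings(ctx context.Context, server string, settings map[string]string) error
	UpdateVersion(ctx context.Context, server, version string) error
}

// BIOSState is a simplified, hypothetical view of a BIOS version and its settings.
type BIOSState struct {
	Version  string
	Settings map[string]string
}

// reconcile only decides which task is needed and delegates the actual work.
func reconcile(ctx context.Context, runner TaskRunner, server string, desired, observed BIOSState) error {
	switch {
	case observed.Version == "":
		// Nothing is known about the server yet: request a scan first.
		return runner.Scan(ctx, server)
	case desired.Version != observed.Version:
		return runner.UpdateVersion(ctx, server, desired.Version)
	}
	for name, want := range desired.Settings {
		if got, ok := observed.Settings[name]; !ok || got != want {
			return runner.ApplySettings(ctx, server, desired.Settings)
		}
	}
	return nil
}

// recordingRunner is a test double that records which task was requested.
type recordingRunner struct{ calls []string }

func (r *recordingRunner) Scan(context.Context, string) error {
	r.calls = append(r.calls, "scan")
	return nil
}

func (r *recordingRunner) ApplySettings(context.Context, string, map[string]string) error {
	r.calls = append(r.calls, "settings-apply")
	return nil
}

func (r *recordingRunner) UpdateVersion(context.Context, string, string) error {
	r.calls = append(r.calls, "version-update")
	return nil
}

// runOnce runs a single reconcile pass and returns the requested tasks.
func runOnce(desired, observed BIOSState) []string {
	r := &recordingRunner{}
	if err := reconcile(context.Background(), r, "bar", desired, observed); err != nil {
		return nil
	}
	return r.calls
}

func main() {
	fmt.Println(runOnce(BIOSState{Version: "2.0.0"}, BIOSState{Version: "1.0.0"}))
}
```

Because the reconcile step depends only on the interface, the runner could be an HTTP client, an in-process worker, or a test double without touching the decision logic.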

To discuss

BootOrder

Do we need to move the boot order field from server to the serverBIOS CR? Semantically, yes, since boot order is one of the BIOS settings.

BIOS settings in .status

From my perspective, it is reasonable to reflect in .status.bios.settings only those settings that are set in .spec.bios.settings. This makes comparing .spec and .status much easier.
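
A minimal sketch of that idea, assuming settings are held as plain name/value maps (the function names `projectSettings` and `settingsInSync` are hypothetical): the status is built from discovered settings filtered down to the names present in the spec, which turns the comparison into a simple map equality check.

```go
package main

import "fmt"

// projectSettings keeps only those discovered settings whose names appear in spec.
func projectSettings(spec, discovered map[string]string) map[string]string {
	out := make(map[string]string, len(spec))
	for name := range spec {
		if value, ok := discovered[name]; ok {
			out[name] = value
		}
	}
	return out
}

// settingsInSync reports whether status holds exactly the names and values from spec.
func settingsInSync(spec, status map[string]string) bool {
	if len(spec) != len(status) {
		return false
	}
	for name, want := range spec {
		if got, ok := status[name]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	spec := map[string]string{"legacyboot": "enabled"}
	// Everything the scan found on the server, including settings nobody asked for.
	discovered := map[string]string{"legacyboot": "disabled", "numlock": "on"}

	status := projectSettings(spec, discovered)
	fmt.Println(status, settingsInSync(spec, status))
}
```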

Storing bios settings

Do we need a custom type for a BIOS setting:

type SettingsEntry struct {
    // +required
    Name string `json:"name"`
    
    // +required
    Value string `json:"value"`
        
    // +optional
    Unsupported bool `json:"unsupported,omitempty"`
}

to avoid attempting to apply settings that are not supported by the specified BIOS version? For example:

apiVersion: metal.ironcore.dev/v1alpha1
kind: ServerBIOS
metadata:
  name: foo
spec:
  scanPeriodMinutes: 30
  serverRef:
    name: bar
  bios:
    # changing both the version and the 'unsupported' flag at the same time
    version: 1.0.0  --> 2.0.0
    settings:
    - name: legacyboot
      value: enabled
      unsupported: false  --> true

Upgrading from version 1.0.0 to 2.0.0 deprecates the BIOS setting responsible for legacy boot, so it is explicitly marked as unsupported and will be skipped when settings are applied.
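
A minimal sketch of how such entries could be filtered before applying, reusing the `SettingsEntry` type proposed above. The helper name `applicableSettings` is hypothetical.

```go
package main

import "fmt"

// SettingsEntry follows the type proposed in this PR description.
type SettingsEntry struct {
	Name        string `json:"name"`
	Value       string `json:"value"`
	Unsupported bool   `json:"unsupported,omitempty"`
}

// applicableSettings returns the entries that should be sent to the BIOS,
// skipping everything explicitly marked as unsupported.
func applicableSettings(entries []SettingsEntry) []SettingsEntry {
	out := make([]SettingsEntry, 0, len(entries))
	for _, entry := range entries {
		if entry.Unsupported {
			continue
		}
		out = append(out, entry)
	}
	return out
}

func main() {
	entries := []SettingsEntry{
		{Name: "legacyboot", Value: "enabled", Unsupported: true},
		{Name: "numlock", Value: "on"}, // hypothetical second setting
	}
	for _, entry := range applicableSettings(entries) {
		fmt.Printf("apply %s=%s\n", entry.Name, entry.Value)
	}
}
```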

@aobort aobort force-pushed the poc-firmware-controllers branch 4 times, most recently from 0aea100 to 958928c Compare November 4, 2024 10:52
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Nov 7, 2024
Signed-off-by: Artem Bortnikov <[email protected]>
Signed-off-by: Artem Bortnikov <[email protected]>
@aobort aobort linked an issue Nov 8, 2024 that may be closed by this pull request
Signed-off-by: Artem Bortnikov <[email protected]>
Signed-off-by: Artem Bortnikov <[email protected]>
func (s *ServerHTTP) registerRoutes() {
	s.mux.HandleFunc("/scan", s.scanHandler)
	s.mux.HandleFunc("/settings-apply", s.settingsApplyHandler)
	s.mux.HandleFunc("/version-update", s.versionUpdateHandler)
Contributor

Idea: it would be great to have an endpoint that returns all currently active tasks from the sync.Map.

Contributor Author

@stefanhipfel thanks for the idea.
Totally agree. We might also need to implement request queues, rate limiting, client-side retries, and plenty of other things. Apart from that, I'd also like to generate the HTTP API from a spec instead of hardcoding endpoints.

But for now, I think the main goal is to agree on whether we proceed with this design or not.
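
For reference, the endpoint suggested in this thread could be sketched as follows. `ServerHTTP` and `mux` come from the diff. The `tasks` field, its layout (server name mapped to task kind), the `/tasks` path, and the helpers `NewServerHTTP` and `fetchTasks` are assumptions made for illustration only.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
	"sync"
)

// ServerHTTP mirrors the type name from the PR; the tasks field and its
// layout (server name -> task kind) are assumptions made for this sketch.
type ServerHTTP struct {
	mux   *http.ServeMux
	tasks sync.Map
}

// NewServerHTTP wires up only the hypothetical /tasks endpoint.
func NewServerHTTP() *ServerHTTP {
	s := &ServerHTTP{mux: http.NewServeMux()}
	s.mux.HandleFunc("/tasks", s.tasksHandler)
	return s
}

// tasksHandler returns a JSON object with all entries currently in the sync.Map.
func (s *ServerHTTP) tasksHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodGet {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	active := make(map[string]string)
	s.tasks.Range(func(key, value any) bool {
		name, okName := key.(string)
		kind, okKind := value.(string)
		if okName && okKind {
			active[name] = kind
		}
		return true // keep iterating over the remaining entries
	})
	w.Header().Set("Content-Type", "application/json")
	if err := json.NewEncoder(w).Encode(active); err != nil {
		log.Printf("encoding active tasks: %v", err)
	}
}

// fetchTasks calls the endpoint in-process and decodes the response.
func fetchTasks(s *ServerHTTP) (map[string]string, error) {
	rec := httptest.NewRecorder()
	s.mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/tasks", nil))
	if rec.Code != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %d", rec.Code)
	}
	out := make(map[string]string)
	err := json.NewDecoder(rec.Body).Decode(&out)
	return out, err
}

func main() {
	s := NewServerHTTP()
	s.tasks.Store("bar", "scan")
	tasks, err := fetchTasks(s)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(tasks)
}
```

Note that `sync.Map.Range` does not give a consistent snapshot: tasks stored or deleted while the handler iterates may or may not show up in the response.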

// if referred server is not in Available state - stop reconciliation
if server.Status.State != metalv1alpha1.ServerStateAvailable {
	return ctrl.Result{RequeueAfter: r.RequeueInterval}, nil
}
Contributor

@stefanhipfel stefanhipfel Nov 20, 2024

Do we need to stop the server reconciliation in the meantime, e.g. by not allowing a serverClaim?

Contributor Author

@aobort aobort Nov 20, 2024

No, we do not need to stop the server's reconciliation; we do not care what state the server is in. But we do need to stop serverBIOS reconciliation, because both BIOS version and settings updates lead to a server reboot. So for now we decided to work only with servers that are in the "Available" state.

If the server is in the "Available" state, we will need to set it to "Maintenance". But the server maintenance topic is still open, so I decided not to mention it at all for now.

#76

Contributor

OK, but during a serverBIOS update someone could still claim the server. Any available server can be claimed.

Contributor Author

ok but during a serverBios update, someone could still claim the server. Any available server can be claimed

@stefanhipfel that's a good point. However, I do not really like the idea of the serverBIOS controller changing the server's state to exclude it from reconciliation. For a number of reasons the server's "owner" might want to postpone the update, especially when it comes to a version upgrade.
I could add a "Maintenance" state for the server and update the controller so that it checks whether the server's state is "Maintenance" instead of "Available".
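
A minimal sketch of that check, assuming a "Maintenance" state is added to the server API. `ServerStateMaintenance` and the `gate` helper are hypothetical and do not exist in the code under review.

```go
package main

import (
	"fmt"
	"time"
)

// ServerState is a simplified stand-in for the server state type.
type ServerState string

const (
	ServerStateAvailable ServerState = "Available"
	// ServerStateMaintenance is hypothetical: the proposal in this thread,
	// not a state that exists in the API under review.
	ServerStateMaintenance ServerState = "Maintenance"
)

// gate reports whether BIOS work may start on a server in the given state.
// When it may not, the caller should requeue after the returned delay.
func gate(state ServerState, requeue time.Duration) (proceed bool, requeueAfter time.Duration) {
	if state != ServerStateMaintenance {
		return false, requeue
	}
	return true, 0
}

func main() {
	for _, state := range []ServerState{ServerStateAvailable, ServerStateMaintenance} {
		proceed, after := gate(state, 30*time.Second)
		fmt.Printf("%s: proceed=%v requeueAfter=%s\n", state, proceed, after)
	}
}
```

With this shape the serverBIOS controller never changes the server's state itself; whoever owns the server decides when to move it into maintenance.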

Contributor

@aobort yes, I also think that the serverBIOS controller should not change a server's state.

Alternative idea: the server reconciler checks the serverBIOS state.
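
A rough sketch of that alternative, where the claim path consults the BIOS task state before binding a server. `BIOSTaskState`, its values, and `canClaim` are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// BIOSTaskState is a hypothetical summary of what the serverBIOS
// controller is currently doing for a given server.
type BIOSTaskState string

const (
	BIOSTaskIdle       BIOSTaskState = "Idle"
	BIOSTaskInProgress BIOSTaskState = "InProgress"
)

// canClaim reports whether a server may be bound to a claim: it has to be
// Available and must not have a BIOS task running against it.
func canClaim(serverState string, bios BIOSTaskState) bool {
	return serverState == "Available" && bios != BIOSTaskInProgress
}

func main() {
	fmt.Println(canClaim("Available", BIOSTaskIdle))
	fmt.Println(canClaim("Available", BIOSTaskInProgress))
}
```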

@stefanhipfel
Contributor

Overall, I think the PoC looks OK.

type ServerBIOSStatus struct {
	// BIOS contains a bios version and settings.
	// +optional
	BIOS BIOSSettings `json:"bios,omitempty"`
Contributor

We could add the current state of the BIOS task.
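
One possible shape for this, as a sketch only: the `Task` field, the `BIOSTask` type, and its `kind`/`state` values are assumptions, and the `BIOSSettings` fields are guessed from the YAML example in the PR description rather than taken from the diff.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BIOSSettings is a guess at the shape of the PR's type, based on the
// YAML example in the description (version plus named settings).
type BIOSSettings struct {
	Version  string            `json:"version,omitempty"`
	Settings map[string]string `json:"settings,omitempty"`
}

// BIOSTask is hypothetical: it names the running task and its state.
type BIOSTask struct {
	// Kind could match the service endpoints: scan, settings-apply, version-update.
	Kind string `json:"kind"`
	// State could be e.g. Pending, InProgress, Completed, or Failed.
	State string `json:"state"`
}

type ServerBIOSStatus struct {
	// BIOS contains a bios version and settings.
	// +optional
	BIOS BIOSSettings `json:"bios,omitempty"`

	// Task reflects the current BIOS task, if any (hypothetical field).
	// +optional
	Task *BIOSTask `json:"task,omitempty"`
}

// roundTrip marshals and unmarshals a status to show its serialized form survives.
func roundTrip(in ServerBIOSStatus) (ServerBIOSStatus, error) {
	var out ServerBIOSStatus
	raw, err := json.Marshal(in)
	if err != nil {
		return out, err
	}
	err = json.Unmarshal(raw, &out)
	return out, err
}

func main() {
	status := ServerBIOSStatus{
		BIOS: BIOSSettings{Version: "1.0.0"},
		Task: &BIOSTask{Kind: "version-update", State: "InProgress"},
	}
	raw, err := json.MarshalIndent(status, "", "  ")
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(raw))
}
```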

@stefanhipfel
Contributor

@aobort overall I think the POC looks good.

The next step would be to gather some improvements and test it with real examples!

Labels
api-change, documentation, size/XXL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Draft] BIOS/Firmware Update
2 participants