Pre v1alpha4 rel #428
Merged
* Add timeout when creating fan-out child resources

  The ClientMount, NnfNodeStorage, and NnfNodeBlockStorage resources are fanned out to the Rabbit and compute nodes. If the correct controllers aren't running on one or more of those nodes, then the workflow will not progress but won't give an error. Add an optional timeout that checks whether the controller on the Rabbit/compute node has added its finalizer within a configurable amount of time. If the finalizer hasn't been added, then return an error.

  Signed-off-by: Matt Richerson <[email protected]>

* use default child timeout value instead of returning error

  Signed-off-by: Matt Richerson <[email protected]>

---------

Signed-off-by: Matt Richerson <[email protected]>
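The finalizer check described in this commit can be sketched roughly as follows. This is a minimal Python illustration, not the controller's code: the finalizer string, the function name, and the exception type are all hypothetical, and the sketch assumes the timeout is measured from the child resource's creation time.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical finalizer name; the real finalizer strings are not given in this PR.
CHILD_FINALIZER = "example.com/child-controller"


class ChildTimeoutError(Exception):
    """Raised when the remote controller never claimed the child resource."""


def check_child_finalizer(created_at, finalizers, timeout, now=None,
                          finalizer=CHILD_FINALIZER):
    """Return True once the finalizer is present, False while still waiting.

    Raises ChildTimeoutError if the finalizer is missing after `timeout`.
    A timeout of None (or zero) disables the check, mirroring the
    "optional timeout" wording in the commit message.
    """
    if finalizer in finalizers:
        return True
    if timeout is None or timeout <= timedelta(0):
        return False  # check disabled: keep waiting without an error
    now = now or datetime.now(timezone.utc)
    if now - created_at > timeout:
        raise ChildTimeoutError(
            f"finalizer {finalizer!r} not added within {timeout}")
    return False
```

A caller would run this on each reconcile pass and requeue while it returns False.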
The workflow controller adds/removes owner labels on the PersistentStorageInstance resource in the teardown phase of the create_persistent/destroy_persistent directives. This is so that the later call to DeleteChildren() will find (or not find) the persistent storage and delete it if necessary. The call to DeleteChildren() may do the wrong thing if the PersistentStorageInstance resource in the cache is stale. This commit adds a check after the labels are changed to make sure the changes are visible in our client cache. Also, change the Requeue while waiting for children to delete to a RequeueAfter. Fix a bug in the NnfSystemStorage and NnfAccess tests: the Storage resources are created by the SystemConfiguration controller, so we don't need to create or delete them. Signed-off-by: Matt Richerson <[email protected]>
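The cache-visibility check can be pictured as a comparison between the labels just written and the labels on the cached copy, with a delayed requeue when they differ. The Python sketch below only illustrates that idea: the function names, the label keys passed in, and the two-second delay are assumptions, not values taken from this PR.

```python
def labels_visible_in_cache(cached_labels, expected_present, expected_absent=()):
    """Return True when the cached object already reflects the label change.

    cached_labels:    labels on the copy read back from the client cache
    expected_present: labels that were just added (key -> value)
    expected_absent:  label keys that were just removed
    """
    added_ok = all(cached_labels.get(k) == v for k, v in expected_present.items())
    removed_ok = not any(k in cached_labels for k in expected_absent)
    return added_ok and removed_ok


def reconcile_step(cached_labels, expected_present, expected_absent=(),
                   requeue_after_s=2.0):
    """Return a result dict; requeue with a delay while the cache is stale."""
    if not labels_visible_in_cache(cached_labels, expected_present, expected_absent):
        # Cache has not caught up yet: try again later instead of
        # letting the delete-children step act on a stale object.
        return {"requeue_after": requeue_after_s}
    return {"requeue_after": None}
```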
Save the working rules so they can be found quickly some other day. Signed-off-by: Dean Roehrich <[email protected]>
Role rules to monitor API Priority and Fairness
Create v1alpha3 APIs. This used "kubebuilder create api --resource --controller=false" for each API. Signed-off-by: Matt Richerson <[email protected]>
Copy API content from v1alpha2 to v1alpha3. Move the kubebuilder:storageversion marker from v1alpha2 to v1alpha3. Set localSchemeBuilder var in api/v1alpha2/groupversion_info.go to satisfy zz_generated.conversion.go. Signed-off-by: Matt Richerson <[email protected]>
Move the existing webhooks from v1alpha2 to v1alpha3. Signed-off-by: Matt Richerson <[email protected]>
Create conversion webhooks and hub routines for v1alpha3. This may have used "kubebuilder create webhook --conversion" for any API that did not already have a webhook. Any newly-created api/v1alpha3/*_webhook_test.go is empty and does not need content at this time. It has been updated with a comment to explain where conversion tests are located. ACTION: Any new tests added to github/cluster-api/util/conversion/conversion_test.go may need to be manually adjusted. Look for the "ACTION" comments in this file. This may have added a new SetupWebhookWithManager() to suite_test.go, though a later step will complete the changes to that file. Signed-off-by: Matt Richerson <[email protected]>
Create conversion routines and tests for v1alpha2. Switch api/v1alpha2/conversion.go content from hub to spoke. These conversion.go ConvertTo()/ConvertFrom() routines are complete and do not require manual adjustment at this time, because v1alpha2 is currently identical to the new hub v1alpha3. ACTION: The api/v1alpha2/conversion_test.go may need to be manually adjusted for your needs, especially if it has been manually adjusted in earlier spokes. ACTION: Any new tests added to internal/controller/conversion_test.go may need to be manually adjusted. This added api/v1alpha2/doc.go to hold the k8s:conversion-gen marker that points to the new hub. Signed-off-by: Matt Richerson <[email protected]>
Point controllers at new hub v1alpha3. Point conversion fuzz test at new hub. These routines are still valid for the new hub because it is currently identical to the previous hub. ACTION: Some controllers may have been referencing one of these non-local APIs. Verify that these APIs are being referenced by their correct versions: DirectiveBreakdown, Workflow. Signed-off-by: Matt Richerson <[email protected]>
Point earlier spoke APIs at new hub v1alpha3. The conversion_test.go and the ConvertTo()/ConvertFrom() routines in conversion.go are still valid for the new hub because it is currently identical to the previous hub. Update the k8s:conversion-gen marker in doc.go to point to the new hub. ACTION: Some API libraries may have been referencing one of these non-local APIs. Verify that these APIs are being referenced by their correct versions: DirectiveBreakdown, Workflow Signed-off-by: Matt Richerson <[email protected]>
Make the auto-generated files.

Update the SRC_DIRS spoke list in the Makefile.

  make manifests & make generate & make generate-go-conversions
  make fmt

ACTION: If any of the code in this repo was referencing non-local APIs, the references to them may have been inadvertently modified. Verify that any non-local APIs are being referenced by their correct versions.

ACTION: Begin by running "make vet". Repair any issues that it finds. Then run "make test" and continue repairing issues until the tests pass.

Signed-off-by: Matt Richerson <[email protected]>
Api v1alpha3
Signed-off-by: Dean Roehrich <[email protected]>
Mark the v1alpha1 API as unserved.
A rabbit that has lost its NoSchedule taint, but retains its nnf.cray.hpe.com/taints_and_labels_completed=true label, was not able to repair its taints. This change allows the nnf_systemconfiguration_controller to examine the node and determine whether the label is stale with respect to the state of the taints, and to correct the taints if necessary. Signed-off-by: Dean Roehrich <[email protected]>
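The stale-label test described above amounts to: the completion label says "done", but the NoSchedule taint is missing. A rough Python sketch of that decision follows. The label key is taken from the commit message, while the taint key and the function names are hypothetical placeholders.

```python
# Label key as given in the commit message.
COMPLETED_LABEL = "nnf.cray.hpe.com/taints_and_labels_completed"
# Hypothetical taint key; the real key is not stated in this PR.
RABBIT_TAINT_KEY = "example.com/rabbit-node"


def has_noschedule_taint(taints, key=RABBIT_TAINT_KEY):
    """True if the node carries the NoSchedule taint with the given key."""
    return any(t.get("key") == key and t.get("effect") == "NoSchedule"
               for t in taints)


def label_is_stale(labels, taints):
    """The label claims completion, yet the taint it vouches for is gone."""
    return labels.get(COMPLETED_LABEL) == "true" and not has_noschedule_taint(taints)


def repair_taints(labels, taints):
    """Return a taint list with the NoSchedule taint restored if needed."""
    if label_is_stale(labels, taints):
        return list(taints) + [{"key": RABBIT_TAINT_KEY, "effect": "NoSchedule"}]
    return list(taints)
```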
Add two new fields to the NnfStorageProfile: postActivate and preDeactivate. These are free-form string lists that allow an admin to list commands to run on the Rabbit after a file system has been activated or before it is deactivated. Signed-off-by: Matt Richerson <[email protected]>
* Use a file-based database for nnf-ec

  Mount /localdisk (the M.2) into the nnf-node-manager pods. Use the default database in nnf-ec (badger) and change the working directory of the container to /localdisk so the database file is created in the correct spot.

  Signed-off-by: Matt Richerson <[email protected]>

* add type field to localdisk volumes

  Signed-off-by: Matt Richerson <[email protected]>

---------

Signed-off-by: Matt Richerson <[email protected]>
User jobs currently do not have a way to retrieve the Servers resource for a workflow. Access to the Servers resource can provide Lustre information, such as which Rabbit nodes are being used for MDTs/OSTs.

This creates a file (`./.nnf-servers.json`) at the root of the Lustre filesystem that contains the MDTs/OSTs. It can then be parsed using `jq` to retrieve the pertinent information.

Examples:

```
# non-persistent
flux run -N4 --setattr=dw="#DW jobdw name=blake type=lustre capacity=30GB" bash -c "cat \$DW_JOB_blake/.nnf-servers.json | jq '.ost'"

# persistent
flux run -N4 --setattr=dw="#DW persistentdw name=blake-persistent" bash -c "cat \$DW_PERSISTENT_blake_persistent/.nnf-servers.json | jq '.ost'"
```

Signed-off-by: Blake Devcich <[email protected]>
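For jobs that would rather not shell out to `jq`, the same lookup can be sketched in Python. Only the file name and the top-level `ost` key come from the commit message; the shape of the value under `ost` is not documented here, so the sketch returns it untouched. The helper names are illustrative.

```python
import json
import os
from pathlib import Path

SERVERS_FILE = ".nnf-servers.json"  # file name from the commit message


def osts_from_text(text):
    """Return whatever is stored under the top-level "ost" key, or None."""
    return json.loads(text).get("ost")


def read_osts(mount_root):
    """Read the servers file at the root of the mounted Lustre filesystem."""
    return osts_from_text((Path(mount_root) / SERVERS_FILE).read_text())


if __name__ == "__main__":
    # DW_JOB_blake matches the non-persistent example in the commit message.
    root = os.environ.get("DW_JOB_blake")
    if root:
        print(read_osts(root))
```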
Signed-off-by: Dean Roehrich <[email protected]>
Signed-off-by: Dean Roehrich <[email protected]>
Signed-off-by: Dean Roehrich <[email protected]>
Do not print the always-nil 'err' value when we time out while waiting for a VG to appear. Signed-off-by: Dean Roehrich <[email protected]>
…urrent spec (#410) For NnfStorage and NnfAccess resources created by the NnfSystemStorage, the spec section may change as Storage resources are disabled/enabled. When aggregating status from child objects (NnfNodeBlockStorage, NnfNodeStorage, and ClientMounts), only check the status from child resources that are currently requested by the spec. This avoids trying to collect status from Rabbits that are disabled. Signed-off-by: Matt Richerson <[email protected]>
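The aggregation rule in this commit (consult only the children that the current spec asks for) might look like the Python sketch below. The data shapes are assumptions made for illustration: nodes are identified by name and each child status is a dict with a `ready` flag, neither of which is taken from the real resource definitions.

```python
def aggregate_ready(requested_nodes, child_status_by_node):
    """Return True only if every child requested by the spec reports ready.

    requested_nodes:      node names currently listed in the parent's spec
    child_status_by_node: status of every known child, keyed by node name;
                          may include children on nodes no longer requested
    """
    for node in requested_nodes:
        status = child_status_by_node.get(node)
        if status is None or not status.get("ready", False):
            return False  # a requested child is missing or not ready yet
    # Children on nodes outside the spec (e.g. disabled Rabbits) are never read.
    return True
```

With this shape, a not-ready child on a disabled Rabbit cannot hold back the parent's status, because its node name is no longer in the requested list.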
Signed-off-by: Dean Roehrich <[email protected]>
Signed-off-by: Dean Roehrich <[email protected]>
bdevcich approved these changes on Dec 9, 2024
No description provided.