Release v0.0.2 (#186)

* Debug build for NVM Format command
  Signed-off-by: Nate Thornton <[email protected]>
* Upgrade nnf-ec to latest (e4ba0b)
  Signed-off-by: Nate Thornton <[email protected]>
* Upgrade nnf-ec to latest (96d6a3)
  Signed-off-by: Nate Thornton <[email protected]>
* Upgrade nnf-ec to latest (83d47b)
  Signed-off-by: Nate Thornton <[email protected]>
* Use DWS variable for storage label (#114)
  Signed-off-by: Dean Roehrich <[email protected]>
* Use the DWS workflowname vars for label names (#115)
  Signed-off-by: Dean Roehrich <[email protected]>
* RABSW-1069: Support for refactored DWS Storage resource and NNF Fencing functionality (#113)
  Support for NNF Node Fencing with DWS Storage interaction
  Signed-off-by: Nate Thornton <[email protected]>
* Add known controller-manager secret
  Signed-off-by: Nate Thornton <[email protected]>
* Disable EC Data Controller for unit tests
  Signed-off-by: Nate Thornton <[email protected]>
* Ignore not found resource on undeploy
  Signed-off-by: Nate Thornton <[email protected]>
* RABSW-1081: Support multiple MDTs (#117)
* RABSW-1081: Support multiple MDTs
  - Update the DirectiveBreakdown to ask for more than one MDT if necessary
  - Only use a combined MGT/MDT for the first allocation listed in the mgtmdt allocation set. All other allocations will only be MDTs.
  - Fix an accounting error in the Servers resource where the allocated capacity was not summed across multiple NnfNodeStorages on the same Rabbit.
  Signed-off-by: Matt Richerson <[email protected]>
  Signed-off-by: Matt Richerson <[email protected]>
* Re-vendor
  Signed-off-by: Matt Richerson <[email protected]>
  Signed-off-by: Matt Richerson <[email protected]>
  Signed-off-by: Matt Richerson <[email protected]>
  Co-authored-by: Matt Richerson <[email protected]>
* RABSW-1099: Vendor DWS (#122)
  Pick up the changes to the PersistentStorage state fields.
  Signed-off-by: Matt Richerson <[email protected]>
  Signed-off-by: Matt Richerson <[email protected]>
* Refactor Job/Persistent directive references to use "DW" prefix
  Signed-off-by: Nate Thornton <[email protected]>
* RABSW-1097: Pass UserID and GroupID to ClientMount (#129)
* RABSW-1097: Pass UserID and GroupID to ClientMount
  Pass the UserID and GroupID from the workflow, through the NnfAccess, and to the ClientMount. This is used to set the owner/group of Raw devices on the compute node.
  Signed-off-by: Matt Richerson <[email protected]>
* re-vendor
  Signed-off-by: Matt Richerson <[email protected]>
  Signed-off-by: Matt Richerson <[email protected]>
* RABSW-1122: Don't allow staging to Raw allocations (#131)
  Do some more sanity checks on staging directives:
  - Don't allow staging to/from raw allocations
  - Match allocation directives based on name and command "jobdw/persistentdw" since names can collide between the two types.
  Signed-off-by: Matt Richerson <[email protected]>
  Signed-off-by: Matt Richerson <[email protected]>
* RABSW-1097: Use "raw" instead of "lvm" for Raw allocation FsType (#132)
* RABSW-1097: Use "raw" instead of "lvm" for Raw allocation FsType
  nnf-ec now understands the "raw" file system type.
* re-vendor
  Signed-off-by: Matt Richerson <[email protected]>
  Signed-off-by: Matt Richerson <[email protected]>
* Allow builds on all branches. (#134)
  Loosen the branch filter for pushes. Print some event context, for future debugging. Remove an unused variable from verify_tag. Rename some jobs to give them unique names, to help with debugging.
  Signed-off-by: Dean Roehrich <[email protected]>
* RABSW-1124: Change NnfAccess TeardownState for servers (#136)
* RABSW-1124: Change NnfAccess TeardownState for servers
  The data movement code was mounting and unmounting the Rabbit nodes during the DataIn and DataOut phases of the workflow. A stale workflow resource in the client cache could cause the NnfAccess to be re-mounted after it had already been unmounted.
  This commit changes the NnfAccess Teardown state logic to do the unmounts in PreRun and Teardown instead of DataIn and DataOut.
  Signed-off-by: Matt Richerson <[email protected]>
* review comments
  Signed-off-by: Matt Richerson <[email protected]>
  ---------
  Signed-off-by: Matt Richerson <[email protected]>
* Update to latest nnf-ec with support for additional LVM commands (#130)
  Signed-off-by: Nate Thornton <[email protected]>
* Fix for 'lockStart' typo
  Signed-off-by: Nate Thornton <[email protected]>
* Added PR builds for feature branches (#138)
  Signed-off-by: Blake Devcich <[email protected]>
* RABSW-1128: Make fake mounts on kind Rabbit nodes (#139)
* RABSW-1128: Make fake mounts on kind Rabbit nodes
  Create empty directories on the Rabbit nodes in the clientmount reconciler to better fake out data movement and user containers.
  Signed-off-by: Matt Richerson <[email protected]>
* review comments
  Signed-off-by: Matt Richerson <[email protected]>
  ---------
  Signed-off-by: Matt Richerson <[email protected]>
* Shorten LVM names (#141)
* RABSW-1129: Shorten LVM names
  The LV and VG names were too long and caused an error during lvcreate. This commit changes the VG name to use a truncated version of the file share ID, which includes the workflow name/namespace, directive index, and allocation index. This string is combined with the UUID of the workflow. The LV name was changed to be "lv" for all logical volumes, since there is only ever a single LV in each VG.
  Signed-off-by: Matt Richerson <[email protected]>
* review comments
  Signed-off-by: Matt Richerson <[email protected]>
  ---------
  Signed-off-by: Matt Richerson <[email protected]>
* Ensure the NNF Node resource fencing status is cleared prior updating… (#126)
* Ensure the NNF Node resource fencing status is cleared prior to updating the Storage resource
* Refactor to use DWS Storage Controller
  Signed-off-by: Nate Thornton <[email protected]>
* Remove finalizer on the new DWS Storage Controller (#144)
* Remove finalizer on the new DWS Storage Controller
  Signed-off-by: Nate Thornton <[email protected]>
* RABSW-1139: Fix ClientMount directory create/remove for kind environment (#145)
* RABSW-1139: Fix ClientMount directory create/remove for kind environment
  In the ClientMount controller for kind nodes, check whether the directory exists before creating or removing it.
  Re-vendor dws
  Signed-off-by: Matt Richerson <[email protected]>
* MkdirAll() already handles the case where the directory exists. Don't check beforehand with a Stat() call.
  Signed-off-by: Matt Richerson <[email protected]>
* re-vendor
  Signed-off-by: Matt Richerson <[email protected]>
  ---------
  Signed-off-by: Matt Richerson <[email protected]>
* Ensure 'key' values in the ruleset match against the exact string
  Signed-off-by: Nate Thornton <[email protected]>
* Handle Retryable EC errors for File Share (#133)
* Handle Retryable EC errors
  Signed-off-by: Nate Thornton <[email protected]>
* Drive Slot information for drives which are offline (#152)
* Upgrade nnf-ec to latest (ca4975)
* Pull in drive Slot information from storage resource
* update go.sum after running 'go mod tidy'
  ---------
  Signed-off-by: Nate Thornton <[email protected]>
* add --ignore-not-found to uninstall
  Signed-off-by: Nate Thornton <[email protected]>
* github-151: Fix LVM issues with gfs2 (#157)
* github-151: Fix LVM issues with gfs2
  This commit fixes two issues that were affecting gfs2 file systems:
  - The dlm lock manager was failing to lock because the VG name was too long
  - The lvcreate command needs an "--activate ys" to activate a shared volume
  Signed-off-by: Matt Richerson <[email protected]>
* use --extents
  Signed-off-by: Matt Richerson <[email protected]>
  ---------
  Signed-off-by: Matt Richerson <[email protected]>
* Update nnf-ec to 1dce5b
  Signed-off-by: Nate Thornton <[email protected]>
* Add container support to workflows (#159)
  Note: This is an experimental feature, but is working end-to-end (i.e. Proposal to Teardown) via the workflow process.
  This feature adds experimental support for NNF Containers in workflows. Container workflows are created by using the `#DW container` directive. An `NnfContainerProfile` must be supplied to the directive to instruct the workflow on what containers to create and which volumes to mount inside of the container. Look at the sample container profile in the `config/samples` directory for more information.
  The example profiles in the `config/examples` directory are deployed on the system, but they do not contain the full documentation.
  The computes resource must also be updated to instruct the workflow on where to place the container pods. The provided compute nodes will be traced back to their local rabbit nodes, which will be used as the targets for the pods.
  Containers are created during `PreRun` through the use of Kubernetes Jobs. Each rabbit node will be the target of one Kubernetes Job, which will manage the successful completion of the container. `PreRun` will progress to `ready:true` when the pods have started successfully.
  Each container has volumes mounted inside of it that are defined by the container profile. The mount paths for these volumes are exposed to the container via environment variables that match the storage names provided by the container directive's arguments (e.g. DW_JOB_foo-local-storage). Each storage can be marked as optional or required. If a required storage argument isn't supplied to the directive, the workflow will fail in the `Proposal` state.
  Once the workflow has progressed to `PostRun`, the workflow will start to check if the pods have finished. Once they have, `PostRun` will progress to `ready:true` if all pods (i.e. k8s jobs) have completed successfully. If not, `PostRun` will remain at `ready:false`.
  Example container directive:
  ```
  #DW jobdw name=my-gfs2 type=gfs2 capacity=50GB
  #DW persistentdw name=my-persistent
  #DW container name=my-container profile=example-randomly-fail DW_JOB_foo-local-storage=my-gfs2 DW_PERSISTENT_foo-persistent-storage=my-persistent
  ```
  ---------
  Signed-off-by: Blake Devcich <[email protected]>
  Signed-off-by: Nate Thornton <[email protected]>
  Co-authored-by: Nate Thornton <[email protected]>
* Make sure example container profiles contain retryLimit
  Signed-off-by: Blake Devcich <[email protected]>
* NNF Port Manager (#163)
  NNF Port Manager infrastructure and tests
  ---------
  Signed-off-by: Nate Thornton <[email protected]>
* Nnf ec enhanced logging (#166)
* NNF-EC logger
* upgrade to nnf-ec master (47eb7a)
* expose zap options
  ---------
  Signed-off-by: Nate Thornton <[email protected]>
* Containers: Add non-root support
  This uses SecurityContext and inherits the Workflow's user/group ID.
  Signed-off-by: Blake Devcich <[email protected]>
* Containers: Check for XFS/Raw filesystems (#167)
  These filesystems can only be mounted once, so they are not supported for containers.
  Signed-off-by: Blake Devcich <[email protected]>
* Vendor latest nnf-ec to fix namespace attach failures (#170)
  Signed-off-by: Anthony Floeder <[email protected]>
* RABSW-1150: Add ServiceAccount for NNF fencing agent (#171)
  Create a ServiceAccount for the NNF fencing agent that allows read and write access to Node and NnfNode resources.
  Signed-off-by: Matt Richerson <[email protected]>
* Containers: Fix Error Output
  A few situations were being reported as errors when they should not have been:
  - Job creation loop: Since the job is reused for each rabbit node and may be updated, make sure the pod selector is empty. Do this by starting from a fresh DeepCopy of the job structure when creating new jobs.
    See more here: https://kubernetes.io/docs/concepts/workloads/controllers/job/#pod-selector
  - Job container volumes: If the NNFAccess mount is not ready, requeue rather than return an error.
  - Job container start: It's possible that while waiting for the job containers to start, the jobs themselves don't exist or aren't queryable yet. Requeue.
  Signed-off-by: Blake Devcich <[email protected]>
* Incorporate latest nnf-ec to fix format issue (#173)
  Signed-off-by: Anthony Floeder <[email protected]>
* Use the live k8s client object in suite_test.go. (#175)
  The kubebuilder book (https://book.kubebuilder.io/cronjob-tutorial/writing-tests.html) explains that we should be using the "live" k8s client rather than the one from the manager:
  "Note that we set up both a “live” k8s client and a separate client from the manager. This is because when making assertions in tests, you generally want to assert against the live state of the API server. If you use the client from the manager (k8sManager.GetClient), you’d end up asserting against the contents of the cache instead, which is slower and can introduce flakiness into your tests."
* Upgrade controller-runtime and friends (#174)
  Upgrade controller-runtime, ginkgo, gomega. Revendor dws and pick up the new API for status updater. Upgrade controller-gen and env-k8s-version.
  Signed-off-by: Dean Roehrich <[email protected]>
* Github #39: Separate NnfAccess mount/unmount code paths (#176)
* Github #39: Separate NnfAccess mount/unmount code paths
  This commit separates out the logic for mounting and unmounting an NnfAccess. This was to provide proper unlocking of the NnfStorage for XFS and raw allocations.
  Signed-off-by: Matt Richerson <[email protected]>
* review comments
  Signed-off-by: Matt Richerson <[email protected]>
  ---------
  Signed-off-by: Matt Richerson <[email protected]>
* Added NnfContainerProfile validation webhook + unit tests (#172)
  - Added validation webhook for container profiles
  - Fix a bug in the container filesystem check for persistent filesystems
  - Add unit tests for container directives, most notably the storages in the profile and in the container directive arguments
  - Add integration test to ensure that targeted compute nodes select the correct local NNF nodes for container workflows
  Signed-off-by: Blake Devcich <[email protected]>
* main.go has too many calls to controllers.NnfPortManagerReconciler (#178)
  Keep the one for the SLC, remove the one in main().
  Signed-off-by: Dean Roehrich <[email protected]>
* RABSW-1096: Add Lustre target allocation hints (#179)
* RABSW-1096: Add Lustre target allocation hints
  This commit adds three new fields to the NnfStorageProfile that are used to direct the WLM on how many Lustre targets to create. The three new fields are:
  - Count: Specify how many Lustre targets to create
  - Scale: A unitless 1-10 value that the WLM uses with other information to come up with a target count
  - ColocateComputes: Limit the Lustre targets to the Rabbits in the same chassis as the compute nodes.
  These NnfStorageProfile fields are used to fill in the DirectiveBreakdown correctly.
  Signed-off-by: Matt Richerson <[email protected]>
* Review comments
  Signed-off-by: Matt Richerson <[email protected]>
* re-vendor
  Signed-off-by: Matt Richerson <[email protected]>
  ---------
  Signed-off-by: Matt Richerson <[email protected]>
* Add MPI support to containers via mpi-operator (#177)
  This adds a new way to create containers using mpi-operator. mpi-operator is now a requirement of nnf-sos in order to run MPI containers. Users can now launch MPI container workflows. This is done via the NnfContainerProfile.
  Container workflows can be executed in two ways:
  - MPI (launcher/worker model)
  - Non-MPI (one command for all containers)
  The launcher/worker model allows users to run `mpirun` on the launcher pod and then use the workers as the nodes for `mpirun`. See the mpi-operator docs for more: https://www.kubeflow.org/docs/components/training/mpi/.
  Major Changes:
  - Added `MPISpec` to the container profile to define `MPIJobs`
  - Moved original container implementation from `Template` to `Spec` to mimic the `MPISpec` name. Users now only define the PodSpec rather than the PodTemplateSpec
  - Added example-mpi NnfContainerProfile (used for testing)
  - Added permissions to both MPI and non-MPI containers to run as non-root users (i.e. `user` or `mpiuser`).
  - Reworked PreRun to create either type and watch for successful container start for Ready logic.
  - Reworked PostRun to watch for completion for either type and determine Ready state if containers completed successfully.
  - Added InitContainers to map the `user` or `mpiuser` to the workflow's User and Group ID. This allows ssh to work properly for mpirun.
  - New functions added to support both MPI and non-MPI container creation logic.
  - Use server-side deployment to work around MPIJob's large CRD annotations
  Signed-off-by: Blake Devcich <[email protected]>
* Add support for extra dcp and dryrun options in NnfDataMovementSpec (#180)
  In order to support per-DM configuration options, we need to add some options to the spec. These values will override/supplement the existing data movement configuration that is defined in the nnf-dm-config ConfigMap.
  In this case, LLNL has a need to add extra dcp options for a given data movement request. This will be done via the Copy Offload API.
  For debugging purposes, the dryrun option has also been added to fake out data movement.
  Signed-off-by: Blake Devcich <[email protected]>
* Wait for DWS webhook (#181)
  Wait for the DWS webhook to be ready when doing a fresh deploy. Update an out-of-date CRD.
  Signed-off-by: Dean Roehrich <[email protected]>
* RABSW-1159: Update deploy.sh to look at the deployment ready count (#182)
  deploy.sh was looking for a "1/1" ready field for the dws webhook. There may not be enough worker nodes on the system to run all 3 DWS webhooks, so some of the webhook pods may not be ready. If one of these pods shows up first in the pod list, the deploy.sh script will hang forever. Instead, wait for the number of ready replicas in the dws webhook deployment to be one or more.
  Signed-off-by: Matt Richerson <[email protected]>
* Use the new "lus" API group for lustre-fs-operator (#183)
  Use the new "lus" API group for lustre-fs-operator
  Signed-off-by: Dean Roehrich <[email protected]>
* RABSW-1158: Update nnf-ec and add timeout environment variable (#184)
* RABSW-1158: Update nnf-ec and add timeout environment variable
  Make use of the new timeout in nnf-ec when running commands. Time out commands after 90 seconds and return an error.
  Signed-off-by: Matt Richerson <[email protected]>
* use nnf-ec timeout env variable in seconds
  Signed-off-by: Matt Richerson <[email protected]>
* re-vendor
  Signed-off-by: Matt Richerson <[email protected]>
* go.mod/go.sum merge error
  Signed-off-by: Matt Richerson <[email protected]>
  ---------
  Signed-off-by: Matt Richerson <[email protected]>
* DM Types: Add options to log/store stdout (#185)
  Signed-off-by: Blake Devcich <[email protected]>
* Github action triggers on master and release branches
  Signed-off-by: Matt Richerson <[email protected]>

---------

Signed-off-by: Nate Thornton <[email protected]>
Signed-off-by: Dean Roehrich <[email protected]>
Signed-off-by: Matt Richerson <[email protected]>
Signed-off-by: Matt Richerson <[email protected]>
Signed-off-by: Blake Devcich <[email protected]>
Signed-off-by: Anthony Floeder <[email protected]>
Co-authored-by: Nate Thornton <[email protected]>
Co-authored-by: Dean Roehrich <[email protected]>
Co-authored-by: Matt Richerson <[email protected]>
Co-authored-by: Blake Devcich <[email protected]>
Co-authored-by: Blake Devcich <[email protected]>
Co-authored-by: Tony Floeder <[email protected]>
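
The RABSW-1129 naming scheme described in the message above (a truncated file share ID combined with the workflow UUID for the VG, and a fixed "lv" for the single LV in each VG) can be sketched in Go as follows. This is a hypothetical illustration, not the nnf-sos code: the function name `vgName`, the `_` separator, and the truncation budget are assumptions, since the message gives neither the exact format nor the length limit (the github-151 fix in #157 notes that dlm also rejected VG names that were too long).

```go
package main

import "fmt"

// lvName is shared by every logical volume: each VG holds exactly one LV.
const lvName = "lv"

// vgName is a hypothetical sketch of the RABSW-1129 scheme: truncate the file
// share ID (workflow name/namespace, directive index, allocation index) to
// maxShareIDLen bytes and append the workflow UUID. The "_" separator and the
// length budget are assumptions, not values taken from nnf-sos.
func vgName(fileShareID, workflowUID string, maxShareIDLen int) string {
	// Byte slicing is safe here because Kubernetes object names are ASCII.
	if len(fileShareID) > maxShareIDLen {
		fileShareID = fileShareID[:maxShareIDLen]
	}
	return fileShareID + "_" + workflowUID
}

func main() {
	vg := vgName("default-my-workflow-0-0", "3f2a9c1e-1111-2222-3333-444455556666", 12)
	fmt.Printf("%s/%s\n", vg, lvName)
}
```

Keeping the workflow UUID untruncated preserves uniqueness even when two long share IDs collapse to the same prefix.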
NearNodeFlash · May 1, 2023 · f2d5fcc
1 parent 3a40435

Showing 1,276 changed files with 131,920 additions and 42,275 deletions.
diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
@@ -11,6 +11,7 @@ on:
     branches:
       - 'master'
       - 'releases/v*'
+      - 'feature/*'
 
 env:
   # TEST_TARGET: Name of the testing target in the Dockerfile
@@ -27,6 +28,11 @@ jobs:
     runs-on: ubuntu-latest
 
     steps:
+    - name: "Build context"
+      run: |
+        echo "ref is ${{ github.ref }}"
+        echo "ref_type is ${{ github.ref_type }}"
+
     - name: "Checkout repository"
       id: checkout_repo
       uses: actions/checkout@v3

diff --git a/.github/workflows/verify_tag.yml b/.github/workflows/verify_tag.yml
@@ -7,13 +7,15 @@ on:
     tags:
       - "v*"
 
-env:
-  IMAGE_NAME: ${{ github.repository }}
-
 jobs:
-  build:
+  verify_tag:
     runs-on: ubuntu-latest
     steps:
+      - name: "Verify context"
+        run: |
+          echo "ref is ${{ github.ref }}"
+          echo "ref_type is ${{ github.ref_type }}"
+
       - uses: actions/checkout@v3
         # actions/checkout@v3 breaks annotated tags by converting them into
         # lightweight tags, so we need to force fetch the tag again

diff --git a/.vscode/launch.json b/.vscode/launch.json
@@ -28,7 +28,9 @@
                 "-ginkgo.progress"
             ],
             "env": {
-                "KUBEBUILDER_ASSETS": "${workspaceFolder}/testbin/bin"
+                "KUBEBUILDER_ASSETS": "${workspaceFolder}/bin/k8s/1.25.0-darwin-amd64",
+                "GOMEGA_DEFAULT_EVENTUALLY_TIMEOUT": "10m",
+                "GOMEGA_DEFAULT_EVENTUALLY_POLLING_INTERVAL": "100ms"
             },
             "showLog": true
         },

diff --git a/Makefile b/Makefile
@@ -60,7 +60,7 @@ IMAGE_TAG_BASE ?= ghcr.io/nearnodeflash/nnf-sos
 # You can use it as an arg. (E.g make bundle-build BUNDLE_IMG=<some-registry>/<project-name-bundle>:<tag>)
 
 # ENVTEST_K8S_VERSION refers to the version of kubebuilder assets to be downloaded by envtest binary.
-ENVTEST_K8S_VERSION = 1.25.0
+ENVTEST_K8S_VERSION = 1.26.0
 
 # Jenkins behaviors
 # pipeline_service builds its target docker image and stores it into 1 of 3 destination folders.
@@ -223,7 +223,7 @@ test: manifests generate fmt vet envtest ## Run tests.
 	export GOMEGA_DEFAULT_EVENTUALLY_INTERVAL=${EVENTUALLY_INTERVAL}; \
 	export WEBHOOK_DIR=${ENVTEST_ASSETS_DIR}/webhook; \
 	for subdir in ${TESTDIRS}; do \
-		KUBEBUILDER_ASSETS="$(shell $(ENVTEST) use $(ENVTEST_K8S_VERSION) -p path --bin-dir $(LOCALBIN))" go test -v ./$$subdir/... -coverprofile cover.out -ginkgo.v -ginkgo.progress $$failfast; \
+		KUBEBUILDER_ASSETS="$(shell $(ENVTEST) use $(ENVTEST_K8S_VERSION) -p path --bin-dir $(LOCALBIN))" go test -v ./$$subdir/... -coverprofile cover.out -ginkgo.v $$failfast; \
 	done
 
 ##@ Build
@@ -254,7 +254,7 @@ install: manifests kustomize ## Install CRDs into the K8s cluster specified in ~
 	$(KUSTOMIZE) build config/crd | kubectl apply -f -
 
 uninstall: manifests kustomize ## Uninstall CRDs from the K8s cluster specified in ~/.kube/config.
-	$(KUSTOMIZE) build config/crd | kubectl delete -f -
+	$(KUSTOMIZE) build config/crd | kubectl delete --ignore-not-found -f -
 
 deploy: VERSION ?= $(shell cat .version)
 deploy: .version kustomize ## Deploy controller to the K8s cluster specified in ~/.kube/config.
@@ -285,7 +285,7 @@ ENVTEST ?= $(LOCALBIN)/setup-envtest
 
 ## Tool Versions
 KUSTOMIZE_VERSION ?= v4.5.7
-CONTROLLER_TOOLS_VERSION ?= v0.9.2
+CONTROLLER_TOOLS_VERSION ?= v0.11.1
 
 KUSTOMIZE_INSTALL_SCRIPT ?= "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh"
 .PHONY: kustomize

diff --git a/PROJECT b/PROJECT
@@ -105,4 +105,24 @@ resources:
   kind: NnfNodeECData
   path: github.com/NearNodeFlash/nnf-sos/api/v1alpha1
   version: v1alpha1
+- api:
+    crdVersion: v1
+    namespaced: true
+  domain: cray.hpe.com
+  group: nnf
+  kind: NnfContainerProfile
+  path: github.com/NearNodeFlash/nnf-sos/api/v1alpha1
+  version: v1alpha1
+  webhooks:
+    validation: true
+    webhookVersion: v1
+- api:
+    crdVersion: v1
+    namespaced: true
+  controller: true
+  domain: cray.hpe.com
+  group: nnf
+  kind: NnfPortManager
+  path: github.com/NearNodeFlash/nnf-sos/api/v1alpha1
+  version: v1alpha1
 version: "3"
diff --git a/api/v1alpha1/nnf_access_types.go b/api/v1alpha1/nnf_access_types.go
@@ -35,7 +35,7 @@ type NnfAccessSpec struct {
 
 	// TeardownState is the desired state of the workflow for this NNF Access resource to
 	// be torn down and deleted.
-	// +kubebuilder:validation:Enum:=DataIn;PreRun;PostRun;DataOut
+	// +kubebuilder:validation:Enum:=PreRun;PostRun;Teardown
 	// +kubebuilder:validation:Type:=string
 	TeardownState dwsv1alpha1.WorkflowState `json:"teardownState"`
 
@@ -45,6 +45,12 @@ type NnfAccessSpec struct {
 	// +kubebuilder:validation:Enum=single;all
 	Target string `json:"target"`
 
+	// UserID for the new mount. Currently only used for raw
+	UserID uint32 `json:"userID"`
+
+	// GroupID for the new mount. Currently only used for raw
+	GroupID uint32 `json:"groupID"`
+
 	// ClientReference is for a client resource. (DWS) Computes is the only client
 	// resource type currently supported
 	ClientReference corev1.ObjectReference `json:"clientReference,omitempty"`
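
Per the field comments and RABSW-1097, the new `UserID` and `GroupID` are currently used only to set the owner and group of raw devices on the compute node. A minimal sketch of that rule follows, assuming a hypothetical helper `applyRawOwnership` that receives the allocation's file system type (`"raw"`, per #132) and a device path; the actual ClientMount code is not part of this diff.

```go
package main

import (
	"fmt"
	"os"
)

// applyRawOwnership is a hypothetical helper: it changes the owner and group of
// a device only when the allocation's file system type is "raw", mirroring the
// "Currently only used for raw" comments on NnfAccessSpec.UserID/GroupID.
func applyRawOwnership(fsType, devicePath string, uid, gid uint32) error {
	if fsType != "raw" {
		// Other file system types ignore UserID/GroupID.
		return nil
	}
	if err := os.Chown(devicePath, int(uid), int(gid)); err != nil {
		return fmt.Errorf("chown %s to %d:%d: %w", devicePath, uid, gid, err)
	}
	return nil
}

func main() {
	// Non-raw allocations are left untouched, so this succeeds for any path.
	fmt.Println(applyRawOwnership("xfs", "/dev/does-not-exist", 1000, 1000))
}
```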

diff --git a/api/v1alpha1/nnf_datamovement_types.go b/api/v1alpha1/nnf_datamovement_types.go
@@ -59,6 +59,11 @@ type NnfDataMovementSpec struct {
 	// Set to true if the data movement operation should be canceled.
 	// +kubebuilder:default:=false
 	Cancel bool `json:"cancel,omitempty"`
+
+	// User defined configuration on how data movement should be performed. This overrides the
+	// configuration defined in the nnf-dm-config ConfigMap. These values are typically set by the
+	// Copy Offload API.
+	UserConfig *NnfDataMovementConfig `json:"userConfig,omitempty"`
 }
 
 // DataMovementSpecSourceDestination defines the desired source or destination of data movement
@@ -72,6 +77,31 @@ type NnfDataMovementSpecSourceDestination struct {
 	StorageReference corev1.ObjectReference `json:"storageReference,omitempty"`
 }
 
+// NnfDataMovementConfig provides a way for a user to override the data movement behavior on a
+// per DM basis.
+type NnfDataMovementConfig struct {
+
+	// Fake the Data Movement operation. The system "performs" Data Movement but the command to do so
+	// is trivial. This means a Data Movement request is still submitted but the IO is skipped.
+	// +kubebuilder:default:=false
+	Dryrun bool `json:"dryrun,omitempty"`
+
+	// Extra options to pass to the dcp command (used to perform data movement).
+	DCPOptions string `json:"dcpOptions,omitempty"`
+
+	// If true, enable the command's stdout to be saved in the log when the command completes
+	// successfully. On failure, the output is always logged.
+	// Note: Enabling this option may degrade performance.
+	// +kubebuilder:default:=false
+	LogStdout bool `json:"logStdout,omitempty"`
+
+	// Similar to LogStdout, store the command's stdout in Status.Message when the command completes
+	// successfully. On failure, the output is always stored.
+	// Note: Enabling this option may degrade performance.
+	// +kubebuilder:default:=false
+	StoreStdout bool `json:"storeStdout,omitempty"`
+}
+
 // DataMovementCommandStatus defines the observed status of the underlying data movement
 // command (MPI File Utils' `dcp` command).
 type NnfDataMovementCommandStatus struct {
@@ -89,7 +119,7 @@ type NnfDataMovementCommandStatus struct {
 
 	// LastMessage reflects the last message received over standard output or standard error as
 	// captured by the underlying data movement command.
-	LastMessage string `json:"message,omitempty"`
+	LastMessage string `json:"lastMessage,omitempty"`
 
 	// LastMessageTime reflects the time at which the last message was received over standard output or
 	// standard error by the underlying data movement command.
@@ -106,7 +136,8 @@ type NnfDataMovementStatus struct {
 	// +kubebuilder:validation:Enum=Success;Failed;Invalid;Cancelled
 	Status string `json:"status,omitempty"`
 
-	// Message contains any text that explains the Status.
+	// Message contains any text that explains the Status. If Data Movement failed or storeStdout is
+	// enabled, this will contain the command's output.
 	Message string `json:"message,omitempty"`
 
 	// StartTime reflects the time at which the Data Movement operation started.

diff --git a/api/v1alpha1/nnf_node_types.go b/api/v1alpha1/nnf_node_types.go
@@ -52,6 +52,9 @@ type NnfNodeStatus struct {
 
 	Health NnfResourceHealthType `json:"health,omitempty"`
 
+	// Fenced is true when the NNF Node is fenced by the STONITH agent, and false otherwise.
+	Fenced bool `json:"fenced,omitempty"`
+
 	Capacity          int64 `json:"capacity,omitempty"`
 	CapacityAllocated int64 `json:"capacityAllocated,omitempty"`
 

diff --git a/api/v1alpha1/nnf_port_manager_types.go b/api/v1alpha1/nnf_port_manager_types.go
@@ -0,0 +1,136 @@
+/*
+ * Copyright 2023 Hewlett Packard Enterprise Development LP
+ * Other additional copyright holders may be indicated within.
+ *
+ * The entirety of this work is licensed under the Apache License,
+ * Version 2.0 (the "License"); you may not use this file except
+ * in compliance with the License.
+ *
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package v1alpha1
+
+import (
+	"github.com/HewlettPackard/dws/utils/updater"
+	corev1 "k8s.io/api/core/v1"
+	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
+)
+
+// EDIT THIS FILE!  THIS IS SCAFFOLDING FOR YOU TO OWN!
+// NOTE: json tags are required.  Any new fields you add must have json tags for the fields to be serialized.
+
+// NnfPortManagerAllocationSpec defines the desired state for a single port allocation
+type NnfPortManagerAllocationSpec struct {
+	// Requester is an object reference to the requester of ports.
+	Requester corev1.ObjectReference `json:"requester"`
+
+	// Count is the number of desired ports the requester needs. The port manager
+	// will attempt to allocate this many ports.
+	// +kubebuilder:default:=1
+	Count int `json:"count"`
+}
+
+// NnfPortManagerSpec defines the desired state of NnfPortManager
+type NnfPortManagerSpec struct {
+	// INSERT ADDITIONAL SPEC FIELDS - desired state of cluster
+	// Important: Run "make" to regenerate code after modifying this file
+
+	// SystemConfiguration is an object reference to the system configuration. The
+	// Port Manager will use the available ports defined in the system configuration.
+	SystemConfiguration corev1.ObjectReference `json:"systemConfiguration"`
+
+	// Allocations is a list of allocation requests that the Port Manager will attempt
+	// to satisfy. To request port resources from the port manager, clients should add
+	// an entry to the allocations. Entries must be unique. The port manager controller
+	// will attempt to allocate port resources for each allocation specification in the
+	// list. To remove an allocation and free up port resources, remove the allocation
+	// from the list.
+	Allocations []NnfPortManagerAllocationSpec `json:"allocations"`
+}
+
+// AllocationStatus is the current status of a port requestor. A port that is in use by the respective owner
+// will have a status of "InUse". A port that is freed by the owner but not yet reclaimed by the port manager
+// will have a status of "Free". Any other status value indicates a failure of the port allocation.
+// +kubebuilder:validation:Enum:=InUse;Free;InvalidConfiguration;InsufficientResources
+type NnfPortManagerAllocationStatusStatus string
+
+const (
+	NnfPortManagerAllocationStatusInUse                 NnfPortManagerAllocationStatusStatus = "InUse"
+	NnfPortManagerAllocationStatusFree                  NnfPortManagerAllocationStatusStatus = "Free"
+	NnfPortManagerAllocationStatusInvalidConfiguration  NnfPortManagerAllocationStatusStatus = "InvalidConfiguration"
+	NnfPortManagerAllocationStatusInsufficientResources NnfPortManagerAllocationStatusStatus = "InsufficientResources"
+	// NOTE: You must ensure any new value is added to the above kubebuilder validation enum
+)
+
+// NnfPortManagerAllocationStatus defines the allocation status of a port for a given requester.
+type NnfPortManagerAllocationStatus struct {
+	// Requester is an object reference to the requester of the port resource, if one exists, or
+	// empty otherwise.
+	Requester *corev1.ObjectReference `json:"requester,omitempty"`
+
+	// Ports is a list of ports allocated to the owning resource.
+	Ports []uint16 `json:"ports,omitempty"`
+
+	// Status is the ownership status of the port.
+	Status NnfPortManagerAllocationStatusStatus `json:"status"`
+}
+
+// PortManagerStatus is the current status of the port manager.
+// +kubebuilder:validation:Enum:=Ready;SystemConfigurationNotFound
+type NnfPortManagerStatusStatus string
+
+const (
+	NnfPortManagerStatusReady                       NnfPortManagerStatusStatus = "Ready"
+	NnfPortManagerStatusSystemConfigurationNotFound NnfPortManagerStatusStatus = "SystemConfigurationNotFound"
+	// NOTE: You must ensure any new value is added in the above kubebuilder validation enum
+)
+
+// NnfPortManagerStatus defines the observed state of NnfPortManager
+type NnfPortManagerStatus struct {
+	// INSERT ADDITIONAL STATUS FIELD - define observed state of cluster
+	// Important: Run "make" to regenerate code after modifying this file
+
+	// Allocations is a list of port allocation statuses.
+	Allocations []NnfPortManagerAllocationStatus `json:"allocations,omitempty"`
+
+	// Status is the current status of the port manager.
+	Status NnfPortManagerStatusStatus `json:"status"`
+}
+
+//+kubebuilder:object:root=true
+//+kubebuilder:subresource:status
+
+// NnfPortManager is the Schema for the nnfportmanagers API
+type NnfPortManager struct {
+	metav1.TypeMeta   `json:",inline"`
+	metav1.ObjectMeta `json:"metadata,omitempty"`
+
+	Spec   NnfPortManagerSpec   `json:"spec,omitempty"`
+	Status NnfPortManagerStatus `json:"status,omitempty"`
+}
+
+func (mgr *NnfPortManager) GetStatus() updater.Status[*NnfPortManagerStatus] {
+	return &mgr.Status
+}
+
+//+kubebuilder:object:root=true
+
+// NnfPortManagerList contains a list of NnfPortManager
+type NnfPortManagerList struct {
+	metav1.TypeMeta `json:",inline"`
+	metav1.ListMeta `json:"metadata,omitempty"`
+	Items           []NnfPortManager `json:"items"`
+}
+
+func init() {
+	SchemeBuilder.Register(&NnfPortManager{}, &NnfPortManagerList{})
+}
diff --git a/api/v1alpha1/nnf_resource_status_type.go b/api/v1alpha1/nnf_resource_status_type.go
@@ -20,6 +20,8 @@
 package v1alpha1
 
 import (
+	dwsv1alpha1 "github.com/HewlettPackard/dws/api/v1alpha1"
+
 	sf "github.com/NearNodeFlash/nnf-ec/pkg/rfsf/pkg/models"
 )
 
@@ -93,6 +95,25 @@ func (rst NnfResourceStatusType) UpdateIfWorseThan(status *NnfResourceStatusType
 	}
 }
 
+func (rst NnfResourceStatusType) ConvertToDWSResourceStatus() dwsv1alpha1.ResourceStatus {
+	switch rst {
+	case ResourceStarting:
+		return dwsv1alpha1.StartingStatus
+	case ResourceReady:
+		return dwsv1alpha1.ReadyStatus
+	case ResourceDisabled:
+		return dwsv1alpha1.DisabledStatus
+	case ResourceNotPresent:
+		return dwsv1alpha1.NotPresentStatus
+	case ResourceOffline:
+		return dwsv1alpha1.OfflineStatus
+	case ResourceFailed:
+		return dwsv1alpha1.FailedStatus
+	default:
+		return dwsv1alpha1.UnknownStatus
+	}
+}
+
 // StaticResourceStatus will convert a Swordfish ResourceStatus to the NNF Resource Status.
 func StaticResourceStatus(s sf.ResourceStatus) NnfResourceStatusType {
 	switch s.State {