Add container support to workflows (#159)
Note: This is an experimental feature, but it works end-to-end (i.e. Proposal to Teardown) through the workflow process.

This feature adds experimental support for NNF Containers in workflows. Container workflows are created by using the `#DW container` directive. An `NnfContainerProfile` must be supplied to the directive to tell the workflow which containers to create and which volumes to mount inside them. See the sample container profile in the `config/samples` directory for more information. The profiles in the `config/examples` directory are deployed on the system as example profiles, but they do not contain the full documentation.

The computes resource must also be updated to tell the workflow where to place the container pods. The provided compute nodes are traced back to their local rabbit nodes, which are used as the targets for the pods.
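The tracing step can be pictured with a minimal sketch. It assumes a hypothetical `computeToRabbit` lookup table and made-up node names; the commit does not show how the controller actually resolves a compute node to its rabbit, so treat this only as an illustration of deduplicating targets.

```go
package main

import (
	"fmt"
	"sort"
)

// rabbitTargets returns the unique, sorted rabbit nodes that are local to the
// given compute nodes. computeToRabbit is a hypothetical stand-in for however
// the system records which rabbit each compute node is attached to.
func rabbitTargets(computes []string, computeToRabbit map[string]string) ([]string, error) {
	seen := make(map[string]bool)
	var targets []string
	for _, c := range computes {
		rabbit, ok := computeToRabbit[c]
		if !ok {
			return nil, fmt.Errorf("compute node %q has no known local rabbit", c)
		}
		if !seen[rabbit] {
			seen[rabbit] = true
			targets = append(targets, rabbit)
		}
	}
	sort.Strings(targets)
	return targets, nil
}

func main() {
	// Node names are invented for the example.
	topology := map[string]string{
		"compute-01": "rabbit-node-1",
		"compute-02": "rabbit-node-1",
		"compute-03": "rabbit-node-2",
	}
	targets, err := rabbitTargets([]string{"compute-01", "compute-02", "compute-03"}, topology)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(targets)
}
```

Several compute nodes that share a rabbit collapse into a single target, which matches the one-Job-per-rabbit behavior described in this commit message.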

Containers are created during `PreRun` through the use of Kubernetes Jobs. Each rabbit node is the target of one Kubernetes Job, which manages the successful completion of the container. `PreRun` progresses to `ready:true` when the pods have started successfully. Each container has the volumes defined by the container profile mounted inside it. The mount paths for these volumes are exposed to the container through environment variables whose names match the storage names given in the container directive's arguments (e.g. `DW_JOB_foo-local-storage`). A container profile can mark each storage as optional or required. If a required storage argument is not supplied to the directive, the workflow fails in the `Proposal` state.
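The required-versus-optional rule can be sketched as follows. The `profileStorage` type mirrors the `Name` and `Optional` fields of `NnfContainerProfileStorage`, but `validateStorages` and its error text are hypothetical; the actual validation code is not part of the text shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// profileStorage mirrors the two fields of NnfContainerProfileStorage.
type profileStorage struct {
	Name     string
	Optional bool
}

// validateStorages returns an error when a non-optional storage from the
// profile is missing from the directive arguments. This is an illustrative
// helper, not the controller's implementation.
func validateStorages(storages []profileStorage, args map[string]string) error {
	var missing []string
	for _, s := range storages {
		if _, ok := args[s.Name]; !ok && !s.Optional {
			missing = append(missing, s.Name)
		}
	}
	if len(missing) > 0 {
		return fmt.Errorf("required storages not supplied: %s", strings.Join(missing, ", "))
	}
	return nil
}

func main() {
	profile := []profileStorage{
		{Name: "DW_JOB_foo-local-storage", Optional: false},
		{Name: "DW_PERSISTENT_foo-persistent-storage", Optional: true},
	}

	// Only the required storage is supplied: validation passes.
	fmt.Println(validateStorages(profile, map[string]string{
		"DW_JOB_foo-local-storage": "my-gfs2",
	}))

	// The required storage is missing: validation fails.
	fmt.Println(validateStorages(profile, map[string]string{
		"DW_PERSISTENT_foo-persistent-storage": "my-persistent",
	}))
}
```

Inside a container, a Go program could look up a mount path with `os.LookupEnv("DW_JOB_foo-local-storage")`. The hyphens in the variable name make it awkward to reference from a POSIX shell, so reading the process environment directly is the safer route.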

Once the workflow has progressed to `PostRun`, it starts checking whether the pods have finished. Once they have finished, `PostRun` progresses to `ready:true` if all pods (i.e. k8s jobs) completed successfully. Otherwise, `PostRun` remains at `ready:false`.
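The `PostRun` readiness rule reduces to an all-of check across the per-rabbit Jobs. The sketch below uses a hypothetical `jobResult` type instead of the real Kubernetes Job status, and it assumes an empty job list counts as ready, which the commit message does not specify.

```go
package main

import "fmt"

// jobResult is a hypothetical summary of one Kubernetes Job (one per rabbit).
type jobResult struct {
	RabbitNode string
	Finished   bool
	Succeeded  bool
}

// postRunReady reports whether PostRun may move to ready:true, meaning every
// job has finished and succeeded. An empty list is treated as ready here,
// which is an assumption of this sketch.
func postRunReady(jobs []jobResult) bool {
	for _, j := range jobs {
		if !j.Finished || !j.Succeeded {
			return false
		}
	}
	return true
}

func main() {
	jobs := []jobResult{
		{RabbitNode: "rabbit-node-1", Finished: true, Succeeded: true},
		{RabbitNode: "rabbit-node-2", Finished: true, Succeeded: false},
	}
	fmt.Println("ready:", postRunReady(jobs)) // one job failed, so not ready
}
```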

Example container directive:

```
#DW jobdw name=my-gfs2 type=gfs2 capacity=50GB
#DW persistentdw name=my-persistent
#DW container name=my-container profile=example-randomly-fail
       DW_JOB_foo-local-storage=my-gfs2
       DW_PERSISTENT_foo-persistent-storage=my-persistent
```
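To make the shape of the container directive concrete, here is a minimal parsing sketch. `parseContainerDirective` and `containerDirective` are hypothetical names, and the rule that every argument other than `name` and `profile` is a storage argument is an assumption drawn from the example above, not from the project's real directive parser.

```go
package main

import (
	"fmt"
	"strings"
)

// containerDirective holds the pieces of a "#DW container" directive.
type containerDirective struct {
	Name     string
	Profile  string
	Storages map[string]string // storage argument name -> referenced storage
}

// parseContainerDirective splits a directive on whitespace (so continuation
// lines are handled) and sorts each key=value argument into the struct.
func parseContainerDirective(text string) (containerDirective, error) {
	d := containerDirective{Storages: make(map[string]string)}
	fields := strings.Fields(text)
	if len(fields) < 2 || fields[0] != "#DW" || fields[1] != "container" {
		return d, fmt.Errorf("not a #DW container directive")
	}
	for _, f := range fields[2:] {
		key, value, ok := strings.Cut(f, "=")
		if !ok || key == "" || value == "" {
			return d, fmt.Errorf("malformed argument %q", f)
		}
		switch key {
		case "name":
			d.Name = value
		case "profile":
			d.Profile = value
		default:
			// Assumption: any other argument names a storage from the profile.
			d.Storages[key] = value
		}
	}
	if d.Name == "" || d.Profile == "" {
		return d, fmt.Errorf("directive needs both name= and profile=")
	}
	return d, nil
}

func main() {
	directive := `#DW container name=my-container profile=example-randomly-fail
       DW_JOB_foo-local-storage=my-gfs2
       DW_PERSISTENT_foo-persistent-storage=my-persistent`

	d, err := parseContainerDirective(directive)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(d.Name, d.Profile, len(d.Storages))
}
```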

---------

Signed-off-by: Blake Devcich <[email protected]>
Signed-off-by: Nate Thornton <[email protected]>
Co-authored-by: Nate Thornton <[email protected]>
bdevcich and Nate Thornton authored Feb 22, 2023
1 parent 4fb5d4f commit 59e3bc2
Showing 23 changed files with 8,440 additions and 78 deletions.
8 changes: 8 additions & 0 deletions PROJECT
@@ -105,4 +105,12 @@ resources:
  kind: NnfNodeECData
  path: github.com/NearNodeFlash/nnf-sos/api/v1alpha1
  version: v1alpha1
- api:
    crdVersion: v1
    namespaced: true
  domain: cray.hpe.com
  group: nnf
  kind: NnfContainerProfile
  path: github.com/NearNodeFlash/nnf-sos/api/v1alpha1
  version: v1alpha1
version: "3"
86 changes: 86 additions & 0 deletions api/v1alpha1/nnfcontainerprofile_types.go
@@ -0,0 +1,86 @@
/*
* Copyright 2023 Hewlett Packard Enterprise Development LP
* Other additional copyright holders may be indicated within.
*
* The entirety of this work is licensed under the Apache License,
* Version 2.0 (the "License"); you may not use this file except
* in compliance with the License.
*
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package v1alpha1

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

const (
	ContainerLabel = "nnf.cray.hpe.com/container"
)

// NnfContainerProfileData defines the desired state of NnfContainerProfile
type NnfContainerProfileData struct {
	// Pinned is true if this instance is an immutable copy
	// +kubebuilder:default:=false
	Pinned bool `json:"pinned,omitempty"`

	// List of possible filesystems supported by this container profile
	Storages []NnfContainerProfileStorage `json:"storages,omitempty"`

	// Stop any containers after X seconds once a workflow has transitioned to PostRun. Defaults to 0.
	// A value of 0 disables this behavior.
	PostRunTimeoutSeconds int64 `json:"postRunTimeoutSeconds,omitempty"`

	// Specifies the number of times a container will be retried upon a failure. A new pod is deployed on each retry.
	// Defaults to 6 by kubernetes itself and must be set. A value of 0 disables retries.
	// +kubebuilder:default:=6
	RetryLimit int32 `json:"retryLimit"`

	// Template defines the containers that will be created from container profile
	Template corev1.PodTemplateSpec `json:"template"`
}

// NnfContainerProfileStorage defines the mount point information that will be available to the
// container
type NnfContainerProfileStorage struct {
	// Name specifies the name of the mounted filesystem; must match the user supplied #DW directive
	Name string `json:"name"`

	// Optional designates that this filesystem is available to be mounted, but can be ignored by
	// the user not supplying this filesystem in the #DW directives
	//+kubebuilder:default:=false
	Optional bool `json:"optional"`
}

// +kubebuilder:object:root=true

// NnfContainerProfile is the Schema for the nnfcontainerprofiles API
type NnfContainerProfile struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Data NnfContainerProfileData `json:"data,omitempty"`
}

// +kubebuilder:object:root=true

// NnfContainerProfileList contains a list of NnfContainerProfile
type NnfContainerProfileList struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ListMeta `json:"metadata,omitempty"`
	Items           []NnfContainerProfile `json:"items"`
}

func init() {
	SchemeBuilder.Register(&NnfContainerProfile{}, &NnfContainerProfileList{})
}
8 changes: 8 additions & 0 deletions api/v1alpha1/workflow_helpers.go
@@ -33,4 +33,12 @@ const (
	// PinnedStorageProfileLabelNameSpace is a label applied to NnfStorage objects to show
	// which pinned storage profile is being used.
	PinnedStorageProfileLabelNameSpace = "nnf.cray.hpe.com/pinned_storage_profile_namespace"

	// PinnedContainerProfileLabelName is a label applied to NnfStorage objects to show
	// which pinned container profile is being used.
	PinnedContainerProfileLabelName = "nnf.cray.hpe.com/pinned_container_profile_name"

	// PinnedContainerProfileLabelNameSpace is a label applied to NnfStorage objects to show
	// which pinned container profile is being used.
	PinnedContainerProfileLabelNameSpace = "nnf.cray.hpe.com/pinned_container_profile_namespace"
)
96 changes: 95 additions & 1 deletion api/v1alpha1/zz_generated.deepcopy.go

Some generated files are not rendered by default.