This repository has been archived by the owner on Aug 17, 2023. It is now read-only.

Kubeflow Fairing TrainJob creates an image with Root user and fairing job pod will not execute on AKS which has policy to not allow Docker containers running as Root user #525

Open
pshah16 opened this issue Jul 24, 2020 · 1 comment

pshah16 commented Jul 24, 2020

/kind bug

What steps did you take and what happened:

I am running the simple Fairing example shown here with the Microsoft Azure backend.


from kubeflow import fairing
from kubeflow.fairing import TrainJob
from kubeflow.fairing.backends import KubeflowAzureBackend
from kubeflow.fairing.builders.cluster.azurestorage_context import StorageContextSource
from kubeflow.fairing.kubernetes.utils import get_resource_mutator

# AZURE_REGION, AZURE_RESOURCE_GROUP, AZURE_STORAGE_ACCOUNT and DOCKER_REGISTRY
# are assumed to be set beforehand for the target environment.

class Trainer(object):
    def train(self):
        print("hello world!")

# Build context stored in Azure Storage for the in-cluster image build.
BuildContext = StorageContextSource(
    region=AZURE_REGION,
    resource_group_name=AZURE_RESOURCE_GROUP,
    storage_account_name=AZURE_STORAGE_ACCOUNT,
)

job = TrainJob(
    Trainer,
    input_files=['ames_dataset/train.csv', "requirements.txt"],
    docker_registry=DOCKER_REGISTRY,
    base_docker_image=None,
    backend=KubeflowAzureBackend(build_context_source=BuildContext),
)
job.submit()


When the job.submit() command executes, I get the following messages (no errors). After that, the command never finishes executing and nothing happens beyond this point.

[I 200722 19:15:28 azure:156] Creating secret 'storage-credentials-5a318d6e' in namespace 'pshah'
[W 200722 19:15:29 manager:298] Waiting for fairing-builder-5zzdt-mxn9b to start...
[W 200722 19:15:29 manager:298] Waiting for fairing-builder-5zzdt-mxn9b to start...
[W 200722 19:15:29 manager:298] Waiting for fairing-builder-5zzdt-mxn9b to start...
[W 200722 19:15:31 manager:298] Waiting for fairing-builder-5zzdt-mxn9b to start...

When I checked the status of the fairing job using kubectl, I noticed the following:
state:
  waiting:
    message: container has runAsNonRoot and image will run as root
    reason: CreateContainerConfigError

I checked with our cluster team, and they confirmed that our AKS cluster has a policy that does not allow Docker containers to run as the root user. As a result, the pod is scheduled but its container is never started. The image that fairing builds uses the root user by default.

What did you expect to happen:
The error should have been clearly displayed when executing the TrainJob.submit() command, instead of the command remaining stuck waiting forever. Also, Kubeflow fairing commands (including TrainJob.submit()) need some option or setting to specify a non-root user for the Docker image that fairing creates, pushes to the registry, and executes on AKS.
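
To illustrate the kind of check that could surface this error instead of waiting indefinitely, here is a minimal sketch that inspects a pod object as returned by `kubectl get pod <name> -o json`. It is not part of fairing; the function name `find_blocking_reasons`, the set of reasons, and the container name in the sample are assumptions made for illustration.

```python
# Waiting reasons that normally need user intervention
# (an illustrative, non-exhaustive set chosen for this sketch).
BLOCKING_REASONS = {
    "CreateContainerConfigError",
    "CreateContainerError",
    "InvalidImageName",
}


def find_blocking_reasons(pod):
    """Return (container, reason, message) for each container stuck in a blocking waiting state."""
    status = pod.get("status") or {}
    statuses = (status.get("initContainerStatuses") or []) + (
        status.get("containerStatuses") or []
    )
    blocked = []
    for container_status in statuses:
        waiting = (container_status.get("state") or {}).get("waiting")
        if waiting and waiting.get("reason") in BLOCKING_REASONS:
            blocked.append(
                (
                    container_status.get("name"),
                    waiting["reason"],
                    waiting.get("message", ""),
                )
            )
    return blocked


# Sample shaped like the status reported in this issue; the container name is hypothetical.
sample_pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "builder",
                "state": {
                    "waiting": {
                        "message": "container has runAsNonRoot and image will run as root",
                        "reason": "CreateContainerConfigError",
                    }
                },
            }
        ]
    }
}

for name, reason, message in find_blocking_reasons(sample_pod):
    print(f"{name}: {reason}: {message}")
```

A wait loop could call a check like this on each poll and raise an exception with the reason and message rather than logging "Waiting for ... to start" forever.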

Anything else you would like to add:
How can I run the fairing TrainJob.submit() command successfully if my cluster has a policy that does not allow Docker images running as the root user?
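
For reference, the setting I am looking for corresponds to the pod-level `securityContext` in Kubernetes. The sketch below shows it on a plain pod manifest dictionary. It is only an illustration: the helper name `with_non_root_security_context` and the UID/GID value 1000 are assumptions, and whether fairing offers a hook that applies this to the builder pod is not something I have confirmed. The image contents must also be usable by a non-root user for the container to work.

```python
import copy


def with_non_root_security_context(pod_manifest, uid=1000, gid=1000):
    """Return a copy of a pod manifest with a non-root pod-level securityContext."""
    patched = copy.deepcopy(pod_manifest)  # leave the caller's manifest untouched
    spec = patched.setdefault("spec", {})
    security_context = spec.setdefault("securityContext", {})
    security_context.update(
        {
            "runAsNonRoot": True,  # kubelet refuses to start containers as UID 0
            "runAsUser": uid,      # explicit non-zero UID overrides the image's default user
            "fsGroup": gid,        # group ownership applied to mounted volumes
        }
    )
    return patched


# Hypothetical pod manifest used only for this example.
manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "example"},
    "spec": {"containers": [{"name": "main", "image": "example/image:latest"}]},
}

patched = with_non_root_security_context(manifest)
print(patched["spec"]["securityContext"])
```

With an explicit non-zero `runAsUser`, the `runAsNonRoot` check passes even when the image itself declares no user.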

Environment:

  • Fairing version: (use python -c "import kubeflow.fairing; print(kubeflow.fairing.__version__)"): 1.0.1
  • Kubeflow version: (version number can be found at the bottom left corner of the Kubeflow dashboard): build version v1beta1
  • Minikube version:
  • Kubernetes version: (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.2", GitCommit:"59603c6e503c87169aea6106f57b9f242f64df89", GitTreeState:"clean", BuildDate:"2020-01-18T23:30:10Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.7", GitCommit:"169db3bff4b5fb7722e967c5b6356713f05f15ed", GitTreeState:"clean", BuildDate:"2020-04-03T16:14:09Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
  • OS (e.g. from /etc/os-release):

NOTE: If you are using fairing from master, please provide us the git commit hash.

@issue-label-bot

Issue Label Bot is not confident enough to auto-label this issue.
See dashboard for more details.
