Commit

training-operator test: Update image (#70)
Switch the test to the image from the Katib repository, which is 2.75GB, instead of
the previous image, which was 10GB.

Closes #69
orfeas-k authored Jun 27, 2024
1 parent ad0922d commit fe86b4e
Showing 1 changed file with 8 additions and 4 deletions.
12 changes: 8 additions & 4 deletions tests/notebooks/training/training-integration.ipynb
@@ -415,10 +415,13 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"PYTORCHJOB_NAME = \"pytorch-dist-mnist-gloo\"\n",
+"PYTORCHJOB_NAME = \"pytorch-mnist-gloo\"\n",
 "PYTORCHJOB_CONTAINER = \"pytorch\"\n",
-"PYTORCHJOB_IMAGE = \"kubeflow/pytorch-dist-mnist:v1-3a360ba\"\n",
-"# The image above should be updated with each release with the latest available in the registry."
+"PYTORCHJOB_IMAGE = \"kubeflowkatib/pytorch-mnist-cpu:v0.16.0\"\n",
+"# The image above should be updated with each release with the corresponding Katib version used in CKF release.\n",
+"# Note that instead of using the [image from training-operator repository](https://github.com/kubeflow/training-operator/blob/master/examples/pytorch/mnist/Dockerfile),\n",
+"# the one [from Katib](https://github.com/kubeflow/katib/blob/master/examples/v1beta1/trial-images/pytorch-mnist/Dockerfile.cpu) is being used\n",
+"# due to the large size of the first one."
 ]
 },
 {
@@ -430,7 +433,8 @@
 "container = V1Container(\n",
 " name=PYTORCHJOB_CONTAINER,\n",
 " image=PYTORCHJOB_IMAGE,\n",
-" args=[\"--backend\", \"gloo\"],\n",
+" args=[\"--backend\", \"gloo\", \"--epochs\", \"2\"],\n",
+" # Passing `epochs` argument since kubeflowkatib image defaults to 10.\n",
 ")\n",
 "\n",
 "replica_spec = V1ReplicaSpec(\n",
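For orientation, the sketch below shows roughly how the updated constants and container arguments could fit into a PyTorchJob manifest. It uses plain dictionaries rather than the `V1Container` and `V1ReplicaSpec` classes the notebook uses, so it runs without a cluster or client library. The helper names (`build_replica_spec`, `build_pytorchjob_manifest`), the `OnFailure` restart policy, and the replica counts are assumptions for illustration; none of them come from this commit.

```python
# Values taken from the added lines of the diff.
PYTORCHJOB_NAME = "pytorch-mnist-gloo"
PYTORCHJOB_CONTAINER = "pytorch"
PYTORCHJOB_IMAGE = "kubeflowkatib/pytorch-mnist-cpu:v0.16.0"


def build_replica_spec(replicas, epochs=2):
    """Hypothetical helper: return one replica spec as a plain dict."""
    container = {
        "name": PYTORCHJOB_CONTAINER,
        "image": PYTORCHJOB_IMAGE,
        # Same arguments as the diff: gloo backend, with the epoch count
        # passed explicitly because the Katib image defaults to 10.
        "args": ["--backend", "gloo", "--epochs", str(epochs)],
    }
    return {
        "replicas": replicas,
        "restartPolicy": "OnFailure",  # assumption, not shown in the diff
        "template": {"spec": {"containers": [container]}},
    }


def build_pytorchjob_manifest(workers=1):
    """Hypothetical helper: assemble a PyTorchJob manifest as a dict."""
    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "PyTorchJob",
        "metadata": {"name": PYTORCHJOB_NAME},
        "spec": {
            "pytorchReplicaSpecs": {
                "Master": build_replica_spec(1),
                "Worker": build_replica_spec(workers),
            }
        },
    }


manifest = build_pytorchjob_manifest()
```

Keeping the epoch count as a parameter makes the override of the image default explicit and easy to change if a later image version uses a different default.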
