feat: Minimal image (#581)

* Bump mlflow[extras] from 2.0.1 to 2.2.2

  Bumps [mlflow[extras]](https://github.com/mlflow/mlflow) from 2.0.1 to 2.2.2.
  - [Release notes](https://github.com/mlflow/mlflow/releases)
  - [Changelog](https://github.com/mlflow/mlflow/blob/master/CHANGELOG.md)
  - [Commits](mlflow/mlflow@v2.0.1...v2.2.2)

  ---
  updated-dependencies:
  - dependency-name: mlflow[extras]
    dependency-type: direct:production
    update-type: version-update:semver-minor
  ...

  Signed-off-by: dependabot[bot] <[email protected]>

* feat: Add a minimal Dockerfile
* ci: Add support for -minimal tags in CI
* ci: Change cluster
* fix: Fix makefile
* ci: Fix dockerfile variable passing in ci.yaml
* ci: Add hosted tool cache cleanup

  See https://github.com/orgs/community/discussions/25678

* ci: Print the size of the hosted tool cache
* ci: Remove CodeQL and go tool cache to save space
* ci: Add proper tests for minimal image
* ci: Configure test preset
* feat: Use CUDA 11.7.1 by default
* ci: Change preset to 3090
* deps: Bump dependencies

  Bump wandb[aws] from 0.13.6 to 0.13.11

  Bumps [wandb[aws]](https://github.com/wandb/wandb) from 0.13.6 to 0.13.11.
  - [Release notes](https://github.com/wandb/wandb/releases)
  - [Changelog](https://github.com/wandb/wandb/blob/main/CHANGELOG.md)
  - [Commits](wandb/wandb@v0.13.6...v0.13.11)

  ---
  updated-dependencies:
  - dependency-name: wandb[aws]
    dependency-type: direct:production
    update-type: version-update:semver-patch
  ...

  Signed-off-by: dependabot[bot] <[email protected]>

  Bump matplotlib from 3.6.2 to 3.7.1

  Bumps [matplotlib](https://github.com/matplotlib/matplotlib) from 3.6.2 to 3.7.1.
  - [Release notes](https://github.com/matplotlib/matplotlib/releases)
  - [Commits](matplotlib/matplotlib@v3.6.2...v3.7.1)

  ---
  updated-dependencies:
  - dependency-name: matplotlib
    dependency-type: direct:production
    update-type: version-update:semver-minor
  ...

  Signed-off-by: dependabot[bot] <[email protected]>

  Bump opencv-python-headless from 4.6.0.66 to 4.7.0.72

  Bumps [opencv-python-headless](https://github.com/opencv/opencv-python) from 4.6.0.66 to 4.7.0.72.
  - [Release notes](https://github.com/opencv/opencv-python/releases)
  - [Commits](https://github.com/opencv/opencv-python/commits)

  ---
  updated-dependencies:
  - dependency-name: opencv-python-headless
    dependency-type: direct:production
    update-type: version-update:semver-minor
  ...

  Signed-off-by: dependabot[bot] <[email protected]>

  Bump scipy from 1.9.3 to 1.10.1

  Bumps [scipy](https://github.com/scipy/scipy) from 1.9.3 to 1.10.1.
  - [Release notes](https://github.com/scipy/scipy/releases)
  - [Commits](scipy/scipy@v1.9.3...v1.10.1)

  ---
  updated-dependencies:
  - dependency-name: scipy
    dependency-type: direct:production
    update-type: version-update:semver-minor
  ...

  Signed-off-by: dependabot[bot] <[email protected]>

  Bump jupyterlab from 3.5.1 to 3.6.1

  Bumps [jupyterlab](https://github.com/jupyterlab/jupyterlab) from 3.5.1 to 3.6.1.
  - [Release notes](https://github.com/jupyterlab/jupyterlab/releases)
  - [Changelog](https://github.com/jupyterlab/jupyterlab/blob/@jupyterlab/[email protected]/CHANGELOG.md)
  - [Commits](https://github.com/jupyterlab/jupyterlab/compare/@jupyterlab/[email protected]...@jupyterlab/[email protected])

  ---
  updated-dependencies:
  - dependency-name: jupyterlab
    dependency-type: direct:production
    update-type: version-update:semver-minor
  ...

  Signed-off-by: dependabot[bot] <[email protected]>

  Bump tensorflow-gpu from 2.10.1 to 2.12.0

  Bumps [tensorflow-gpu](https://github.com/tensorflow/tensorflow) from 2.10.1 to 2.12.0.
  - [Release notes](https://github.com/tensorflow/tensorflow/releases)
  - [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md)
  - [Commits](https://github.com/tensorflow/tensorflow/commits)

  ---
  updated-dependencies:
  - dependency-name: tensorflow-gpu
    dependency-type: direct:production
    update-type: version-update:semver-minor
  ...

  Signed-off-by: dependabot[bot] <[email protected]>

  Bump future from 0.18.2 to 0.18.3

  Bumps [future](https://github.com/PythonCharmers/python-future) from 0.18.2 to 0.18.3.
  - [Release notes](https://github.com/PythonCharmers/python-future/releases)
  - [Changelog](https://github.com/PythonCharmers/python-future/blob/master/docs/changelog.rst)
  - [Commits](PythonCharmers/python-future@v0.18.2...v0.18.3)

  ---
  updated-dependencies:
  - dependency-name: future
    dependency-type: direct:production
    update-type: version-update:semver-patch
  ...

  Signed-off-by: dependabot[bot] <[email protected]>

  Bump pillow from 9.3.0 to 9.4.0

  Bumps [pillow](https://github.com/python-pillow/Pillow) from 9.3.0 to 9.4.0.
  - [Release notes](https://github.com/python-pillow/Pillow/releases)
  - [Changelog](https://github.com/python-pillow/Pillow/blob/main/CHANGES.rst)
  - [Commits](python-pillow/Pillow@9.3.0...9.4.0)

  ---
  updated-dependencies:
  - dependency-name: pillow
    dependency-type: direct:production
    update-type: version-update:semver-minor
  ...

  Signed-off-by: dependabot[bot] <[email protected]>

  Bump ipywidgets from 8.0.3 to 8.0.4

  Bumps [ipywidgets](https://github.com/jupyter-widgets/ipywidgets) from 8.0.3 to 8.0.4.
  - [Release notes](https://github.com/jupyter-widgets/ipywidgets/releases)
  - [Commits](jupyter-widgets/ipywidgets@8.0.3...8.0.4)

  ---
  updated-dependencies:
  - dependency-name: ipywidgets
    dependency-type: direct:production
    update-type: version-update:semver-patch
  ...

  Signed-off-by: dependabot[bot] <[email protected]>

  [pre-commit.ci] pre-commit autoupdate

  updates:
  - [github.com/asottile/yesqa: v1.4.0 → v1.5.0](asottile/yesqa@v1.4.0...v1.5.0)
  - [github.com/PyCQA/isort: 5.10.1 → 5.12.0](PyCQA/isort@5.10.1...5.12.0)
  - [github.com/psf/black: 22.12.0 → 23.7.0](psf/black@22.12.0...23.7.0)
  - [github.com/asottile/pyupgrade: v3.3.1 → v3.10.1](asottile/pyupgrade@v3.3.1...v3.10.1)
  - [github.com/PyCQA/flake8: 6.0.0 → 6.1.0](PyCQA/flake8@6.0.0...6.1.0)

* deps: Bump aws-cli, neuro-all, torch, tensorflow
* deps: Update tf to 2.13.0
* feat: Use CUDA 11.8
* feat: Use CUDA 11.8 in ci
* fix: Fix missing cudnn8 in the base image matrix
* fix: Add a script for libdevice.10.bc install

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
neuro-inc · Aug 25, 2023 · ace0821
1 parent f5cab55
Showing 11 changed files with 240 additions and 26 deletions.
diff --git a/.github/workflows/ci.yaml b/.github/workflows/ci.yaml
@@ -21,8 +21,12 @@ jobs:
         base-image:
           - nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04
           - nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04
+        dockerfile:
+          - Dockerfile
+          - Dockerfile.minimal
     env:
       IMAGE_NAME: ghcr.io/neuro-inc/base
+      NEURO_CLUSTER: onprem-poc
       NEURO_STAGING_URL: ${{ secrets.NEURO_STAGING_URL }}
       NEURO_TOKEN: ${{ secrets.NEURO_TOKEN }}
       BASE_IMAGE: ${{ matrix.base-image }}
@@ -50,7 +54,7 @@ jobs:
         run: |
           source venv/bin/activate
           neuro config login-with-token $NEURO_TOKEN $NEURO_STAGING_URL
-          neuro config switch-cluster revenuegrid-aws
+          neuro config switch-cluster $NEURO_CLUSTER
           neuro config show
 
       - name: Login ghcr.io
@@ -74,12 +78,22 @@ jobs:
           elif [[ ${{ matrix.base-image }} =~ devel ]]; then
             export BASE_IMAGE_TYPE="devel";
           fi
+          if [[ ${{ matrix.dockerfile }} =~ minimal ]]; then
+            export BASE_IMAGE_TYPE="$BASE_IMAGE_TYPE-minimal";
+          fi
           echo "::set-output name=BASE_IMAGE_TYPE::$BASE_IMAGE_TYPE"
           echo "::set-output name=platform_image_tag::ghcr.io/neuro-inc/base:built-$BASE_IMAGE_TYPE"
 
+      - name: Cleanup tool cache
+        run: |
+          du -h -d 1 /opt/hostedtoolcache
+          rm -rf /opt/hostedtoolcache/go /opt/hostedtoolcache/CodeQL
+
+
       - name: Build image
         env:
           BASE_IMAGE_TYPE: ${{ steps.get-image-tags.outputs.BASE_IMAGE_TYPE }}
+          DOCKERFILE: ${{ matrix.dockerfile }}
         run: |
           make image_build
 
@@ -90,9 +104,13 @@ jobs:
       - name: Test image
         env:
           BASE_IMAGE_TYPE: ${{ steps.get-image-tags.outputs.BASE_IMAGE_TYPE }}
+          TEST_PRESET: gpu-1x3090
         run: |
           source venv/bin/activate
           make e2e_neuro_push
+          if [[ ${{ matrix.dockerfile }} =~ minimal ]]; then
+            export TEST_CMD="bash /var/storage/dependencies.minimal.sh";
+          fi
           make test_dependencies
 
       - name: Push release

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -5,15 +5,15 @@ repos:
   - id: check-merge-conflict
     exclude: "rst$"
 - repo: https://github.com/asottile/yesqa
-  rev: v1.4.0
+  rev: v1.5.0
   hooks:
   - id: yesqa
 - repo: https://github.com/PyCQA/isort
-  rev: '5.10.1'
+  rev: '5.12.0'
   hooks:
   - id: isort
 - repo: https://github.com/psf/black
-  rev: '22.12.0'
+  rev: '23.7.0'
   hooks:
     - id: black
       language_version: python3 # Should be a command that runs python3.6+
@@ -38,11 +38,11 @@ repos:
     files: |
       .gitignore
 - repo: https://github.com/asottile/pyupgrade
-  rev: 'v3.3.1'
+  rev: 'v3.10.1'
   hooks:
   - id: pyupgrade
     args: ['--py36-plus']
 - repo: https://github.com/PyCQA/flake8
-  rev: '6.0.0'
+  rev: '6.1.0'
   hooks:
   - id: flake8
diff --git a/Dockerfile b/Dockerfile
@@ -56,17 +56,19 @@ RUN APT_INSTALL="apt-get install -y --no-install-recommends" && \
 # python
 # ------------------------------------------------------------------
 COPY requirements/python.txt /tmp/requirements/python.txt
+COPY libdevice_fix.sh /tmp/libdevice_fix.sh
 # ==================================================================
 # Miniconda
 # ------------------------------------------------------------------
-RUN wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-py39_4.12.0-Linux-x86_64.sh -O ~/miniconda.sh && \
+RUN wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-py311_23.5.2-0-Linux-x86_64.sh -O ~/miniconda.sh && \
     /bin/bash ~/miniconda.sh -b -p /opt/conda && \
     rm ~/miniconda.sh && \
     ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
     echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc && \
     echo "conda activate base" >> ~/.bashrc && \
     . /opt/conda/etc/profile.d/conda.sh && \
     conda activate base && \
+    . /tmp/libdevice_fix.sh && \
 # ==================================================================
 # Python
 # ------------------------------------------------------------------

diff --git a/Dockerfile.minimal b/Dockerfile.minimal
@@ -0,0 +1,150 @@
+ARG BASE_IMAGE=nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04
+FROM ${BASE_IMAGE}
+ENV LANG C.UTF-8
+RUN APT_INSTALL="apt-get install -y --no-install-recommends" && \
+    apt-get update -qq && \
+# ==================================================================
+# tools
+# ------------------------------------------------------------------
+    DEBIAN_FRONTEND=noninteractive $APT_INSTALL \
+        cron \
+        curl \
+        git \
+        libssl-dev \
+        python3-dev \
+        python3-pip \
+        python3-venv \
+        rsync \
+        rclone \
+        unrar \
+        zip \
+        unzip \
+        vim \
+        wget \
+        libncurses5-dev \
+        libncursesw5-dev \
+        libglib2.0-0 \
+        gcc \
+        make \
+        cmake \
+        nano \
+        tmux \
+        htop \
+        ssh \
+        && \
+        # NVTop >>
+        git clone --depth 1 --branch 1.2.2 -q https://github.com/Syllo/nvtop.git nvtop && \
+        mkdir -p nvtop/build && cd nvtop/build && \
+        cmake --log-level=WARNING .. && \
+        make --quiet install && \
+        cd ../.. && rm -r nvtop && \
+        # <<
+        ln -s $(which python3) /usr/bin/python && \
+        # Git-LFS >>
+        curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | bash && \
+        DEBIAN_FRONTEND=noninteractive $APT_INSTALL git-lfs && \
+        # <<
+        # Remove PyYAML before other pip tools installation
+        # since APT installs outdated PyYAML as dist package, which breaks pip's deps management
+        # https://stackoverflow.com/questions/49911550/how-to-upgrade-disutils-package-pyyaml
+        rm -rf /usr/lib/python3/dist-packages/yaml && \
+        rm -rf /usr/lib/python3/dist-packages/PyYAML-* && \
+        apt-get clean && \
+        apt-get autoremove -y --purge && \
+        rm -rf /var/lib/apt/lists/* /tmp/* ~/*
+# ==================================================================
+# python
+# ------------------------------------------------------------------
+COPY requirements/python.minimal.txt /tmp/requirements/python.txt
+COPY libdevice_fix.sh /tmp/libdevice_fix.sh
+# ==================================================================
+# Miniconda
+# ------------------------------------------------------------------
+RUN wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-py311_23.5.2-0-Linux-x86_64.sh -O ~/miniconda.sh && \
+    /bin/bash ~/miniconda.sh -b -p /opt/conda && \
+    rm ~/miniconda.sh && \
+    ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
+    echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc && \
+    echo "conda activate base" >> ~/.bashrc && \
+    . /opt/conda/etc/profile.d/conda.sh && \
+    conda activate base && \
+    . /tmp/libdevice_fix.sh && \
+# ==================================================================
+# Python
+# ------------------------------------------------------------------
+    PIP_INSTALL="python -m pip --no-cache-dir install --upgrade" && \
+    $PIP_INSTALL pip pipx && \
+    python3 -m pipx ensurepath && \
+    $PIP_INSTALL -r /tmp/requirements/python.txt && \
+    rm -r /tmp/requirements
+# ==================================================================
+# OOM guard
+# Adds a script to tune oom_killer behavior and puts it into the crontab
+# ==================================================================
+COPY files/usr/local/sbin/oom_guard.sh /usr/local/sbin/oom_guard.sh
+RUN crontab -l 2>/dev/null | { cat; echo '* * * * * /usr/local/sbin/oom_guard.sh'; } | crontab
+
+# ==================================================================
+# Set up SSH for remote debug
+# ------------------------------------------------------------------
+
+# Setup environment for ssh session
+RUN apt-get install -y --no-install-recommends openssh-server && \
+    echo "export PATH=/root/.local/bin:$PATH" >> /etc/profile && \
+    echo "export LANG=$LANG" >> /etc/profile && \
+    echo "export LANGUAGE=$LANGUAGE" >> /etc/profile && \
+    echo "export LC_ALL=$LC_ALL" >> /etc/profile && \
+    echo "export PYTHONIOENCODING=$PYTHONIOENCODING" >> /etc/profile && \
+    . /etc/profile && \
+    apt-get clean && \
+    apt-get autoremove && \
+    rm -rf /var/lib/apt/lists/* /tmp/* ~/*
+
+# Create folder for openssh fifos
+RUN mkdir -p /var/run/sshd
+
+# Disable password for root
+RUN sed -i -re 's/^root:[^:]+:/root::/' /etc/shadow
+RUN sed -i -re 's/^root:.*$/root::0:0:System Administrator:\/root:\/bin\/bash/' /etc/passwd
+
+# Permit root login over ssh
+RUN echo "Subsystem    sftp    /usr/lib/sftp-server \n\
+PasswordAuthentication yes\n\
+ChallengeResponseAuthentication yes\n\
+PermitRootLogin yes \n\
+PermitEmptyPasswords yes\n" > /etc/ssh/sshd_config
+
+# ssh port
+EXPOSE 22
+
+# ==================================================================
+# Neu.ro and other isolated via pipx Python packages
+# ------------------------------------------------------------------
+COPY requirements/pipx.minimal.txt /tmp/requirements/pipx.txt
+# Used for pipx
+ENV PATH=/opt/conda/bin:/root/.local/bin/:$PATH
+RUN cat /tmp/requirements/pipx.txt | xargs -rn 1 pipx install && \
+    pipx list --json && \
+    # This is TMP work-around due to https://github.com/neuro-inc/neuro-cli/pull/2671
+    pipx runpip neuro-all uninstall -y click && \
+    pipx runpip neuro-all install click==8.1.3 && \
+    rm -r /tmp/requirements
+# ==================================================================
+# config
+# ------------------------------------------------------------------
+
+RUN ldconfig
+
+EXPOSE 8888 6006
+
+# Force the stdout and stderr streams to be unbuffered.
+# Needed for correct work of tqdm via 'neuro exec'
+ENV PYTHONUNBUFFERED 1
+
+WORKDIR /project
+
+## Setup entrypoint
+COPY entrypoint.sh /entrypoint.sh
+
+RUN chmod +x /entrypoint.sh
+ENTRYPOINT ["bash", "/entrypoint.sh"]
diff --git a/Makefile b/Makefile
@@ -8,6 +8,8 @@ TEST_STORAGE_SUFFIX := $(shell bash -c 'echo $$(date +"%Y-%m-%d--%H-%M-%S")-$$RA
 BASE_IMAGE ?= nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04
 BASE_IMAGE_TYPE ?=
 
+DOCKERFILE ?= Dockerfile
+
 .PHONY: setup
 setup:
 	pip install pre-commit
@@ -18,7 +20,7 @@ image_build:
 	docker build \
 		-t $(TARGET_IMAGE_NAME):built-$(BASE_IMAGE_TYPE) \
 		--build-arg BASE_IMAGE=${BASE_IMAGE} \
-		-f Dockerfile .
+		-f $(DOCKERFILE) .
 
 .PHONY: image_deploy
 image_deploy:
@@ -32,8 +34,8 @@ image_deploy:
 e2e_neuro_push:
 	neuro push $(TARGET_IMAGE_NAME):built-$(BASE_IMAGE_TYPE) $(TEST_IMAGE_NAME):$(BASE_IMAGE_TYPE)
 
-TEST_PRESET=gpu-large
-TEST_CMD=bash /var/storage/dependencies.sh
+TEST_PRESET ?= gpu-large
+TEST_CMD ?= bash /var/storage/dependencies.sh
 .PHONY: test_dependencies
 test_dependencies:
 	neuro mkdir -p $(TEST_STORAGE)/$(TEST_STORAGE_SUFFIX)

diff --git a/files/testing/dependencies.minimal.sh b/files/testing/dependencies.minimal.sh
@@ -0,0 +1,30 @@
+#!/bin/bash
+set -xev -o pipefail
+
+rsync --version
+rclone --version
+
+curl --version
+wget --version
+
+zip --version
+unzip --help
+unrar -V
+
+vim --version
+nano --version
+
+tmux -V
+ssh -V
+git --version
+git-lfs --version
+nvtop --version
+
+service cron status
+
+neuro --version
+neuro-extras --version
+neuro-flow --version
+neuro config show
+
+nvidia-smi
diff --git a/libdevice_fix.sh b/libdevice_fix.sh
@@ -0,0 +1,11 @@
+#!/bin/sh
+if ! command -v nvcc >/dev/null 2>&1; then
+  echo "====== applying libdevice fix ======"
+  # (A.K.) Need to install nvcc since it provides libdevice.10.bc
+  # This adds less than 100MB to the image size
+  # Adapted from https://www.tensorflow.org/install/pip#ubuntu_2204
+  conda install -y -c nvidia cuda-nvcc=11.3.58 && \
+  mkdir -p /usr/local/cuda/nvvm/libdevice && \
+  ln -s /opt/conda/nvvm/libdevice/libdevice.10.bc /usr/local/cuda/nvvm/libdevice/ && \
+  echo "====== libdevice fix applied ======"
+fi
diff --git a/requirements/pipx.minimal.txt b/requirements/pipx.minimal.txt
@@ -0,0 +1 @@
+neuro-all==23.7.1
diff --git a/requirements/pipx.txt b/requirements/pipx.txt
@@ -1,2 +1,2 @@
-awscli==1.27.13
-neuro-all==22.8.1
+awscli==1.29.17
+neuro-all==23.7.1
diff --git a/requirements/python.minimal.txt b/requirements/python.minimal.txt
diff --git a/requirements/python.txt b/requirements/python.txt
@@ -1,19 +1,19 @@
 cloudpickle==2.2.0
-future==0.18.2
-ipywidgets==8.0.3
-jupyterlab==3.5.1
-matplotlib==3.6.2
-mlflow[extras]==2.0.1
-opencv-python-headless==4.6.0.66
+future==0.18.3
+ipywidgets==8.0.4
+jupyterlab==3.6.1
+matplotlib==3.7.1
+mlflow[extras]==2.2.2
+opencv-python-headless==4.7.0.72
 pandas==1.5.2
-Pillow==9.3.0
+Pillow==9.4.0
 scikit-learn==1.2.0
-scipy==1.9.3
-tensorboardX==2.5.1
-tensorflow-gpu==2.10.1
-torch==1.13.0+cu116
-torchaudio==0.13.0+cu116
-torchvision==0.14.0+cu116
+scipy==1.10.1
+tensorboardX==2.6.2
+tensorflow==2.13.0
+torch==2.0.1+cu117
+torchaudio==2.0.2+cu117
+torchvision==0.15.2+cu117
 tqdm==4.64.1
 typing==3.7.4.3
-wandb[aws]==0.13.6
+wandb[aws]==0.13.11
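
The `-minimal` tag suffix that this commit adds to the `get-image-tags` step in `.github/workflows/ci.yaml` can be tried outside CI with a small sketch like the one below. `image_type` is a hypothetical helper name used only for illustration, and the `runtime` branch is an assumption: the hunk shows only the `elif ... devel` branch, so the first condition is inferred from the matrix entries.

```shell
#!/usr/bin/env bash
# Sketch of the tag-suffix logic from the "get-image-tags" CI step.
# image_type is a hypothetical helper, not part of the repository.
image_type() {
  local base_image="$1" dockerfile="$2" type=""
  # Assumption: the branch hidden above the hunk matches "runtime".
  if [[ "$base_image" =~ runtime ]]; then
    type="runtime"
  elif [[ "$base_image" =~ devel ]]; then
    type="devel"
  fi
  # Added in this commit: builds from Dockerfile.minimal get a "-minimal" suffix.
  if [[ "$dockerfile" =~ minimal ]]; then
    type="$type-minimal"
  fi
  echo "$type"
}

# One matrix combination per call: <base-image> <dockerfile>
image_type nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04 Dockerfile.minimal
image_type nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04 Dockerfile
```

With the two base images and two Dockerfiles in the matrix, this yields four image types (`devel`, `runtime`, `devel-minimal`, `runtime-minimal`), each of which becomes a `built-<type>` tag. A local build of one combination would presumably look like `make image_build DOCKERFILE=Dockerfile.minimal BASE_IMAGE_TYPE=devel-minimal`, given the `DOCKERFILE ?= Dockerfile` default added to the Makefile.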