feat: Minimal image (#581)

* Bump mlflow[extras] from 2.0.1 to 2.2.2

Bumps [mlflow[extras]](https://github.com/mlflow/mlflow) from 2.0.1 to 2.2.2.
- [Release notes](https://github.com/mlflow/mlflow/releases)
- [Changelog](https://github.com/mlflow/mlflow/blob/master/CHANGELOG.md)
- [Commits](mlflow/mlflow@v2.0.1...v2.2.2)

---
updated-dependencies:
- dependency-name: mlflow[extras]
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

* feat: Add a minimal Dockerfile

* ci: Add support for -minimal tags in CI

* ci: Change cluster

* fix: Fix makefile

* ci: Fix dockerfile variable passing in ci.yaml

* ci: Add hosted tool cache cleanup
See https://github.com/orgs/community/discussions/25678

* ci: Print the size of the hosted tool cache

* ci: Remove CodeQL and go tool cache to save space

* ci: Add proper tests for minimal image

* ci: Configure test preset

* feat: Use CUDA 11.7.1 by default

* ci: Change preset to 3090

* deps: Bump dependencies

Bump wandb[aws] from 0.13.6 to 0.13.11

Bumps [wandb[aws]](https://github.com/wandb/wandb) from 0.13.6 to 0.13.11.
- [Release notes](https://github.com/wandb/wandb/releases)
- [Changelog](https://github.com/wandb/wandb/blob/main/CHANGELOG.md)
- [Commits](wandb/wandb@v0.13.6...v0.13.11)

---
updated-dependencies:
- dependency-name: wandb[aws]
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>

Bump matplotlib from 3.6.2 to 3.7.1

Bumps [matplotlib](https://github.com/matplotlib/matplotlib) from 3.6.2 to 3.7.1.
- [Release notes](https://github.com/matplotlib/matplotlib/releases)
- [Commits](matplotlib/matplotlib@v3.6.2...v3.7.1)

---
updated-dependencies:
- dependency-name: matplotlib
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

Bump opencv-python-headless from 4.6.0.66 to 4.7.0.72

Bumps [opencv-python-headless](https://github.com/opencv/opencv-python) from 4.6.0.66 to 4.7.0.72.
- [Release notes](https://github.com/opencv/opencv-python/releases)
- [Commits](https://github.com/opencv/opencv-python/commits)

---
updated-dependencies:
- dependency-name: opencv-python-headless
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

Bump scipy from 1.9.3 to 1.10.1

Bumps [scipy](https://github.com/scipy/scipy) from 1.9.3 to 1.10.1.
- [Release notes](https://github.com/scipy/scipy/releases)
- [Commits](scipy/scipy@v1.9.3...v1.10.1)

---
updated-dependencies:
- dependency-name: scipy
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

Bump jupyterlab from 3.5.1 to 3.6.1

Bumps [jupyterlab](https://github.com/jupyterlab/jupyterlab) from 3.5.1 to 3.6.1.
- [Release notes](https://github.com/jupyterlab/jupyterlab/releases)
- [Changelog](https://github.com/jupyterlab/jupyterlab/blob/@jupyterlab/[email protected]/CHANGELOG.md)
- [Commits](https://github.com/jupyterlab/jupyterlab/compare/@jupyterlab/[email protected]...@jupyterlab/[email protected])

---
updated-dependencies:
- dependency-name: jupyterlab
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

Bump tensorflow-gpu from 2.10.1 to 2.12.0

Bumps [tensorflow-gpu](https://github.com/tensorflow/tensorflow) from 2.10.1 to 2.12.0.
- [Release notes](https://github.com/tensorflow/tensorflow/releases)
- [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md)
- [Commits](https://github.com/tensorflow/tensorflow/commits)

---
updated-dependencies:
- dependency-name: tensorflow-gpu
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

Bump future from 0.18.2 to 0.18.3

Bumps [future](https://github.com/PythonCharmers/python-future) from 0.18.2 to 0.18.3.
- [Release notes](https://github.com/PythonCharmers/python-future/releases)
- [Changelog](https://github.com/PythonCharmers/python-future/blob/master/docs/changelog.rst)
- [Commits](PythonCharmers/python-future@v0.18.2...v0.18.3)

---
updated-dependencies:
- dependency-name: future
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>

Bump pillow from 9.3.0 to 9.4.0

Bumps [pillow](https://github.com/python-pillow/Pillow) from 9.3.0 to 9.4.0.
- [Release notes](https://github.com/python-pillow/Pillow/releases)
- [Changelog](https://github.com/python-pillow/Pillow/blob/main/CHANGES.rst)
- [Commits](python-pillow/Pillow@9.3.0...9.4.0)

---
updated-dependencies:
- dependency-name: pillow
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

Bump ipywidgets from 8.0.3 to 8.0.4

Bumps [ipywidgets](https://github.com/jupyter-widgets/ipywidgets) from 8.0.3 to 8.0.4.
- [Release notes](https://github.com/jupyter-widgets/ipywidgets/releases)
- [Commits](jupyter-widgets/ipywidgets@8.0.3...8.0.4)

---
updated-dependencies:
- dependency-name: ipywidgets
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>

[pre-commit.ci] pre-commit autoupdate

updates:
- [github.com/asottile/yesqa: v1.4.0 → v1.5.0](asottile/yesqa@v1.4.0...v1.5.0)
- [github.com/PyCQA/isort: 5.10.1 → 5.12.0](PyCQA/isort@5.10.1...5.12.0)
- [github.com/psf/black: 22.12.0 → 23.7.0](psf/black@22.12.0...23.7.0)
- [github.com/asottile/pyupgrade: v3.3.1 → v3.10.1](asottile/pyupgrade@v3.3.1...v3.10.1)
- [github.com/PyCQA/flake8: 6.0.0 → 6.1.0](PyCQA/flake8@6.0.0...6.1.0)

* deps: Bump aws-cli, neuro-all, torch, tensorflow

* deps: Update tf to 2.13.0

* feat: Use CUDA 11.8

* feat: Use CUDA 11.8 in ci

* fix: Fix missing cudnn8 in the base image matrix

* fix: Add a script for libdevice.10.bc install

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
andriihomiak and dependabot[bot] authored Aug 25, 2023
1 parent f5cab55 commit ace0821
Showing 11 changed files with 240 additions and 26 deletions.
20 changes: 19 additions & 1 deletion .github/workflows/ci.yaml
@@ -21,8 +21,12 @@ jobs:
 base-image:
 - nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04
 - nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04
+dockerfile:
+- Dockerfile
+- Dockerfile.minimal
 env:
 IMAGE_NAME: ghcr.io/neuro-inc/base
+NEURO_CLUSTER: onprem-poc
 NEURO_STAGING_URL: ${{ secrets.NEURO_STAGING_URL }}
 NEURO_TOKEN: ${{ secrets.NEURO_TOKEN }}
 BASE_IMAGE: ${{ matrix.base-image }}
@@ -50,7 +54,7 @@ jobs:
 run: |
 source venv/bin/activate
 neuro config login-with-token $NEURO_TOKEN $NEURO_STAGING_URL
-neuro config switch-cluster revenuegrid-aws
+neuro config switch-cluster $NEURO_CLUSTER
 neuro config show
 - name: Login ghcr.io
@@ -74,12 +78,22 @@ jobs:
 elif [[ ${{ matrix.base-image }} =~ devel ]]; then
 export BASE_IMAGE_TYPE="devel";
 fi
+if [[ ${{ matrix.dockerfile }} =~ minimal ]]; then
+export BASE_IMAGE_TYPE="$BASE_IMAGE_TYPE-minimal";
+fi
 echo "::set-output name=BASE_IMAGE_TYPE::$BASE_IMAGE_TYPE"
 echo "::set-output name=platform_image_tag::ghcr.io/neuro-inc/base:built-$BASE_IMAGE_TYPE"
+- name: Cleanup tool cache
+run: |
+du -h -d 1 /opt/hostedtoolcache
+rm -rf /opt/hostedtoolcache/go /opt/hostedtoolcache/CodeQL
 - name: Build image
 env:
 BASE_IMAGE_TYPE: ${{ steps.get-image-tags.outputs.BASE_IMAGE_TYPE }}
+DOCKERFILE: ${{ matrix.dockerfile }}
 run: |
 make image_build
@@ -90,9 +104,13 @@ jobs:
 - name: Test image
 env:
 BASE_IMAGE_TYPE: ${{ steps.get-image-tags.outputs.BASE_IMAGE_TYPE }}
+TEST_PRESET: gpu-1x3090
 run: |
 source venv/bin/activate
 make e2e_neuro_push
+if [[ ${{ matrix.dockerfile }} =~ minimal ]]; then
+export TEST_CMD="bash /var/storage/dependencies.minimal.sh";
+fi
 make test_dependencies
 - name: Push release
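The tag derivation in the `get-image-tags` step can be sketched as a standalone function. This is a hedged reconstruction, not the workflow's actual step: the first `if` branch is cut off in the diff, so the sketch assumes it matches `runtime` the same way the visible `elif` matches `devel`, and the function name `image_type` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical standalone version of the tag logic in the CI step.
# Assumption: the branch hidden above the visible `elif` matches "runtime".
image_type() {
  local base_image="$1" dockerfile="$2" type=""
  if [[ $base_image =~ runtime ]]; then
    type="runtime"
  elif [[ $base_image =~ devel ]]; then
    type="devel"
  fi
  # Mirrors the added check: builds from Dockerfile.minimal get a "-minimal" suffix.
  if [[ $dockerfile =~ minimal ]]; then
    type="$type-minimal"
  fi
  printf '%s\n' "$type"
}

# Walk the 2 x 2 build matrix from the workflow and print each built tag.
for base in nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04 \
            nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04; do
  for df in Dockerfile Dockerfile.minimal; do
    echo "ghcr.io/neuro-inc/base:built-$(image_type "$base" "$df")"
  done
done
```

The loop prints one tag per matrix cell (`built-devel`, `built-devel-minimal`, `built-runtime`, `built-runtime-minimal`), which is why the suffix is keyed on the Dockerfile name: without it, the full and minimal builds from the same base image would push to the same tag.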
10 changes: 5 additions & 5 deletions .pre-commit-config.yaml
@@ -5,15 +5,15 @@ repos:
 - id: check-merge-conflict
 exclude: "rst$"
 - repo: https://github.com/asottile/yesqa
-rev: v1.4.0
+rev: v1.5.0
 hooks:
 - id: yesqa
 - repo: https://github.com/PyCQA/isort
-rev: '5.10.1'
+rev: '5.12.0'
 hooks:
 - id: isort
 - repo: https://github.com/psf/black
-rev: '22.12.0'
+rev: '23.7.0'
 hooks:
 - id: black
 language_version: python3 # Should be a command that runs python3.6+
@@ -38,11 +38,11 @@ repos:
 files: |
 .gitignore
 - repo: https://github.com/asottile/pyupgrade
-rev: 'v3.3.1'
+rev: 'v3.10.1'
 hooks:
 - id: pyupgrade
 args: ['--py36-plus']
 - repo: https://github.com/PyCQA/flake8
-rev: '6.0.0'
+rev: '6.1.0'
 hooks:
 - id: flake8
4 changes: 3 additions & 1 deletion Dockerfile
@@ -56,17 +56,19 @@ RUN APT_INSTALL="apt-get install -y --no-install-recommends" && \
 # python
 # ------------------------------------------------------------------
 COPY requirements/python.txt /tmp/requirements/python.txt
+COPY libdevice_fix.sh /tmp/libdevice_fix.sh
 # ==================================================================
 # Miniconda
 # ------------------------------------------------------------------
-RUN wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-py39_4.12.0-Linux-x86_64.sh -O ~/miniconda.sh && \
+RUN wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-py311_23.5.2-0-Linux-x86_64.sh -O ~/miniconda.sh && \
 /bin/bash ~/miniconda.sh -b -p /opt/conda && \
 rm ~/miniconda.sh && \
 ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
 echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc && \
 echo "conda activate base" >> ~/.bashrc && \
 . /opt/conda/etc/profile.d/conda.sh && \
 conda activate base && \
+. /tmp/libdevice_fix.sh && \
 # ==================================================================
 # Python
 # ------------------------------------------------------------------
150 changes: 150 additions & 0 deletions Dockerfile.minimal
@@ -0,0 +1,150 @@
ARG BASE_IMAGE=nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04
FROM ${BASE_IMAGE}
ENV LANG C.UTF-8
RUN APT_INSTALL="apt-get install -y --no-install-recommends" && \
apt-get update -qq && \
# ==================================================================
# tools
# ------------------------------------------------------------------
DEBIAN_FRONTEND=noninteractive $APT_INSTALL \
cron \
curl \
git \
libssl-dev \
python3-dev \
python3-pip \
python3-venv \
rsync \
rclone \
unrar \
zip \
unzip \
vim \
wget \
libncurses5-dev \
libncursesw5-dev \
libglib2.0-0 \
gcc \
make \
cmake \
nano \
tmux \
htop \
ssh \
&& \
# NVTop >>
git clone --depth 1 --branch 1.2.2 -q https://github.com/Syllo/nvtop.git nvtop && \
mkdir -p nvtop/build && cd nvtop/build && \
cmake --log-level=WARNING .. && \
make --quiet install && \
cd ../.. && rm -r nvtop && \
# <<
ln -s $(which python3) /usr/bin/python && \
# Git-LFS >>
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | bash && \
DEBIAN_FRONTEND=noninteractive $APT_INSTALL git-lfs && \
# <<
# Remove PyYAML before other pip tools installation
# since APT installs outdated PyYAML as dist package, which breaks pip's deps management
# https://stackoverflow.com/questions/49911550/how-to-upgrade-disutils-package-pyyaml
rm -rf /usr/lib/python3/dist-packages/yaml && \
rm -rf /usr/lib/python3/dist-packages/PyYAML-* && \
apt-get clean && \
apt-get autoremove -y --purge && \
rm -rf /var/lib/apt/lists/* /tmp/* ~/*
# ==================================================================
# python
# ------------------------------------------------------------------
COPY requirements/python.minimal.txt /tmp/requirements/python.txt
COPY libdevice_fix.sh /tmp/libdevice_fix.sh
# ==================================================================
# Miniconda
# ------------------------------------------------------------------
RUN wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-py311_23.5.2-0-Linux-x86_64.sh -O ~/miniconda.sh && \
/bin/bash ~/miniconda.sh -b -p /opt/conda && \
rm ~/miniconda.sh && \
ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc && \
echo "conda activate base" >> ~/.bashrc && \
. /opt/conda/etc/profile.d/conda.sh && \
conda activate base && \
. /tmp/libdevice_fix.sh && \
# ==================================================================
# Python
# ------------------------------------------------------------------
PIP_INSTALL="python -m pip --no-cache-dir install --upgrade" && \
$PIP_INSTALL pip pipx && \
python3 -m pipx ensurepath && \
$PIP_INSTALL -r /tmp/requirements/python.txt && \
rm -r /tmp/requirements
# ==================================================================
# OOM guard
# Adds a script to tune oom_killer behavior and puts it into the crontab
# ==================================================================
COPY files/usr/local/sbin/oom_guard.sh /usr/local/sbin/oom_guard.sh
RUN crontab -l 2>/dev/null | { cat; echo '* * * * * /usr/local/sbin/oom_guard.sh'; } | crontab

# ==================================================================
# Set up SSH for remote debug
# ------------------------------------------------------------------

# Setup environment for ssh session
RUN apt-get install -y --no-install-recommends openssh-server && \
echo "export PATH=/root/.local/bin:$PATH" >> /etc/profile && \
echo "export LANG=$LANG" >> /etc/profile && \
echo "export LANGUAGE=$LANGUAGE" >> /etc/profile && \
echo "export LC_ALL=$LC_ALL" >> /etc/profile && \
echo "export PYTHONIOENCODING=$PYTHONIOENCODING" >> /etc/profile && \
. /etc/profile && \
apt-get clean && \
apt-get autoremove && \
rm -rf /var/lib/apt/lists/* /tmp/* ~/*

# Create folder for openssh fifos
RUN mkdir -p /var/run/sshd

# Disable password for root
RUN sed -i -re 's/^root:[^:]+:/root::/' /etc/shadow
RUN sed -i -re 's/^root:.*$/root::0:0:System Administrator:\/root:\/bin\/bash/' /etc/passwd

# Permit root login over ssh
RUN echo "Subsystem sftp /usr/lib/sftp-server \n\
PasswordAuthentication yes\n\
ChallengeResponseAuthentication yes\n\
PermitRootLogin yes \n\
PermitEmptyPasswords yes\n" > /etc/ssh/sshd_config

# ssh port
EXPOSE 22

# ==================================================================
# Neu.ro and other isolated via pipx Python packages
# ------------------------------------------------------------------
COPY requirements/pipx.minimal.txt /tmp/requirements/pipx.txt
# Used for pipx
ENV PATH=/opt/conda/bin:/root/.local/bin/:$PATH
RUN cat /tmp/requirements/pipx.txt | xargs -rn 1 pipx install && \
pipx list --json && \
# This is TMP work-around due to https://github.com/neuro-inc/neuro-cli/pull/2671
pipx runpip neuro-all uninstall -y click && \
pipx runpip neuro-all install click==8.1.3 && \
rm -r /tmp/requirements
# ==================================================================
# config
# ------------------------------------------------------------------

RUN ldconfig

EXPOSE 8888 6006

# Force the stdout and stderr streams to be unbuffered.
# Needed for correct work of tqdm via 'neuro exec'
ENV PYTHONUNBUFFERED 1

WORKDIR /project

## Setup entrypoint
COPY entrypoint.sh /entrypoint.sh

RUN chmod +x /entrypoint.sh
ENTRYPOINT ["bash", "/entrypoint.sh"]
8 changes: 5 additions & 3 deletions Makefile
@@ -8,6 +8,8 @@ TEST_STORAGE_SUFFIX := $(shell bash -c 'echo $$(date +"%Y-%m-%d--%H-%M-%S")-$$RA
 BASE_IMAGE ?= nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04
 BASE_IMAGE_TYPE ?=
 
+DOCKERFILE ?= Dockerfile
+
 .PHONY: setup
 setup:
 pip install pre-commit
@@ -18,7 +20,7 @@ image_build:
 docker build \
 -t $(TARGET_IMAGE_NAME):built-$(BASE_IMAGE_TYPE) \
 --build-arg BASE_IMAGE=${BASE_IMAGE} \
--f Dockerfile .
+-f $(DOCKERFILE) .
 
 .PHONY: image_deploy
 image_deploy:
@@ -32,8 +34,8 @@ image_deploy:
 e2e_neuro_push:
 neuro push $(TARGET_IMAGE_NAME):built-$(BASE_IMAGE_TYPE) $(TEST_IMAGE_NAME):$(BASE_IMAGE_TYPE)
 
-TEST_PRESET=gpu-large
-TEST_CMD=bash /var/storage/dependencies.sh
+TEST_PRESET ?= gpu-large
+TEST_CMD ?= bash /var/storage/dependencies.sh
 .PHONY: test_dependencies
 test_dependencies:
 neuro mkdir -p $(TEST_STORAGE)/$(TEST_STORAGE_SUFFIX)
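Switching `TEST_PRESET` and `TEST_CMD` from `=` to `?=` is what lets the CI "Test image" step take effect: that step sets `TEST_PRESET` through its `env:` block and exports `TEST_CMD` before calling `make test_dependencies`. In GNU Make, a plain `=` assignment in the Makefile overrides a value inherited from the environment (unless `make -e` is used), while `?=` assigns only when the variable is not already defined. Command-line assignments such as `make test_dependencies TEST_PRESET=...` override either form. The bash sketch below is only a rough analogue for illustration: `${VAR-default}` falls back to the default when the variable is unset, and the function name `resolve_test_settings` is hypothetical.

```bash
#!/usr/bin/env bash
# Rough shell analogue of the Makefile's "?=" defaults (illustration only).
# ${VAR-default} uses the default only when VAR is unset.
resolve_test_settings() {
  local preset="${TEST_PRESET-gpu-large}"
  local cmd="${TEST_CMD-bash /var/storage/dependencies.sh}"
  printf 'preset=%s cmd=%s\n' "$preset" "$cmd"
}

# Local run: nothing exported, so the Makefile defaults apply.
( unset TEST_PRESET TEST_CMD; resolve_test_settings )

# CI run for Dockerfile.minimal: both values come from the environment.
( export TEST_PRESET=gpu-1x3090 \
         TEST_CMD="bash /var/storage/dependencies.minimal.sh"
  resolve_test_settings )
```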
30 changes: 30 additions & 0 deletions files/testing/dependencies.minimal.sh
@@ -0,0 +1,30 @@
#!/bin/bash
set -xev -o pipefail

rsync --version
rclone --version

curl --version
wget --version

zip --version
unzip --help
unrar -V

vim --version
nano --version

tmux -V
ssh -V
git --version
git-lfs --version
nvtop --version

service cron status

neuro --version
neuro-extras --version
neuro-flow --version
neuro config show

nvidia-smi
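Because of `set -e`, the script above stops at the first tool that fails. If it is useful to see every missing tool in a single run, a presence check along these lines could run first. This is a sketch, not part of the commit: `check_commands` is a hypothetical helper that only tests whether each name resolves on `PATH`, not whether the tool actually works.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report every command that is not on PATH,
# instead of stopping at the first one as `set -e` would.
check_commands() {
  local cmd status=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      printf 'missing: %s\n' "$cmd"
      status=1
    fi
  done
  return "$status"
}

# Tools exercised by dependencies.minimal.sh.
if check_commands rsync rclone curl wget zip unzip unrar vim nano \
    tmux ssh git git-lfs nvtop neuro neuro-extras neuro-flow nvidia-smi; then
  echo "all tools present"
else
  echo "some tools are missing" >&2
fi
```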
11 changes: 11 additions & 0 deletions libdevice_fix.sh
@@ -0,0 +1,11 @@
#!/bin/sh
if ! command -v nvcc >/dev/null 2>&1; then
echo "====== applying libdevice fix ======"
# (A.K.) Need to install nvcc since it provides libdevice.10.bc
# This adds less than 100MB to the image size
# Adapted from https://www.tensorflow.org/install/pip#ubuntu_2204
conda install -y -c nvidia cuda-nvcc=11.3.58 && \
mkdir -p /usr/local/cuda/nvvm/libdevice && \
ln -s /opt/conda/nvvm/libdevice/libdevice.10.bc /usr/local/cuda/nvvm/libdevice/ && \
echo "====== libdevice fix applied ======"
fi
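The fix only runs when `nvcc` is not on `PATH`, which is typically the case for the `runtime` base images, since they do not ship the CUDA compiler. Its goal is for `libdevice.10.bc` to end up under `/usr/local/cuda/nvvm/libdevice`, where the script symlinks it. A hedged sketch of a post-build check follows; `find_libdevice` is a hypothetical helper, and the search directories are assumptions taken from the paths in the script.

```bash
#!/usr/bin/env bash
# Hypothetical post-build check: print the first libdevice.10.bc found
# in the given directories, or return 1 if none contains it.
find_libdevice() {
  local dir
  for dir in "$@"; do
    # -f follows symlinks, so a dangling link counts as missing.
    if [ -f "$dir/libdevice.10.bc" ]; then
      printf '%s\n' "$dir/libdevice.10.bc"
      return 0
    fi
  done
  return 1
}

# Paths from libdevice_fix.sh: the symlink directory, then the conda source.
if path="$(find_libdevice /usr/local/cuda/nvvm/libdevice /opt/conda/nvvm/libdevice)"; then
  echo "found: $path"
else
  echo "libdevice.10.bc not found" >&2
fi
```

Checking with `-f` rather than `-e` on the link path means that if the conda package placed the file somewhere other than `/opt/conda/nvvm/libdevice`, the broken symlink is reported instead of silently passing.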
1 change: 1 addition & 0 deletions requirements/pipx.minimal.txt
@@ -0,0 +1 @@
neuro-all==23.7.1
4 changes: 2 additions & 2 deletions requirements/pipx.txt
@@ -1,2 +1,2 @@
-awscli==1.27.13
-neuro-all==22.8.1
+awscli==1.29.17
+neuro-all==23.7.1
Empty file added requirements/python.minimal.txt
28 changes: 14 additions & 14 deletions requirements/python.txt
@@ -1,19 +1,19 @@
 cloudpickle==2.2.0
-future==0.18.2
-ipywidgets==8.0.3
-jupyterlab==3.5.1
-matplotlib==3.6.2
-mlflow[extras]==2.0.1
-opencv-python-headless==4.6.0.66
+future==0.18.3
+ipywidgets==8.0.4
+jupyterlab==3.6.1
+matplotlib==3.7.1
+mlflow[extras]==2.2.2
+opencv-python-headless==4.7.0.72
 pandas==1.5.2
-Pillow==9.3.0
+Pillow==9.4.0
 scikit-learn==1.2.0
-scipy==1.9.3
-tensorboardX==2.5.1
-tensorflow-gpu==2.10.1
-torch==1.13.0+cu116
-torchaudio==0.13.0+cu116
-torchvision==0.14.0+cu116
+scipy==1.10.1
+tensorboardX==2.6.2
+tensorflow==2.13.0
+torch==2.0.1+cu117
+torchaudio==2.0.2+cu117
+torchvision==0.15.2+cu117
 tqdm==4.64.1
 typing==3.7.4.3
-wandb[aws]==0.13.6
+wandb[aws]==0.13.11