big new image restructure #7

Merged
merged 68 commits into from
Jul 11, 2023
0bb4bff
WIP
shouples Jun 27, 2023
6e236ec
next round of WIP
shouples Jun 28, 2023
6ba5d64
removing micromamba so we actually use the right python version
shouples Jun 28, 2023
524a049
update python base images and reqs
shouples Jun 28, 2023
c803a5c
remove mamba
shouples Jun 28, 2023
df2033c
gitignore secrets_helper.py in build directories, add convenience tasks
shouples Jun 28, 2023
50d9b87
Delete apt-install
shouples Jun 28, 2023
e39b035
Delete run.sh
shouples Jun 28, 2023
27ecb67
saving progress here since py3.x base/DS and py3.9 noteable builds ar…
shouples Jun 28, 2023
a51c2dc
more WIP
shouples Jun 29, 2023
b0b430b
need these for ipython_config.py in noteable builds
shouples Jun 30, 2023
680fffa
fix ordering
shouples Jun 30, 2023
f5e9a47
updates for GPU builds
shouples Jun 30, 2023
4713c69
remove echo
shouples Jun 30, 2023
1a58455
remove copies
shouples Jun 30, 2023
1f21ee8
hadolint updates + swap secrets_helper.py for secrets_helper.sh
shouples Jul 5, 2023
7a6e4a9
remove duplicate lines since these aren't actual base images
shouples Jul 5, 2023
727d7b2
hadolint
shouples Jul 5, 2023
b4750a6
add R:noteable build for data frame -> DEX output
shouples Jul 5, 2023
128e33a
remove test tasks
shouples Jul 5, 2023
2e824b3
add cmake to allow package installations
shouples Jul 5, 2023
89bbf75
how did this even work before
shouples Jul 6, 2023
29717c7
more cleanup
shouples Jul 6, 2023
f68717e
add connection file back in
shouples Jul 6, 2023
006ba47
fix for when no secrets are present
shouples Jul 6, 2023
8bbf299
remove datascience-gpu in favor of gpu.* prefixed requirements files
shouples Jul 7, 2023
ebd18a9
remove noteable-gpu in favor of gpu.* prefixed requirements
shouples Jul 7, 2023
664b59a
more gpu updates
shouples Jul 7, 2023
7e81882
fix for matrix rendering
shouples Jul 7, 2023
983d7fb
use optional build targets
shouples Jul 7, 2023
4c7ef42
fix lookup
shouples Jul 7, 2023
521dcc1
fix tasks
shouples Jul 7, 2023
f28e4b4
update comments
shouples Jul 7, 2023
ed521c7
add cmds to ensure parent images are built
shouples Jul 7, 2023
b04fda8
updates for gpu.requirements.in
shouples Jul 7, 2023
736870d
revert
shouples Jul 7, 2023
ad595c0
remove build actions for now
shouples Jul 10, 2023
5dcd6c5
move into `/scripts`
shouples Jul 10, 2023
407fa9b
update `HOME`
shouples Jul 10, 2023
8fd5e4b
swap /home/noteable to /srv/noteable
shouples Jul 10, 2023
5c2ebf5
set home dir to /srv/noteable on user creation
shouples Jul 10, 2023
541ff71
move gitignores to top level
shouples Jul 10, 2023
6211b09
linting
shouples Jul 10, 2023
6f1eed7
newline
shouples Jul 10, 2023
f2a61ec
newline
shouples Jul 10, 2023
2da8575
punt to a follow-on PR
shouples Jul 10, 2023
1d42440
add missing requirements.txt files
shouples Jul 10, 2023
6045e77
hadolint ignore
shouples Jul 10, 2023
8e1ca0b
hadolint ignore
shouples Jul 10, 2023
ce57cf4
hadolint ignore
shouples Jul 10, 2023
48ea248
linting
shouples Jul 11, 2023
2102967
remove extra run.sh copies
shouples Jul 11, 2023
dae0a0d
hadolint
shouples Jul 11, 2023
592986d
I suppose it would help if I targeted the right linter
shouples Jul 11, 2023
e76bd7d
shellcheck fix
shouples Jul 11, 2023
6384ec9
hadolint
shouples Jul 11, 2023
3664480
hadolint
shouples Jul 11, 2023
b63b365
hadolint
shouples Jul 11, 2023
2da038f
remove py3.11 noteable tasks for now
shouples Jul 11, 2023
cd8b16e
something weird happening here with WORKDIR /tmp
shouples Jul 11, 2023
7b404ae
update comment
shouples Jul 11, 2023
0404150
add py 3.11 gpu.requirements.txt
shouples Jul 11, 2023
e317783
fix noteable gpu tasks, add extra convenience tasks
shouples Jul 11, 2023
ee7a698
add missing gpu.requirements.txt
shouples Jul 11, 2023
c68b125
add gpu stages
shouples Jul 11, 2023
5eb3de1
updates we forgot to move over from the 3.9 build
shouples Jul 11, 2023
e419eb9
don't overwrite `datascience` and `noteable` builds
shouples Jul 11, 2023
0d911fc
Merge branch 'main' into djs/kernels-overhaul
shouples Jul 11, 2023
27 changes: 0 additions & 27 deletions .github/workflows/build.yml
@@ -28,30 +28,3 @@ jobs:
VALIDATE_BASH_EXEC: true
VALIDATE_DOCKERFILE_HADOLINT: true
VALIDATE_YAML: true

# build_python_kernel:
# permissions:
# id-token: write
# contents: read
# packages: write
# actions: write
# uses: ./.github/workflows/reusable-docker-build.yml
# strategy:
# matrix:
# # Must be a supported version by jupyter/datascience-notebook
# # https://hub.docker.com/r/jupyter/datascience-notebook/tags?page=1&name=python-
# version: [ "3.9.13", "3.8.13" ]
# secrets: inherit
# with:
# dockerfile: ./kernels/python/Dockerfile
# context: ./kernels/python
# images: |
# ghcr.io/${{ github.repository }}/python
# tags: |
# type=ref,event=branch,prefix=${{ matrix.version }}
# type=ref,event=pr,prefix=${{ matrix.version }}
# type=sha,format=long,prefix=${{ matrix.version }}
# type=raw,value=latest,enable=${{ github.ref == format('refs/heads/{0}', 'main') }},prefix=${{ matrix.version }}
# build_args: |
# PYTHON_VERSION=${{ matrix.version }}
# platforms: "linux/amd64"
46 changes: 46 additions & 0 deletions .gitignore
@@ -0,0 +1,46 @@
# ignore these everywhere
.pythonrc
.Rprofile
apt-install
Aptfile
environment.txt
git_credential_helper.py
git-wrapper.sh
gpu.Aptfile
gpu.requirements.in
initial-condarc
ipython_config.py
secrets_helper.sh
requirements.in
requirements.R
run.sh

# ...except for these places where we care about changes happening
# (NOTE: this is because the tasks should copy the files down into the build directories)
!scripts/apt-install
!scripts/secrets_helper.sh

!python/base/Aptfile
!python/datascience/Aptfile
!python/noteable/Aptfile

!python/base-gpu/gpu.Aptfile
!python/base-gpu/environment.txt

!python/base-gpu/initial-condarc

!python/base/requirements.in
!python/datascience/requirements.in
!python/noteable/requirements.in

!python/run.sh
!python/base-gpu/run.sh
!R/run.sh

!python/noteable/.pythonrc
!python/noteable/ipython_config.py
!python/noteable/git_credential_helper.py
!python/noteable/git-wrapper.sh

!R/noteable/.Rprofile
!R/noteable/requirements.R
6 changes: 0 additions & 6 deletions Makefile

This file was deleted.

13 changes: 13 additions & 0 deletions R/Aptfile
@@ -0,0 +1,13 @@
build-essential
ca-certificates
cmake
curl
bzip2
gnupg2
wget
g++
git
jq
libudunits2-dev
procps
unixodbc-dev
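The Aptfile above is a plain list of Debian package names, one per line, which the Dockerfile hands to `apt-install`. That script is not shown in this diff, so the following is only a sketch of how such a list could be consumed. `aptfile_packages` is a hypothetical helper, and the handling of blank lines and `#` comments is an assumption rather than the script's confirmed behavior.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT the PR's scripts/apt-install:
# print the package names found in an Aptfile, one per line,
# skipping blank lines and lines that start with '#'.
aptfile_packages() {
  local aptfile="$1"
  local line
  # "|| [ -n ... ]" keeps a final line that has no trailing newline
  while IFS= read -r line || [ -n "${line}" ]; do
    # Trim leading and trailing whitespace
    line="${line#"${line%%[![:space:]]*}"}"
    line="${line%"${line##*[![:space:]]}"}"
    if [ -z "${line}" ]; then
      continue
    fi
    case "${line}" in
      \#*) continue ;;
    esac
    printf '%s\n' "${line}"
  done < "${aptfile}"
}
```

The resulting list could then be passed to `apt-get install --yes --no-install-recommends` inside the image build; whether the real script uses those flags is not stated in the diff.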
76 changes: 76 additions & 0 deletions R/base/4.3.0/Dockerfile
@@ -0,0 +1,76 @@
# syntax = docker/dockerfile:1.2.1
# ---
# Bare minimum R 4.3.x image with IRkernel installed
# - no R packages aside from builtins and IRkernel
# - no git, secrets, SQL, extensions, etc
# ---
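ARG NBL_R_VERSION=4.3.0
FROM r-base:${NBL_R_VERSION}
# An ARG declared before FROM is only visible to FROM itself; re-declare it so that
# later instructions in this stage (micromamba create, LABEL) can read its value
ARG NBL_R_VERSION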

# User/group setup
USER root

ENV NB_USER="noteable" \
NB_UID=4004 \
NB_GID=4004

RUN groupadd --gid 4004 noteable && \
useradd --uid 4004 \
--shell /bin/false \
--create-home \
--no-log-init \
--gid noteable noteable \
--home-dir /srv/noteable && \
chown --recursive noteable:noteable /srv/noteable && \
mkdir -p /etc/noteable && chown noteable:noteable /etc/noteable

# Install tini to manage passing signals to the child kernel process
ENV TINI_VERSION=v0.19.0
ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini /tini
RUN chmod +x /tini

# Install the system packages listed in the Aptfile
COPY apt-install /usr/bin/
# hadolint ignore=DL3045
COPY Aptfile .
RUN /usr/bin/apt-install Aptfile

# Install micromamba and set up a virtual environment so we can install packages without root
SHELL ["/bin/bash", "-o", "pipefail", "-c"]
RUN wget -qO- https://micromamba.snakepit.net/api/micromamba/linux-64/latest | tar -xvj bin/micromamba && \
    ./bin/micromamba shell init -s bash -p ~/micromamba

USER noteable
RUN micromamba create --name noteable-venv \
-c conda-forge \
-y \
r="${NBL_R_VERSION}"
# make subsequent RUN commands use the virtualenv:
SHELL ["micromamba", "run", "-n", "noteable-venv", "/bin/bash", "-c"]

# hadolint ignore=SC2239
RUN R -e "install.packages('IRkernel', repos='https://cran.us.r-project.org')"

COPY secrets_helper.sh /tmp/secrets_helper.sh
COPY run.sh /usr/local/bin

ENV HOME="/srv/noteable" \
XDG_CACHE_HOME="/srv/noteable/.cache/" \
GOOGLE_APPLICATION_CREDENTIALS="/vault/secrets/gcp-credentials"

WORKDIR /etc/noteable/project
EXPOSE 50001-50005

ENTRYPOINT ["/tini", "-g", "--"]
CMD ["run.sh"]

ARG NBL_ARG_BUILD_TIMESTAMP="undefined"
ARG NBL_ARG_REVISION="undefined"
ARG NBL_ARG_BUILD_URL="undefined"
ARG NBL_ARG_VERSION="undefined"
LABEL org.opencontainers.image.created="${NBL_ARG_BUILD_TIMESTAMP}" \
org.opencontainers.image.revision="${NBL_ARG_REVISION}" \
org.opencontainers.image.source="https://github.com/noteable-io/polymorph" \
org.opencontainers.image.title="noteable-R-${NBL_R_VERSION}" \
org.opencontainers.image.url="${NBL_ARG_BUILD_URL}" \
org.opencontainers.image.vendor="Noteable" \
org.opencontainers.image.version="${NBL_ARG_VERSION}"
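The `NBL_ARG_*` build arguments above all default to "undefined", so the OCI labels are only meaningful when the caller supplies values at build time. A minimal sketch of how those flags might be assembled, assuming a hypothetical helper named `label_build_args`; the revision, build URL, and version are caller-supplied placeholders and nothing here is taken from this repository's CI.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): print the --build-arg flags
# for the NBL_ARG_* values the Dockerfile declares, one token per line.
label_build_args() {
  local revision="$1"
  local build_url="$2"
  local version="$3"
  local created
  # RFC 3339 UTC timestamp, the format the OCI image spec uses for image.created
  created="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  printf '%s\n' \
    '--build-arg' "NBL_ARG_BUILD_TIMESTAMP=${created}" \
    '--build-arg' "NBL_ARG_REVISION=${revision}" \
    '--build-arg' "NBL_ARG_BUILD_URL=${build_url}" \
    '--build-arg' "NBL_ARG_VERSION=${version}"
}
```

The tokens can be read into an array with `mapfile -t args < <(label_build_args ...)` and expanded as `"${args[@]}"` on the `docker build` command line, which keeps each value intact as a single argument.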
99 changes: 99 additions & 0 deletions R/noteable/.Rprofile
@@ -0,0 +1,99 @@
library(IRdisplay)
library(repr)
library(reticulate)

prepare_dex_content <- function(df) {
#'
#' Create schema and data structure for data frame to be rendered by DEX
#'

# create a schema for a dataframe, which DEX uses to determine column dtypes.
# R data frames don't have this functionality, so we have to use reticulate
# to call into the python pandas library
pandas <- import("pandas")

# If df is a matrix, convert it to a data frame
if (is.matrix(df)) {
# In R, a matrix is a 2D vector, not a data frame. When reticulate converts an R matrix to Python,
# it becomes a numpy array, not a pandas DataFrame. The pandas function we're using requires a DataFrame,
# so we need to convert the matrix to a data frame first.
#
# We use stringsAsFactors = FALSE to prevent R from converting strings to factors. This is a feature of R
# that can be confusing for people used to Python, where there's no direct equivalent of factors.
#
# We use row.names = FALSE to prevent R from using the first column of the data as row names. This is
# because R matrices don't have row names in the same way that data frames do, and we want to keep the
# structure of the data consistent when we convert it to a DataFrame.
df <- as.data.frame(df, stringsAsFactors = FALSE, row.names = FALSE)
}
df_py <- r_to_py(df)
schema <- pandas$io$json$build_table_schema(df_py, index=FALSE)

# vectorized format (list of lists)
#data = as.matrix.data.frame(t(df))
# pandas df.to_dict("records") format
data = as.data.frame.list(df)

list(
schema = schema,
data = data
)
}

prepare_dex_metadata <- function(df) {
#'
#' Create metadata for data frame to be rendered by DEX
#'
list(
default_index_used=TRUE,
dataframe_info = list(
orig_num_rows = nrow(df),
orig_num_cols = ncol(df)
)
)
}

repr_dex <- function(obj, ...) {
if (is(obj, "data.frame") || is(obj, "matrix")) {
data <- prepare_dex_content(obj)
metadata <- prepare_dex_metadata(obj)
bundle_data <- list("application/vnd.dataresource+json"=data)
bundle_metadata <- list("application/vnd.dataresource+json"=metadata)
# we could use publish_mimebundle() to provide the data and metadata,
# but that doesn't return anything, which triggers repr_html/repr_markdown, etc
#publish_mimebundle(bundle_data, metadata=bundle_metadata)
return(data)
} else {
# if it's not a matrix or data.frame, return NULL to let other repr_* functions handle it.
return(NULL)
}
}

enable_dex_formatter <- function() {
# Add custom display formatter to newly added mimetype
IRkernel:::replace_in_package('repr', 'mime2repr', c(repr::mime2repr, list(`application/vnd.dataresource+json` = repr_dex)))

# Add dataresource mimetype to list of recognized mimetypes
mimetypes <- c(getOption('jupyter.display_mimetypes'), "application/vnd.dataresource+json")
options(jupyter.display_mimetypes = mimetypes)

# Register custom formatter for matrix and data.frame
registerS3method("repr_html", "matrix", repr_dex)
registerS3method("repr_html", "data.frame", repr_dex)
}

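disable_dex_formatter <- function() {
# Remove custom display formatter. Once enable_dex_formatter() has run, repr::mime2repr
# already contains the DEX entry, so drop that entry rather than re-assigning the list as-is.
formatters <- repr::mime2repr
formatters[["application/vnd.dataresource+json"]] <- NULL
IRkernel:::replace_in_package('repr', 'mime2repr', formatters)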

# Remove dataresource mimetype from list of recognized mimetypes
mimetypes <- setdiff(getOption('jupyter.display_mimetypes'), "application/vnd.dataresource+json")
options(jupyter.display_mimetypes = mimetypes)

# Reset the formatter for matrix and data.frame to the default
registerS3method("repr_html", "matrix", repr:::repr_html.matrix)
registerS3method("repr_html", "data.frame", repr:::repr_html.data.frame)
}

# enable by default
enable_dex_formatter()
18 changes: 18 additions & 0 deletions R/noteable/4.3.0/Dockerfile
@@ -0,0 +1,18 @@
# syntax = docker/dockerfile:1.2.1
# Noteable build: adds packages to enable Noteable-specific functionality:
# - DEX support (via .Rprofile)
ARG BASE_IMAGE
# hadolint ignore=DL3006
FROM ${BASE_IMAGE} AS base

USER noteable

# Install python to use with Reticulate
RUN micromamba install python=3.9 -y -c conda-forge

# R package dependencies and py_install
COPY requirements.R /tmp/requirements.R
RUN R -e "source('/tmp/requirements.R')"

# similarly, copy any R commands that need to run on startup
COPY .Rprofile /srv/noteable/.Rprofile
5 changes: 5 additions & 0 deletions R/noteable/requirements.R
@@ -0,0 +1,5 @@
install.packages('reticulate', repos='https://cran.us.r-project.org')
library(reticulate)
# Python packages to be used in R via reticulate
# ref: https://rstudio.github.io/reticulate/articles/python_packages.html
py_install('pandas==1.5.3', pip=TRUE)
23 changes: 23 additions & 0 deletions R/run.sh
@@ -0,0 +1,23 @@
#!/usr/bin/env bash
set -o pipefail
set -o nounset
set -o errexit

echo "Local time: $(date)"

set -x

connection_file=/tmp/connection_file.json
cp /etc/noteable/connections/connection_file.json "${connection_file}"

# Inject Secrets into environment (see script docstring for more info)
# set +x to avoid echoing the Secrets in plaintext to logs
set +x
echo "Injecting Secrets into environment, echoing is turned off"
# shellcheck disable=SC1091
source /tmp/secrets_helper.sh
echo "Done injecting Secrets, turning echoing back on"
set -x

echo "Starting R kernel"
micromamba run -n noteable-venv R --slave -e "IRkernel::main()" --args "${connection_file}"
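The script above turns command tracing off before sourcing the secrets helper and unconditionally turns it back on afterwards. A minimal sketch of the same idea as a reusable function, assuming a hypothetical helper named `source_quietly` (not part of this PR); it records whether tracing was enabled and restores that state instead of always re-enabling it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): source a file with xtrace
# disabled so its contents are never echoed to the logs, then restore
# whatever xtrace state the caller had.
source_quietly() {
  local file="$1"
  local was_tracing=0
  # $- contains "x" when xtrace is currently enabled
  case "$-" in
    *x*) was_tracing=1 ;;
  esac
  set +x
  # shellcheck disable=SC1090
  source "${file}"
  local rc=$?
  if [ "${was_tracing}" -eq 1 ]; then
    set -x
  fi
  return "${rc}"
}
```

In `run.sh` this would replace the `set +x` / `source` / `set -x` sequence with a single `source_quietly /tmp/secrets_helper.sh` call; the trade-off is that the surrounding "echoing is turned off" log messages would need to be emitted by the caller.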
38 changes: 38 additions & 0 deletions Taskfile.R.yaml
@@ -0,0 +1,38 @@
version: 3

# https://hub.docker.com/_/r-base/tags
vars:
  NBL_R_VERSION: 4.3.0
  IDENTIFIER: base

# NOTE: When using `deps: []`, variables are inherited from the current task, but when calling them
# directly in `cmds: []`, the variables have to be passed in explicitly.

tasks:
  core:build:
    desc: Build the R 4.x image
    cmds:
      - >-
        docker build R/{{.IDENTIFIER}}/{{.NBL_R_VERSION}} \
          --build-arg "NBL_R_VERSION={{.NBL_R_VERSION}}" \
          --build-arg "BASE_IMAGE={{.BASE_IMAGE}}" \
          --tag "local/kernel-r-{{.NBL_R_VERSION}}-{{.IDENTIFIER}}:dev"

  base:copy-files:
    desc: Copy files from the R directory to the build directories
    cmds:
      - task copy-root-files LANGUAGE=R IDENTIFIER={{.IDENTIFIER}} NBL_LANGUAGE_VERSION={{.NBL_R_VERSION}}
      - task copy-language-files LANGUAGE=R IDENTIFIER={{.IDENTIFIER}} NBL_LANGUAGE_VERSION={{.NBL_R_VERSION}}

  base:build:
    desc: Build the R 4.x base image after copying required files
    cmds:
      - task r:base:copy-files IDENTIFIER=base NBL_LANGUAGE_VERSION={{.NBL_R_VERSION}}
      - task r:core:build IDENTIFIER=base NBL_R_VERSION={{.NBL_R_VERSION}}

  noteable:build:
    desc: Build the R 4.3.x image with data frame -> DEX support
    cmds:
      - cp R/noteable/.Rprofile R/noteable/{{.NBL_R_VERSION}}/.Rprofile
      - cp R/noteable/requirements.R R/noteable/{{.NBL_R_VERSION}}/requirements.R
      - task r:core:build IDENTIFIER=noteable NBL_R_VERSION={{.NBL_R_VERSION}} BASE_IMAGE=local/kernel-r-{{.NBL_R_VERSION}}-base:dev
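The tasks above chain images through a naming convention: each build is tagged `local/kernel-r-<version>-<identifier>:dev`, and the `noteable` build receives the `base` tag as its `BASE_IMAGE`. A minimal dry-run sketch of that convention, assuming hypothetical helper names `image_tag` and `print_build_cmd`; it only prints the commands and does not invoke Docker or Task.

```shell
#!/usr/bin/env bash
# Illustrative only: mirrors the tag template used in Taskfile.R.yaml.
image_tag() {
  local version="$1"
  local identifier="$2"
  printf 'local/kernel-r-%s-%s:dev\n' "${version}" "${identifier}"
}

# Print (do not run) the docker build command that core:build would issue.
# The third argument is the parent image and may be empty for the base build.
print_build_cmd() {
  local version="$1"
  local identifier="$2"
  local base_image="${3:-}"
  printf 'docker build R/%s/%s --build-arg NBL_R_VERSION=%s --build-arg BASE_IMAGE=%s --tag %s\n' \
    "${identifier}" "${version}" "${version}" "${base_image}" \
    "$(image_tag "${version}" "${identifier}")"
}

# The base image is built first; the noteable image then builds on top of its tag.
print_build_cmd 4.3.0 base
print_build_cmd 4.3.0 noteable "$(image_tag 4.3.0 base)"
```

Because the child build refers to its parent purely by tag, the parent has to exist locally before the child is built, which is why the Taskfile passes `BASE_IMAGE` explicitly in `noteable:build`.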