GH docker based reusable CI workflows. #993

Merged Aug 11, 2023 (20 commits)

Commits
5b9d135
+gh-build.yml +gh-cleanup.yml +Dockerfile +build-cunumeric-all -ci-gh…
sandeepd-nv Jul 20, 2023
e6a8c6b
Use sandeepd-nv/legate.core instead of nv-legate/legate.core until th…
sandeepd-nv Jul 20, 2023
68f1644
Attempt 2. Use sandeepd-nv/legate.core instead of nv-legate/legate.co…
sandeepd-nv Jul 20, 2023
fdb73a2
Removed all 'TODO: undo' except one.
sandeepd-nv Jul 20, 2023
4b340b8
Use the make_ghci_parts_reusable_from_cunumeric branch of sandeepd-nv…
sandeepd-nv Jul 20, 2023
3b75d4b
Show docker version.
sandeepd-nv Jul 21, 2023
070714f
+Docker system prune.
sandeepd-nv Jul 21, 2023
63851e2
See if linux-amd64-cpu4 can be used from sandeepd-nv/cunumeric.
sandeepd-nv Jul 21, 2023
80c7819
USER coder. Fixed spelling mistak.
sandeepd-nv Jul 21, 2023
47f666a
Removed wildcard: .cred[s] -> .creds. This will let us use older vers…
sandeepd-nv Jul 26, 2023
d9eb1fd
Install legate with WAR.
sandeepd-nv Jul 26, 2023
f07cae0
1. Updated cunumeric version to 23.09.00. 2. build-cunumeric-conda no…
sandeepd-nv Jul 26, 2023
ea91f6b
1. Build legate at SHA specified in cmake/versions.json 2. Removed --…
sandeepd-nv Jul 26, 2023
aca77bb
Switch to nv-legate/legate.core from sandeepd-nv/legate.core.
sandeepd-nv Aug 9, 2023
2c17b38
Removed reference to non-existent branch.
sandeepd-nv Aug 9, 2023
7f7f3dc
+.github/workflows/ci-gh-gpu-build-and-test.yml
sandeepd-nv Aug 9, 2023
0923eb6
Simplified the workflow structure.
sandeepd-nv Aug 9, 2023
ae5c4a9
Attempt 2. Simplified the workflow structure.
sandeepd-nv Aug 9, 2023
a7a92e5
Attempt 3. Simplified the workflow structure.
sandeepd-nv Aug 9, 2023
bf0aae9
Minor change.
sandeepd-nv Aug 9, 2023
1 change: 1 addition & 0 deletions .dockerignore
112 changes: 14 additions & 98 deletions .github/workflows/ci-gh.yml
@@ -1,7 +1,7 @@
-name: Build cunumeric on GH
+name: Build and test cunumeric on GH

 concurrency:
-group: ci-gpu-on-${{ github.event_name }}-from-${{ github.ref_name }}
+group: ci-build-and-test-on-${{ github.event_name }}-from-${{ github.ref_name }}
 cancel-in-progress: true

 on:
@@ -11,99 +11,15 @@ on:
 - "branch-*"

 jobs:
-build:
-permissions:
-id-token: write # This is required for configure-aws-credentials
-contents: read # This is required for actions/checkout
-
-# Ref: https://docs.rapids.ai/resources/github-actions/#cpu-labels for `linux-amd64-cpu4`
-runs-on: ${{ github.repository == 'nv-legate/cunumeric' && 'linux-amd64-cpu4' || 'ubuntu-latest' }}
-container:
-options: -u root
-image: rapidsai/devcontainers:23.06-cpp-cuda11.8-mambaforge-ubuntu22.04
-volumes:
-- ${{ github.workspace }}/out:/tmp/out
-env:
-DEFAULT_CONDA_ENV: legate
-PYTHONDONTWRITEBYTECODE: 1
-SCCACHE_REGION: us-east-2
-SCCACHE_BUCKET: rapids-sccache-east
-SCCACHE_S3_KEY_PREFIX: legate-cunumeric-dev
-GH_TOKEN: "${{ secrets.PERSONAL_ACCESS_TOKEN || secrets.GITHUB_TOKEN }}"
-GITHUB_TOKEN: "${{ secrets.PERSONAL_ACCESS_TOKEN || secrets.GITHUB_TOKEN }}"
-VAULT_HOST: "${{ secrets.PERSONAL_ACCESS_TOKEN && 'https://vault.ops.k8s.rapids.ai' || '' }}"
-VAULT_S3_TTL: "28800s" # 8 hours
-
-steps:
-- name: Checkout legate.core
-uses: actions/checkout@v3
-with:
-repository: nv-legate/legate.core
-fetch-depth: 0
-path: legate
-
-- name: Checkout cunumeric (= this repo)
-uses: actions/checkout@v3
-with:
-fetch-depth: 0
-path: cunumeric
-
-- name: Setup
-shell: bash -eo pipefail {0}
-run: |
-export LEGATE_SHA=$(cat cunumeric/cmake/versions.json | jq -r '.packages.legate_core.git_tag')
-echo "Checking out LEGATE_SHA: ${LEGATE_SHA}"
-git -C legate checkout $LEGATE_SHA
-
-cp -ar legate/continuous_integration/home/coder/.gitconfig /home/coder/;
-cp -ar legate/continuous_integration/home/coder/.local /home/coder/;
-mv legate /home/coder/legate
-
-cp -ar cunumeric/continuous_integration/home/coder/.local/bin/* /home/coder/.local/bin/;
-mv cunumeric /home/coder/cunumeric;
-
-chmod a+x /home/coder/.local/bin/*;
-chown -R coder:coder /home/coder/;
-chown -R coder:coder /tmp/out;
-
-- if: github.repository == 'nv-legate/cunumeric'
-name: Get AWS credentials for sccache bucket
-uses: aws-actions/configure-aws-credentials@v2
-with:
-aws-region: us-east-2
-role-duration-seconds: 28800 # 8 hours
-role-to-assume: arn:aws:iam::279114543810:role/gha-oidc-nv-legate
-
-- name: Create conda env
-shell: su coder {0}
-run: cd ~/; exec entrypoint get-yaml-and-make-conda-env;
-
-- name: Build legate.core C++ library
-shell: su coder {0}
-run: cd ~/; exec entrypoint build-legate-cpp;
-
-- name: Build legate.core Python Wheel
-shell: su coder {0}
-run: cd ~/; exec entrypoint build-legate-wheel;
-
-- name: Build legate.core Conda Package
-shell: su coder {0}
-run: cd ~/; exec entrypoint build-legate-conda;
-
-- name: Build cunumeric C++ library
-shell: su coder {0}
-run: cd ~/; exec entrypoint build-cunumeric-cpp;
-
-- name: Build cunumeric Python Wheel
-shell: su coder {0}
-run: cd ~/; exec entrypoint build-cunumeric-wheel;
-
-- name: Build cunumeric Conda Package
-shell: su coder {0}
-run: cd ~/; exec entrypoint build-cunumeric-conda;
-
-- name: Upload build output
-uses: actions/upload-artifact@v3
-with:
-name: "cunumeric-${{ github.sha }}"
-path: ./out/*
+build-and-test:
+strategy:
+fail-fast: false
+matrix:
+include:
+- {build-target: cpu}
+- {build-target: gpu}
+uses:
+./.github/workflows/gh-build-and-test.yml
+with:
+build-target: ${{ matrix.build-target }}
+sha: ${{ github.sha }}
32 changes: 32 additions & 0 deletions .github/workflows/gh-build-and-test.yml
@@ -0,0 +1,32 @@
on:
workflow_call:
inputs:
build-target:
required: true
type: string
sha:
required: true
type: string

jobs:
build:
name: "Build cunumeric (with ${{ inputs.build-target }} legate) on GH"
uses:
./.github/workflows/gh-build.yml
with:
build-target: ${{ inputs.build-target }}
# Ref: https://docs.rapids.ai/resources/github-actions/#cpu-labels for `linux-amd64-cpu4`
runs-on: ${{ github.repository_owner == 'nv-legate' && 'linux-amd64-cpu4' || 'ubuntu-latest' }}
sha: ${{ inputs.sha }}

cleanup:
needs:
- build

# This ensures the cleanup job runs even if previous jobs fail or the workflow is cancelled.
if: always()
uses:
./.github/workflows/gh-cleanup.yml
with:
build-target: ${{ inputs.build-target }}
sha: ${{ inputs.sha }}
123 changes: 123 additions & 0 deletions .github/workflows/gh-build.yml
@@ -0,0 +1,123 @@
name: Build cunumeric on GH

on:
workflow_call:
inputs:
build-target:
required: true
type: string
runs-on:
required: true
type: string
sha:
required: true
type: string

env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
BASE_IMAGE: rapidsai/devcontainers:23.06-cpp-cuda11.8-mambaforge-ubuntu22.04
Review comment (Collaborator):

It looks like 23.10 is the current version of devcontainer. We can update it here or in a separate PR.

Reply (Contributor Author, @sandeepd-nv, Aug 14, 2023):

I will update it in a separate PR.

IMAGE_NAME_LEGATE: legate.core-${{ inputs.build-target }}
IMAGE_NAME_CUNUMERIC: cunumeric-${{ inputs.build-target }}
USE_CUDA: ${{ (inputs.build-target == 'cpu' && 'OFF') || 'ON' }}

jobs:
build:
name: build-${{ inputs.build-target }}-sub-workflow

permissions:
id-token: write # This is required for configure-aws-credentials
contents: read # This is required for actions/checkout
packages: write # This is required to push docker image to ghcr.io

runs-on: ${{ inputs.runs-on }}

steps:
- name: Checkout legate.core
uses: actions/checkout@v3
with:
repository: nv-legate/legate.core
fetch-depth: 0
path: legate

- name: Checkout cunumeric (= this repo)
uses: actions/checkout@v3
with:
fetch-depth: 0
path: cunumeric

- if: github.repository_owner == 'nv-legate'
name: Get AWS credentials for sccache bucket
uses: aws-actions/configure-aws-credentials@v2
with:
aws-region: us-east-2
role-duration-seconds: 28800 # 8 hours
role-to-assume: arn:aws:iam::279114543810:role/gha-oidc-nv-legate

- name: Docker system prune
run: |
docker version
docker system prune --all --force

- name: Build legate.core using docker build
run: |
echo BUILD_TARGET: ${{ inputs.build-target }}
echo USE_CUDA: ${{ env.USE_CUDA }}

export LEGATE_SHA=$(cat cunumeric/cmake/versions.json | jq -r '.packages.legate_core.git_tag')
echo "Checking out LEGATE_SHA: ${LEGATE_SHA}"
git -C legate checkout $LEGATE_SHA

IMAGE_TAG_LEGATE=${{ env.IMAGE_NAME_LEGATE }}:${{ inputs.sha }}

chmod +x legate/continuous_integration/build-docker-image
legate/continuous_integration/build-docker-image \
--base-image "$BASE_IMAGE" \
--image-tag "$IMAGE_TAG_LEGATE" \
--source-dir legate

- name: Build cunumeric using docker build
run: |
IMAGE_TAG_CUNUMERIC=${{ env.IMAGE_NAME_CUNUMERIC }}:${{ inputs.sha }}
IMAGE_TAG_LEGATE=${{ env.IMAGE_NAME_LEGATE }}:${{ inputs.sha }}

legate/continuous_integration/build-docker-image \
--base-image "$IMAGE_TAG_LEGATE" \
--image-tag "$IMAGE_TAG_CUNUMERIC" \
--source-dir cunumeric

- name: Dump docker history of image before upload
run: |
IMAGE_TAG=${{ env.IMAGE_NAME_CUNUMERIC }}:${{ inputs.sha }}
docker history $IMAGE_TAG

- name: Log in to container image registry
run: echo "${{ secrets.GITHUB_TOKEN }}" | docker login ghcr.io -u $ --password-stdin

- name: Push cunumeric image
run: |
IMAGE_TAG=${{ env.IMAGE_NAME_CUNUMERIC }}:${{ inputs.sha }}

IMAGE_ID=ghcr.io/${{ github.repository_owner }}

# Change all uppercase to lowercase
IMAGE_ID=$(echo $IMAGE_ID | tr '[A-Z]' '[a-z]')

IMAGE_ID=$IMAGE_ID/$IMAGE_TAG

docker tag $IMAGE_TAG $IMAGE_ID
docker push $IMAGE_ID

- name: Copy artifacts back to the host
run: |
IMAGE_TAG=${{ env.IMAGE_NAME_CUNUMERIC }}:${{ inputs.sha }}
mkdir -p artifacts
docker run -v "$(pwd)/artifacts:/home/coder/.artifacts" --rm -t $IMAGE_TAG copy-artifacts

- name: Display structure of workdir
run: ls -R

- name: Upload build artifacts
uses: actions/upload-artifact@v3
with:
name: "cunumeric-${{ inputs.build-target }}-${{ inputs.sha }}"
path: artifacts
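
The "Push cunumeric image" step above lowercases the registry path before tagging, because container repository names must be lowercase while an owner name may contain uppercase letters. A minimal sketch of the same idea as a small function; the name `ghcr_image_ref` and the sample values are hypothetical and not part of this PR. Unlike the step above, it lowercases the image name as well as the owner, and leaves the tag untouched:

```shell
# Build a fully qualified ghcr.io reference from an owner, image name, and tag.
# The function body runs in a subshell, so its variables do not leak to the caller.
ghcr_image_ref() (
    owner=$1
    name=$2
    tag=$3
    # Lowercase only the repository part; tags may legitimately contain uppercase.
    repo=$(printf '%s' "ghcr.io/${owner}/${name}" | tr '[:upper:]' '[:lower:]')
    printf '%s:%s\n' "$repo" "$tag"
)

# Example with made-up values.
ghcr_image_ref "NV-Legate" "cunumeric-gpu" "0123abc"
```

Using the POSIX character classes `[:upper:]` and `[:lower:]` avoids depending on how `tr` treats the literal brackets in `'[A-Z]'`.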
43 changes: 43 additions & 0 deletions .github/workflows/gh-cleanup.yml
@@ -0,0 +1,43 @@
name: Clean up

on:
workflow_call:
inputs:
build-target:
required: true
type: string
sha:
required: true
type: string

env:
IMAGE_NAME: cunumeric-${{ inputs.build-target }}

jobs:
cleanup:
permissions:
packages: write

runs-on: ubuntu-latest

steps:
- name: Delete docker image
run: |
set -xeuo pipefail

PACKAGE_NAME=${{ env.IMAGE_NAME }}
PACKAGE_VERSION_ID=$(
curl -L \
-H "Accept: application/vnd.github+json" \
-H "Authorization: Bearer ${{ github.token }}" \
-H "X-GitHub-Api-Version: 2022-11-28" \
https://api.github.com/orgs/${{ github.repository_owner }}/packages/container/$PACKAGE_NAME/versions |
jq '.[] | select(.metadata.container.tags[] == "${{ inputs.sha }}") | .id' -
)

curl -L \
-X DELETE \
-H "Accept: application/vnd.github+json" \
-H "Authorization: Bearer ${{ github.token }}" \
-H "X-GitHub-Api-Version: 2022-11-28" \
https://api.github.com/orgs/${{ github.repository_owner }}/packages/container/$PACKAGE_NAME/versions/$PACKAGE_VERSION_ID
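
The lookup in the "Delete docker image" step above can come back empty, for example when no package version carries the expected tag, and the DELETE request would then target `.../versions/` with no ID. A hedged sketch of a guarded lookup, assuming `jq` is available on the runner and that the response has the shape the filter above relies on (`.id` and `.metadata.container.tags`); the function name `version_id_for_tag` and the sample JSON are made up for illustration:

```shell
# Print the ID of the first package version that carries the given tag.
# Reads the JSON array of versions on stdin; exits non-zero if nothing matches.
version_id_for_tag() (
    tag=$1
    # --arg passes the tag as data, so it is never spliced into the jq program.
    id=$(jq -r --arg tag "$tag" \
        '[.[] | select(any(.metadata.container.tags[]?; . == $tag)) | .id] | first // empty')
    [ -n "$id" ] && printf '%s\n' "$id"
)

# Made-up sample response with two versions.
sample='[{"id": 101, "metadata": {"container": {"tags": ["aaa111"]}}},
         {"id": 102, "metadata": {"container": {"tags": ["bbb222", "latest"]}}}]'

if id=$(printf '%s' "$sample" | version_id_for_tag "bbb222"); then
    echo "would delete version $id"
else
    echo "no version tagged bbb222; skipping delete"
fi
```

In the workflow, the `curl` response would be piped into the function, and the DELETE call would run only inside the `if` branch.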
Review comment (Collaborator):

I am not 100% sure I understand what is going on. So in the build step, we build the image and push it. Is this the same image we delete in this step?

Reply (Contributor Author):

> Is this the same image we delete in this step?

Yes. We don't need the image if all the tests succeed. Once I have submitted #1022, you will be able to download the image and reproduce the problem locally if a test fails. In addition, I will create a separate CI job to delete unused images after a certain period of time.

Reply (Collaborator):

I see. How is this job prevented from running if the tests fail? I see that the cleanup job is declared with `if: always()`. I may be missing something.

Reply (Contributor Author):

I have removed `always()` in #1022.
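
The exchange above turns on when cleanup should run: with `if: always()` the image is deleted even when something fails, which defeats the purpose of keeping it for local reproduction. One way to express the intended policy is sketched below as a shell guard; this is not necessarily how #1022 does it. The function name `should_delete_image` is hypothetical, and its argument is assumed to come from a `needs.<job>.result` value, which GitHub Actions reports as `success`, `failure`, `cancelled`, or `skipped`:

```shell
# Decide whether the CI image can be deleted, given the result of the test job.
# Returns 0 (delete) only when the tests succeeded; any other result keeps the image.
should_delete_image() {
    case "$1" in
        success) return 0 ;;
        *)       return 1 ;;
    esac
}

# Walk through every result value a job can report.
for result in success failure cancelled skipped; do
    if should_delete_image "$result"; then
        echo "$result: delete image"
    else
        echo "$result: keep image for local reproduction"
    fi
done
```

In a workflow step this might be called as `should_delete_image "${{ needs.test.result }}"`, where the job name `test` is a placeholder.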

2 changes: 1 addition & 1 deletion cmake/versions.json
@@ -5,7 +5,7 @@
 "git_url" : "https://github.com/nv-legate/legate.core.git",
 "git_shallow": false,
 "always_download": false,
-"git_tag" : "a405f595603238c8557cb5fefd3981d190a2fb1d"
+"git_tag" : "4b79075eb5d7035d501c334c87a87939af79abc2"
Review comment (Contributor):

Is there something in legate that needs to be updated for this?

Reply (Contributor Author, @sandeepd-nv, Aug 10, 2023):

It looks like we do need a later version of legate.core than what is currently specified in versions.json: https://github.com/nv-legate/cunumeric/pull/993/checks#step:7:50. To be more specific, we need the file legate/continuous_integration/build-docker-image.

The selected SHA does not point directly to the change that introduced this dependency, but it advances legate.core to a point where we know we have a successful build using build-docker-image. Hope that makes sense.

Reply (Contributor):

Yup, that makes sense. Please proceed.

}
}
}
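
The reasoning in the thread above is that the pinned `git_tag` must point at a legate.core commit that already contains `continuous_integration/build-docker-image`. A small sketch of how such a requirement could be checked before bumping the SHA, assuming a local clone of legate.core; the function name `commit_has_file` is hypothetical, and `git cat-file -e` only tests whether the named object exists:

```shell
# Succeed if <path> exists in the tree of <rev> inside the repository at <repo_dir>.
# Usage: commit_has_file <repo_dir> <rev> <path>
commit_has_file() (
    repo_dir=$1
    rev=$2
    path=$3
    # "<rev>:<path>" names the blob or tree at that path in the given revision.
    git -C "$repo_dir" cat-file -e "${rev}:${path}" 2>/dev/null
)

# Example invocation (needs a clone in ./legate, so it is left commented out):
# commit_has_file legate 4b79075eb5d7035d501c334c87a87939af79abc2 \
#     continuous_integration/build-docker-image && echo "SHA is usable"
```

A check like this only shows that the file is present at that commit; it says nothing about whether the build at that commit succeeds.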
44 changes: 44 additions & 0 deletions continuous_integration/Dockerfile
@@ -0,0 +1,44 @@
ARG BASE_IMAGE
FROM ${BASE_IMAGE} as stage0

COPY --chown=coder:coder continuous_integration/home/coder/.local/bin/* /home/coder/.local/bin/
COPY --chown=coder:coder . /home/coder/cunumeric

RUN chmod a+x /home/coder/.local/bin/*

#---------------------------------------------------
FROM stage0 as setup

USER coder
WORKDIR /home/coder

RUN set -x && . conda-utils && \
get_yaml_and_make_conda_env && \
install_legate_core_with_war

#---------------------------------------------------
FROM setup as build
USER coder
WORKDIR /home/coder

ARG GITHUB_TOKEN
ENV GITHUB_TOKEN=${GITHUB_TOKEN}
ARG AWS_SESSION_TOKEN
ENV AWS_SESSION_TOKEN=${AWS_SESSION_TOKEN}
ARG AWS_ACCESS_KEY_ID
ENV AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
ARG AWS_SECRET_ACCESS_KEY
ENV AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}

COPY --chown=coder:coder .creds /run/secrets

RUN entrypoint build-cunumeric-all

#---------------------------------------------------
FROM stage0 as final
USER coder
WORKDIR /home/coder

COPY --from=build --chown=coder:coder /tmp/out /tmp/out
COPY --from=build --chown=coder:coder /tmp/conda-build /tmp/conda-build
COPY --from=build --chown=coder:coder /tmp/env_yaml /tmp/env_yaml
17 changes: 17 additions & 0 deletions continuous_integration/home/coder/.local/bin/build-cunumeric-all
@@ -0,0 +1,17 @@
#!/usr/bin/env bash


build_cunumeric_all() {
set -x
cd ~/;

conda info

set -euo pipefail;

build-cunumeric-cpp;
build-cunumeric-wheel;
build-cunumeric-conda;
}

(build_cunumeric_all "$@");