[Reproducible rosetta build] Adds support for building rosetta with local patches and an already generated patch dir #354

Closed
wants to merge 119 commits into from
119 commits
abb6f97
wip
yhtang Oct 23, 2023
068aab9
fix typo
yhtang Oct 23, 2023
fb7cf0b
wip
yhtang Oct 24, 2023
4b0c406
wip
yhtang Oct 24, 2023
4044450
wip
yhtang Oct 24, 2023
38da8a1
wip
yhtang Oct 24, 2023
9ad94c8
wip
yhtang Oct 24, 2023
1aac03a
use full clone
yhtang Oct 25, 2023
42ec9bb
update pip-tools script
yhtang Oct 25, 2023
2f02023
update pip-tools script
yhtang Oct 25, 2023
ad12e78
update pip-tools script
yhtang Oct 25, 2023
bdc34c8
fix t5x dockerfile
yhtang Oct 25, 2023
a5c478e
fix t5x dockerfile
yhtang Oct 25, 2023
b4fd2d8
test flax hack
yhtang Oct 25, 2023
fe6e3d8
flax hack
yhtang Oct 25, 2023
ad83f21
hack for git top of tree Flax dependency
yhtang Nov 2, 2023
b7e1a6c
update URL req
yhtang Nov 3, 2023
0dd9617
update
yhtang Nov 3, 2023
dc44fe4
editability
yhtang Nov 3, 2023
acf2acf
editability
yhtang Nov 3, 2023
027cd63
editability
yhtang Nov 3, 2023
7c10ffa
editability
yhtang Nov 3, 2023
f1046d2
editability
yhtang Nov 3, 2023
bbf2c21
wip
yhtang Nov 3, 2023
cd0d5d1
wip
yhtang Nov 6, 2023
e973c9a
wip
yhtang Nov 6, 2023
1f6d4f1
fix shell
yhtang Nov 6, 2023
6d9ae00
fix arg order
yhtang Nov 6, 2023
cc66ce6
remove standalone TE build
yhtang Nov 6, 2023
fe8708a
build TE wheel in JAX
yhtang Nov 6, 2023
b1e332e
pax wip
yhtang Nov 6, 2023
62ca85b
add pax build
yhtang Nov 6, 2023
bc6c702
add pax build
yhtang Nov 6, 2023
ff6ec2a
fix CI
yhtang Nov 6, 2023
67df9b8
debug pax build
yhtang Nov 6, 2023
d67966a
debug pax build
yhtang Nov 6, 2023
1ba1bff
debug pax build
yhtang Nov 6, 2023
e8f87d2
fix EOF
yhtang Nov 6, 2023
24b4931
redesign workflow
yhtang Nov 7, 2023
e2c34b4
fix job step id
yhtang Nov 7, 2023
7ee441b
arm64 build
yhtang Nov 7, 2023
4f4d909
arm64 build
yhtang Nov 7, 2023
b777244
arm64 build
yhtang Nov 8, 2023
c4c22af
arm64 build
yhtang Nov 8, 2023
736246a
add sitrep to base build
yhtang Nov 8, 2023
bc4b6db
lingvo
yhtang Nov 8, 2023
ce1cf94
lingvo
yhtang Nov 8, 2023
9ac7367
lingvo
yhtang Nov 8, 2023
44d3026
refactor pax arm64 build
yhtang Nov 8, 2023
5c47fcb
refactor pax arm64 build wip
yhtang Nov 8, 2023
139e539
refactor pax arm64 build wip
yhtang Nov 8, 2023
5e2a5ad
refactor pax arm64 build wip
yhtang Nov 8, 2023
cb99d71
refactor pax arm64 build wip
yhtang Nov 9, 2023
0959264
pax arm64
yhtang Nov 9, 2023
314db99
redesign CI
yhtang Nov 9, 2023
8943e9f
redesign CI
yhtang Nov 9, 2023
94115ba
refactor CI
yhtang Nov 9, 2023
ee11851
refactor CI
yhtang Nov 9, 2023
6b6fc92
refactor CI
yhtang Nov 9, 2023
cf66cfd
refactor CI
yhtang Nov 9, 2023
9fd4503
refactor CI
yhtang Nov 9, 2023
f2e80a1
file permission
yhtang Nov 9, 2023
7db2f84
refactor CI
yhtang Nov 9, 2023
7090349
refactor CI
yhtang Nov 9, 2023
6900d86
refactor CI
yhtang Nov 9, 2023
06327c1
refactor CI
yhtang Nov 9, 2023
618a3f5
refactor CI
yhtang Nov 9, 2023
c659d3c
refactor CI
yhtang Nov 9, 2023
8395604
refactor CI
yhtang Nov 9, 2023
2ab0cc9
refactor CI
yhtang Nov 9, 2023
69c17fc
refactor CI
yhtang Nov 9, 2023
d9400d3
refactor CI
yhtang Nov 9, 2023
33dc9ac
refactor CI
yhtang Nov 9, 2023
4fc18dc
fix output tag order
yhtang Nov 9, 2023
b806a5a
t5x arm64 build not ready yet
yhtang Nov 9, 2023
8391f74
nightly T5X build
yhtang Nov 9, 2023
984a19a
nightly T5X build
yhtang Nov 9, 2023
715f62c
fix TE/T5X bug
yhtang Nov 9, 2023
7f06cb8
add TE examples and tests to wheel
yhtang Nov 9, 2023
92b6d0a
allow TE parallel build
yhtang Nov 9, 2023
c4f5b84
jax publish
yhtang Nov 9, 2023
bcfd0e4
rename staging to mealkit
yhtang Nov 9, 2023
12a2fd6
fix nightly
yhtang Nov 9, 2023
1925b14
fix nightly
yhtang Nov 9, 2023
6e71c64
bug fix
yhtang Nov 10, 2023
adb10da
bug fix
yhtang Nov 10, 2023
b956b30
bug fix
yhtang Nov 10, 2023
c727607
fix
yhtang Nov 10, 2023
d450ceb
fix TE test
yhtang Nov 10, 2023
fc2c6e6
fix pax test
yhtang Nov 10, 2023
f9c6cd3
fix TE test
yhtang Nov 10, 2023
7230589
merge CI yaml
yhtang Nov 10, 2023
429ae4d
fix arg
yhtang Nov 10, 2023
de32e4f
rerun TE/PAX test
yhtang Nov 10, 2023
43a57c6
Merge branch 'main' of github.com:NVIDIA/JAX-Toolbox into add-pip-com…
yhtang Nov 10, 2023
ab73f6b
fix TE multi-device test
yhtang Nov 10, 2023
9eb97e8
fix lzma build issue
yhtang Nov 10, 2023
772f606
edit TE test name
yhtang Nov 11, 2023
fcb29b4
fix TE arm64 test install error
yhtang Nov 13, 2023
22d400b
remove --install option from get-source.sh
yhtang Nov 13, 2023
e9f074f
fix TE arm64 test install error
yhtang Nov 13, 2023
602002f
disable sandbox
yhtang Nov 13, 2023
12a57eb
i'm jet-lagged
yhtang Nov 13, 2023
dbaba5b
use Pax image for TE testing
yhtang Nov 13, 2023
ccafb52
Fix job dependency
yhtang Nov 13, 2023
279c388
Adds support for building rosetta with local patches and an already
terrykong Oct 31, 2023
2b5ee7c
comment
terrykong Nov 1, 2023
978f3d0
Add steps to archive patches in run
terrykong Nov 1, 2023
b784727
Date the patches for readability
terrykong Nov 1, 2023
cc16e92
Better log msg
terrykong Nov 1, 2023
7cc4abb
switch to --3way since that produces a merge conflict to help understand
terrykong Nov 1, 2023
4a3bd4a
Switch to mealkit+finalize mechanic for rosetta builds
terrykong Nov 13, 2023
f6a446e
Add github.run_id to artifacts for provenance
terrykong Nov 13, 2023
7fffabf
Update all rosetta workflows with mealkit/final mechanism
terrykong Nov 14, 2023
71d960c
Merge branch 'main' into rosetta-reproducible-patches
terrykong Nov 22, 2023
7bf3790
improve comments
terrykong Nov 22, 2023
2a2ca94
revert sandbox
terrykong Nov 22, 2023
e977aa2
merge issues, fixed manually
terrykong Nov 22, 2023
8c23902
Merge branch 'main' into rosetta-reproducible-patches
terrykong Nov 27, 2023
6 changes: 3 additions & 3 deletions .github/workflows/_build_pax.yaml
@@ -163,19 +163,19 @@ jobs:
# bring in utility functions
source .github/workflows/scripts/to_json.sh

badge_label='PAX ${{ inputs.ARCHITECTURE }} build'
badge_label='Upstream PAX ${{ inputs.ARCHITECTURE }} build'
tags="${{ steps.final-metadata.outputs.tags }}"
digest="${{ steps.final-build.outputs.digest }}"
outcome="${{ steps.final-build.outcome }}"

if [[ ${outcome} == "success" ]]; then
badge_message="pass"
badge_color=brightgreen
summary="PAX build on ${{ inputs.ARCHITECTURE }}: $badge_message"
summary="Upstream PAX build on ${{ inputs.ARCHITECTURE }}: $badge_message"
else
badge_message="fail"
badge_color=red
summary="PAX build on ${{ inputs.ARCHITECTURE }}: $badge_message"
summary="Upstream PAX build on ${{ inputs.ARCHITECTURE }}: $badge_message"
fi

to_json \
69 changes: 63 additions & 6 deletions .github/workflows/_build_rosetta.yaml
@@ -21,15 +21,20 @@ on:
description: 'Build date in YYYY-MM-DD format'
required: false
default: 'NOT SPECIFIED'
ARTIFACT_NAME:
type: string
description: 'Name of the artifact zip file'
required: false
default: 'artifact-rosetta-build'
BADGE_FILENAME:
type: string
description: 'Name of the endpoint JSON file for shields.io badge'
description: 'Name of the endpoint JSON file for shields.io badge (w/o .json || arch || library)'
required: false
default: 'badge-rosetta-build'
default: 'badge-rosetta-build'
outputs:
DOCKER_TAG_MEALKIT:
description: 'Tags of the mealkit image build'
value: $ {{ jobs.build-rosetta.outputs.DOCKER_TAG_MEALKIT }}
description: "Tags of the 'mealkit' image built"
value: ${{ jobs.build-rosetta.outputs.DOCKER_TAG_MEALKIT }}
DOCKER_TAG_FINAL:
description: "Tags of the complete image built"
value: ${{ jobs.build-rosetta.outputs.DOCKER_TAG_FINAL }}
@@ -48,7 +53,8 @@ jobs:
build-rosetta:
runs-on: [self-hosted, "${{ inputs.ARCHITECTURE }}", small]
env:
BADGE_FILENAME_FULL: ${{ inputs.BADGE_FILENAME}}-${{ inputs.ARCHITECTURE}}.json
BADGE_FILENAME_FULL: ${{ inputs.BADGE_FILENAME }}-${{ inputs.BASE_LIBRARY }}-${{ inputs.ARCHITECTURE }}.json
ARTIFACT_NAME_FULL: ${{ inputs.ARTIFACT_NAME }}-${{ inputs.BASE_LIBRARY }}-${{ inputs.ARCHITECTURE }}
outputs:
DOCKER_TAG_MEALKIT: ${{ steps.mealkit-metadata.outputs.tags }}
DOCKER_TAG_FINAL: ${{ steps.final-metadata.outputs.tags }}
@@ -124,4 +130,55 @@ jobs:
labels: ${{ steps.final-metadata.outputs.labels }}
target: final
build-args: |
BASE_IMAGE=${{ steps.defaults.outputs.BASE_IMAGE }}
BASE_IMAGE=${{ steps.defaults.outputs.BASE_IMAGE }}

- name: Extract patches
run: rosetta/scripts/extract-patches.sh ${{ steps.final-metadata.outputs.tags }}

- name: Archive generated patches
uses: actions/upload-artifact@v3
with:
name: patches-${{ inputs.BASE_LIBRARY }}-${{ github.run_id }}-${{ inputs.BUILD_DATE }}-${{ inputs.ARCHITECTURE }}
path: rosetta/patches

- name: Generate sitrep
if: success() || failure()
shell: bash -x -e {0}
run: |
# bring in utility functions
source .github/workflows/scripts/to_json.sh

badge_label='${{ inputs.BASE_LIBRARY }} ${{ inputs.ARCHITECTURE }} build'
tags="${{ steps.final-metadata.outputs.tags }}"
digest="${{ steps.final-build.outputs.digest }}"
outcome="${{ steps.final-build.outcome }}"

if [[ ${outcome} == "success" ]]; then
badge_message="pass"
badge_color=brightgreen
summary="${{ inputs.BASE_LIBRARY }} build on ${{ inputs.ARCHITECTURE }}: $badge_message"
else
badge_message="fail"
badge_color=red
summary="${{ inputs.BASE_LIBRARY }} build on ${{ inputs.ARCHITECTURE }}: $badge_message"
fi

to_json \
summary \
badge_label tags digest outcome \
> sitrep.json

schemaVersion=1 \
label="${badge_label}" \
message="${badge_message}" \
color="${badge_color}" \
to_json schemaVersion label message color \
> ${{ env.BADGE_FILENAME_FULL }}

- name: Upload sitrep and badge
uses: actions/upload-artifact@v3
with:
name: ${{ env.ARTIFACT_NAME_FULL }}
path: |
sitrep.json
${{ env.BADGE_FILENAME_FULL }}
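
The "Generate sitrep" step above relies on `to_json`, sourced from `.github/workflows/scripts/to_json.sh`, which this diff does not show. As a rough illustration of the idea only, the sketch below uses a hypothetical stand-in, `to_json_sketch`, that turns named shell variables into a flat JSON object, and a hypothetical `badge_for_outcome` wrapper that mirrors the step's if/else. The function names, the sample label, and the assumption that values need no JSON escaping are illustrative and not taken from the repository.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the repository's to_json helper (its real
# implementation is not part of this diff). Prints a flat JSON object whose
# keys are the given variable names. Assumes values need no JSON escaping.
to_json_sketch() {
  local name value sep=""
  printf '{'
  for name in "$@"; do
    value="${!name}"                                  # indirect expansion
    if [[ "$value" =~ ^[0-9]+$ ]]; then
      printf '%s"%s":%s' "$sep" "$name" "$value"      # bare integer
    else
      printf '%s"%s":"%s"' "$sep" "$name" "$value"    # quoted string
    fi
    sep=","
  done
  printf '}\n'
}

# Map a build outcome to badge fields, mirroring the if/else in the step.
badge_for_outcome() {
  local outcome="$1" badge_label="$2" badge_message badge_color
  if [[ "$outcome" == "success" ]]; then
    badge_message="pass"; badge_color=brightgreen
  else
    badge_message="fail"; badge_color=red
  fi
  # Prefix assignments are visible inside the function for this call only.
  schemaVersion=1 label="$badge_label" message="$badge_message" color="$badge_color" \
    to_json_sketch schemaVersion label message color
}

badge_for_outcome success "pax amd64 build"
```

Passing the badge fields as prefix assignments, as the workflow step does, keeps them out of the surrounding shell scope, so the badge payload and the sitrep payload can reuse the helper without their variables colliding.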
6 changes: 3 additions & 3 deletions .github/workflows/_build_t5x.yaml
@@ -163,19 +163,19 @@ jobs:
# bring in utility functions
source .github/workflows/scripts/to_json.sh

badge_label='T5X ${{ inputs.ARCHITECTURE }} build'
badge_label='Upstream T5X ${{ inputs.ARCHITECTURE }} build'
tags="${{ steps.final-metadata.outputs.tags }}"
digest="${{ steps.final-build.outputs.digest }}"
outcome="${{ steps.final-build.outcome }}"

if [[ ${outcome} == "success" ]]; then
badge_message="pass"
badge_color=brightgreen
summary="T5X build on ${{ inputs.ARCHITECTURE }}: $badge_message"
summary="Upstream T5X build on ${{ inputs.ARCHITECTURE }}: $badge_message"
else
badge_message="fail"
badge_color=red
summary="T5X build on ${{ inputs.ARCHITECTURE }}: $badge_message"
summary="Upstream T5X build on ${{ inputs.ARCHITECTURE }}: $badge_message"
fi

to_json \
83 changes: 53 additions & 30 deletions .github/workflows/nightly-rosetta-pax-build.yaml
@@ -10,7 +10,7 @@ on:
inputs:
BASE_IMAGE:
type: string
description: 'PAX image built by NVIDIA/JAX-Toolbox'
description: 'Upstream Pax mealkit image without $arch-mealkit suffix, e.g., (ghcr.io/nvidia/jax-toolbox-internal:6857094059-upstream-pax). Leaving empty implies ghcr.io/nvidia/upstream-pax:mealkit'
default: ''
required: false
PUBLISH:
@@ -49,15 +49,6 @@ jobs:
if: steps.if-upstream-failed.outputs.UPSTREAM_FAILED == 'true'
uses: styfle/[email protected]

- name: Determine if the resulting container should be 'published'
id: if-publish
shell: bash -x -e {0}
run:
# A container should be published if:
# 1) the workflow is triggered by workflow_dispatch and the PUBLISH input is true, or
# 2) the workflow is triggered by workflow_run (i.e., a nightly build)
echo "PUBLISH=${{ github.event_name == 'workflow_run' || (github.event_name == 'workflow_dispatch' && inputs.PUBLISH) }}" >> $GITHUB_OUTPUT

- name: Set build date
id: date
shell: bash -x -e {0}
@@ -77,6 +68,10 @@ jobs:
BASE_IMAGE_ARM64=${{ inputs.BASE_IMAGE }}-arm64-mealkit
fi
echo "BASE_LIBRARY=${{ env.BASE_LIBRARY }}" >> $GITHUB_OUTPUT
# A container should be published if:
# 1) the workflow is triggered by workflow_dispatch and the PUBLISH input is true, or
# 2) the workflow is triggered by workflow_run (i.e., a nightly build)
echo "PUBLISH=${{ github.event_name == 'workflow_run' || (github.event_name == 'workflow_dispatch' && inputs.PUBLISH) }}" >> $GITHUB_OUTPUT
echo "BASE_IMAGE_AMD64=${BASE_IMAGE_AMD64}" >> $GITHUB_OUTPUT
echo "BASE_IMAGE_ARM64=${BASE_IMAGE_ARM64}" >> $GITHUB_OUTPUT
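
The PUBLISH rule described in the comments above can be read as a small truth table. The sketch below restates it as a plain shell function for illustration. `should_publish` and its two positional arguments are hypothetical names; the workflow itself evaluates a GitHub Actions expression rather than calling a function like this.

```shell
#!/usr/bin/env bash
# Hypothetical restatement of the PUBLISH expression:
#   event == workflow_run  OR  (event == workflow_dispatch AND PUBLISH input is true)
should_publish() {
  local event_name="$1" publish_input="$2"
  if [[ "$event_name" == "workflow_run" ]]; then
    echo true      # nightly builds triggered by an upstream workflow always publish
  elif [[ "$event_name" == "workflow_dispatch" && "$publish_input" == "true" ]]; then
    echo true      # manual runs publish only when explicitly requested
  else
    echo false
  fi
}

# Same shape as the line the workflow appends to $GITHUB_OUTPUT.
echo "PUBLISH=$(should_publish workflow_dispatch false)"
```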

@@ -100,6 +95,29 @@ jobs:
BASE_IMAGE: ${{ needs.metadata.outputs.BASE_IMAGE_ARM64 }}
secrets: inherit

publish-build-badge:
needs: [metadata, amd64, arm64]
uses: ./.github/workflows/_publish_badge.yaml
Collaborator:
Note that _publish_badge.yaml will be deprecated soon.

if: always()
with:
ENDPOINT_FILENAME: 'rosetta-pax-build-status.json'
PUBLISH: ${{ needs.metadata.outputs.PUBLISH == 'true' }}
SCRIPT: |
if [[ ${{ needs.amd64.result }} == "success" && ${{ needs.arm64.result }} == "success" ]]; then
BADGE_COLOR=brightgreen
MSG=passing
STATUS=success
else
BADGE_COLOR=red
MSG=failing
STATUS=failure
fi
echo "LABEL='nightly'" >> $GITHUB_OUTPUT
echo "MESSAGE='${MSG}'" >> $GITHUB_OUTPUT
echo "COLOR='${BADGE_COLOR}'" >> $GITHUB_OUTPUT
echo "STATUS='${STATUS}'" >> ${GITHUB_OUTPUT}
secrets: inherit

publish-mealkit:
needs: [metadata, amd64, arm64]
if: needs.metadata.outputs.PUBLISH == 'true'
@@ -113,8 +131,17 @@ jobs:
type=raw,value=mealkit,priority=500
type=raw,value=mealkit-${{ needs.metadata.outputs.BUILD_DATE }},priority=500

# TODO: Test ARM when runners available
test-amd64:
needs: amd64
uses: ./.github/workflows/_test_pax_rosetta.yaml
with:
PAX_IMAGE: ${{ needs.amd64.outputs.DOCKER_TAG_FINAL }}
secrets: inherit

# TODO: ARM Tests
publish-final:
needs: [metadata, amd64, arm64]
needs: [metadata, amd64, arm64, test-amd64]
if: needs.metadata.outputs.PUBLISH == 'true'
uses: ./.github/workflows/_publish_container.yaml
with:
@@ -123,49 +150,45 @@ jobs:
${{ needs.arm64.outputs.DOCKER_TAG_FINAL }}
TARGET_IMAGE: pax
TARGET_TAGS: |
type=raw,value=latest,priority=1000
type=raw,value=nightly-${{ needs.metadata.outputs.BUILD_DATE }},priority=900
${{ needs.test-amd64.outputs.TEST_STATUS == 'success' && 'type=raw,value=latest,priority=1000' || '' }}
Collaborator:
I'd recommend putting this logic in a 'metadata' job or job step and then referencing that output, for better readability.

Contributor Author:
This depends on the test results, and metadata is an ancestor of the test step, so it runs before those results exist.
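
For readers unfamiliar with the `cond && 'value' || ''` idiom in the TARGET_TAGS expression under discussion, the following sketch spells out the same decision in shell. `target_tags` is a hypothetical helper written only for this illustration, and the build date in the usage line is made up. Unlike the workflow expression, which leaves an empty line when the tests do not succeed, the sketch simply omits the `latest` entry.

```shell
#!/usr/bin/env bash
# Hypothetical helper: emit the tag rules for the final image.
# The 'latest' tag is offered only when the amd64 test status is "success";
# the dated nightly tag is always emitted.
target_tags() {
  local test_status="$1" build_date="$2"
  if [[ "$test_status" == "success" ]]; then
    echo "type=raw,value=latest,priority=1000"
  fi
  echo "type=raw,value=nightly-${build_date},priority=900"
}

target_tags success 2023-11-27
```

Because the decision needs the test job's output, any job that computes it has to list the test job in its `needs`, which is the constraint the reply above points out.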

type=raw,value=nightly-${{ needs.metadata.outputs.BUILD_DATE }},priority=900

test-pax:
needs: [metadata, amd64, arm64]
uses: ./.github/workflows/_test_pax_rosetta.yaml
if: (github.event_name == 'workflow_run' && github.event.workflow_run.conclusion == 'success') || github.event_name == 'workflow_dispatch'
with:
PAX_IMAGE: ${{ needs.amd64.outputs.DOCKER_TAG_FINAL }}
secrets: inherit

publish-test:
needs: [metadata, amd64, arm64, test-pax]
# TODO: ARM Tests
publish-test-badge:
needs: [metadata, publish-build-badge, test-amd64]
uses: ./.github/workflows/_publish_badge.yaml
Collaborator:
Note that `_publish_badge.yaml` will be deprecated soon.

Contributor Author:
Would it be okay to tackle this in a subsequent PR? My thinking is we can get the reproducible build/ci changes out and then circle back to sitrep. I expect the reproducible PRs to all be flushed out within a week or two :)

if: ( always() )
if: always()
secrets: inherit
with:
ENDPOINT_FILENAME: 'rosetta-pax-overall-test-status.json'
PUBLISH: ${{ github.event_name == 'workflow_run' || needs.metadata.outputs.PUBLISH == 'true' }}
PUBLISH: ${{ needs.metadata.outputs.PUBLISH == 'true' }}
SCRIPT: |
PAX_STATUS=${{ needs.test-pax.outputs.TEST_STATUS }}
PAX_STATUS=${{ needs.test-amd64.outputs.TEST_STATUS }}

echo "LABEL='Tests'" >> $GITHUB_OUTPUT

if [[ ${{ needs.amd64.result }} == "success" && ${{ needs.arm64.result }} == "success" ]]; then
STATUS=failure
if [[ ${{ needs.publish-build-badge.outputs.STATUS }} == "success" ]]; then
if [[ $PAX_STATUS == "success" ]]; then
COLOR=brightgreen
MESSAGE="MGMN passed"
STATUS=success
else
COLOR=red
MESSAGE="MGMN failed"
fi
else
MESSAGE="n/a"
COLOR="red"
MESSAGE="n/a"
fi

echo "MESSAGE='${MESSAGE}'" >> $GITHUB_OUTPUT
echo "COLOR='${COLOR}'" >> $GITHUB_OUTPUT
echo "MESSAGE='${MESSAGE}'" >> $GITHUB_OUTPUT
echo "STATUS='${STATUS}'" >> ${GITHUB_OUTPUT}

finalize:
if: always()
needs: [metadata, amd64, arm64]
needs: [metadata, amd64, arm64, test-amd64]
uses: ./.github/workflows/_finalize.yaml
with:
PUBLISH_BADGE: ${{ needs.metadata.outputs.PUBLISH == 'true' }}
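
The SCRIPT block of `publish-test-badge` above interleaves removed and added lines, since this rendering of the diff carries no +/- markers. One plausible reading of the post-change logic is sketched below: the status defaults to failure, and the badge turns green only when both the build badge status and the amd64 test status are "success". `test_badge` and its arguments are hypothetical names, and the reading itself is an inference from the diff, not a confirmed copy of the merged script.

```shell
#!/usr/bin/env bash
# Hypothetical restatement of the test-badge decision.
#   $1: status reported by the build badge job
#   $2: status reported by the amd64 test job
test_badge() {
  local build_status="$1" pax_status="$2"
  local status=failure color=red message="n/a"   # defaults when the build failed
  if [[ "$build_status" == "success" ]]; then
    if [[ "$pax_status" == "success" ]]; then
      color=brightgreen; message="MGMN passed"; status=success
    else
      message="MGMN failed"
    fi
  fi
  # Same key='value' line shape the workflow appends to $GITHUB_OUTPUT.
  printf "LABEL='Tests'\nMESSAGE='%s'\nCOLOR='%s'\nSTATUS='%s'\n" \
    "$message" "$color" "$status"
}

test_badge success success
```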