Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Reproducible rosetta build] Adds support for building rosetta with local patches and an already generated patch dir #354

Closed
wants to merge 119 commits into from

Conversation

terrykong
Copy link
Contributor

@terrykong terrykong commented Nov 1, 2023

flowchart TD
    B -->|./extract-patches.sh| PG1["./patches/patchlist.txt.gen"] & PG2["./patches/*.patches"]
    PO[patchlist.txt] -->|copy| PI
    subgraph main build
    direction LR
    PI[patchlist.txt]-->|--build-arg|B[docker buildx]
    PG1
    PG2
    end

    PG1 --> |copy|PG11["./patches/patchlist.txt.gen"] 
    PG2 --> |copy|PG22["./patches/*.patches"]
    PG11 --> |copy| PG111
    PG22 --> |copy| PG222

    subgraph reproducible build
    direction LR
    PG111["./patches/patchlist.txt.gen"] --> |--build-arg| B2
    PG222["./patches/*.patches"] --> |copy| B2
    B2[docker buildx]
    end
Loading

This introduces these new mechanisms to the rosetta build:

  1. Building with local patches
  2. Generating local patches for reproducibility
  3. Add a --build-arg for specifying a different patchlist file
  4. Save patches as github artifacts

The above is a high level flow of how the build can be reproduced after this PR. We can now build an image from an existing image or existing patchlists by doing:

scripts/extract-patches.sh <img-name>  # Extracts generated patch dir under ./patches/
docker buildx build --build-arg T5X_PATCHLIST=patches/t5x/patchlist-t5x.txt.gen --build-arg FLAX_PATCHLIST=patches/flax/patchlist-flax.txt.gen --target rosetta --tag rosetta:latest -f Dockerfile.t5x .

and this should yield an identical distribution.

To create local patches, this change updates git cherry-pick calls to git format-patch + git am to preserve metadata in patch files.

note: This will not guarantee that the pip packages will be reproducible as that will need to be handled by a different PR.

  • update rosetta workflows with new mealkit/final mechanic
  • update ci.yaml

@terrykong terrykong marked this pull request as draft November 1, 2023 07:04
@terrykong terrykong marked this pull request as draft November 13, 2023 17:49
@terrykong
Copy link
Contributor Author

Converting to draft in light of #371 . Will rebase this on top of that branch and add this feature

@terrykong terrykong changed the base branch from main to add-pip-compile November 13, 2023 23:31
@terrykong
Copy link
Contributor Author

terrykong commented Nov 14, 2023

Testing using the mealkit images from the CI to trigger rosetta CI:

Base automatically changed from add-pip-compile to main November 21, 2023 06:41
@terrykong terrykong marked this pull request as ready for review November 27, 2023 17:16
@@ -95,6 +95,15 @@ jobs:
build-args: |
BASE_IMAGE=${{ steps.defaults.outputs.BASE_IMAGE }}

- name: Extract patches
run: rosetta/scripts/extract-patches.sh ${{ steps.meta.outputs.tags }}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we specify the name of the output patch file via an argument? Currently, we can only infer it from the artifact upload step below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line pulls all of the patches, not just one. The dir copied out is pulled from a predetermined path /opt/rosetta/patches. I don't foresee any reason to configure this

# TODO: ARM
publish-build-badge:
needs: [metadata, amd64, arm64]
uses: ./.github/workflows/_publish_badge.yaml
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that _publish_badge.yaml will be deprecated soon in favor of the sitrep system.

@@ -100,6 +95,29 @@ jobs:
BASE_IMAGE: ${{ needs.metadata.outputs.BASE_IMAGE_ARM64 }}
secrets: inherit

publish-build-badge:
needs: [metadata, amd64, arm64]
uses: ./.github/workflows/_publish_badge.yaml
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that _publish_badge.yaml will be deprecated soon.

@@ -123,49 +150,45 @@ jobs:
${{ needs.arm64.outputs.DOCKER_TAG_FINAL }}
TARGET_IMAGE: pax
TARGET_TAGS: |
type=raw,value=latest,priority=1000
type=raw,value=nightly-${{ needs.metadata.outputs.BUILD_DATE }},priority=900
${{ needs.test-amd64.outputs.TEST_STATUS == 'success' && 'type=raw,value=latest,priority=1000' || '' }}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend to put this logic in a 'metadata' job/jobstep, and then reference the job/jobstep output for better readability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This depends on test results. metadata is an ancestor of the test step.

needs: [metadata, amd64, arm64, test-pax]
# TODO: ARM Tests
publish-test-badge:
needs: [metadata, publish-build-badge, test-amd64]
uses: ./.github/workflows/_publish_badge.yaml
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that `_publish_badge.yaml`` will be deprecated soon.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be okay to tackle this in a subsequent PR? My thinking is we can get the reproducible build/ci changes out and then circle back to sitrep. I expect the reproducible PRs to all be flushed out within a week or two :)

${{ needs.arm64.outputs.DOCKER_TAG_FINAL }}
TARGET_IMAGE: t5x
TARGET_TAGS: |
${{ ( needs.test-t5x-amd64.outputs.TEST_STATUS == 'success' && needs.test-vit-amd64.outputs.TEST_STATUS == 'success' && needs.test-unit-amd64.outputs.TEST_STATUS == 'success' ) && 'type=raw,value=latest,priority=1000' || '' }}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment regarding the placement of the logic in a separate job/jobstep.

#!/bin/bash

SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
cd $SCRIPT_DIR
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe let's not jump into this folder, but use pushd-popd instead?

pushd $SCRIPT_DIR
<the rest code>
popd

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pushd/popd isn't needed here b/c I only enter one directory and this is in a subshell so the calling parent shell doesn't change dirs

@terrykong
Copy link
Contributor Author

@DwarKapex @yhtang
Closing this in favor of #405 since I undid some of the stuff in this PR to accommodate the manifest change.

@terrykong terrykong closed this Nov 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants