Workflow hangs with build summary #1156
Comments
Thanks for your feedback. Can you share the full logs of your workflow, please?
There are some ARGs and ENVs I left out (and I also just tried without them).
It seems the Docker client and server versions are not aligned on your self-hosted environment, but I don't think that's related.
Thanks, that's fine. Can you share the logs of the "Post Set up Docker Buildx" step as well?
Logs for the "Post Set up Docker Buildx" step
This does not look like the logs for the "Post Set up Docker Buildx" step but for "Set up Docker Buildx". It should look like this: https://github.com/docker/setup-buildx-action/actions/runs/9677658877/job/26699794985#step:11:3
True - the job hangs at the "Post Build and push" step and I cancel it (after ~20min), so there is no "Post …" anything
Ah damn, OK. I will try to repro on my side with Docker 27.0.1 and let you know. This might be the issue.
I'm pretty sure it already happened with 26.1.4 … sorry :(
It's happening to me too with GitHub-hosted runners. It's stuck on the "Post Build Image" step. I had to cancel it; here is a raw log snippet from the very end:
The UI was also showing something like "State not found" at the very last line. It was stuck for 30 minutes and it took a while to cancel the job.
@kkopachev Thanks for your feedback
Public runners I guess then? Or self-hosted?
So it hangs at Lines 165 to 169 in 1556069
Could you share the full logs of your workflow and the workflow YAML as well, please? That would help. In the meantime you can disable summary generation with:
- name: Build and push
  uses: docker/build-push-action@v6
  with:
    push: true
    tags: user/app:latest
  env:
    DOCKER_BUILD_NO_SUMMARY: true
Or globally within your workflow if you have multiple calls to the action:
name: ci
on:
  push:
    branches:
      - 'main'
env:
  DOCKER_BUILD_NO_SUMMARY: true
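While the hang is being investigated, a job-level timeout may at least keep a stuck post step from holding the runner for 20-30 minutes. The fragment below is only a sketch: the job name `docker` and the 15-minute budget are placeholders, and it does not address the root cause. `timeout-minutes` is the standard GitHub Actions job key.

```yaml
jobs:
  docker:                  # hypothetical job name
    runs-on: ubuntu-latest
    timeout-minutes: 15    # hypothetical budget; GitHub's default is 360
    steps:
      - name: Build and push
        uses: docker/build-push-action@v6
        with:
          push: true
          tags: user/app:latest
```

Since cancellation itself was reported as slow in this thread, the job may still take a while to stop after the timeout fires.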
We're using public GHA runners (x86-64), but building ARM images using the kubernetes driver. Even cancellation takes a long time. The workflow files look like this:
workflow.yaml
name: Review Deploy
on:
  pull_request:
jobs:
  build_image:
    name: Build Image
    uses: %ORG%/%REPO%/.github/workflows/[email protected]
    with:
      aws-role: ${{ vars.AWS_ROLE }}
      aws-role-session-name: session-name
      kubernetes-cluster: something-staging
      registry: 1234567890.dkr.ecr.us-east-1.amazonaws.com/app
    secrets:
      build-secrets: |
        ${{ format('gh_token={0}', secrets.ACCESS_TOKEN) }}
The reusable workflow it references is quite generic, so I kept just the relevant parts and omitted the EKS connection, AWS auth and such:
reusable-workflow.yaml
jobs:
  build:
    name: Build and Push Image
    runs-on: ubuntu-latest
    outputs:
      ref_name: ${{ steps.ref_name.outputs.ref_name }}
      sha_short: ${{ steps.get_commit.outputs.sha_short }}
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          ref: ${{ inputs.ref }}
      # snip-snip
      - name: Prepare Buildx Driver Opts
        uses: actions/github-script@v7
        id: driver-opts
        with:
          script: |
            const platforms = `${{ inputs.platforms }}`;
            let driverOpts = "";
            if (platforms.includes('linux/amd64')) {
              const random = Math.floor((Math.random() * 1000000) + 1);
              driverOpts = `
              - driver-opts:
                  - namespace=${{ inputs.kubernetes-namespace }}
                  - "nodeselector=kubernetes.io/arch=amd64"
                platforms: linux/amd64
                name: arm64-${{ steps.get_commit.outputs.sha_short }}-${random}`;
            }
            core.setOutput('driver-opts', driverOpts);
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
        with:
          driver: kubernetes
          driver-opts: |
            namespace=${{ inputs.kubernetes-namespace }}
            "nodeselector=kubernetes.io/arch=arm64"
          platforms: linux/arm64
          append: ${{ steps.driver-opts.outputs.driver-opts }}
      - name: Build Image
        uses: docker/build-push-action@v6
        with:
          build-args: ${{ inputs.build-args }}
          cache-from: |
            type=registry,ref=${{ inputs.registry }}:latest
            ${{ steps.ref_name.outputs.ref_name && format('type=registry,ref={0}:{1}', inputs.registry, steps.ref_name.outputs.ref_name) || '' }}
          cache-to: type=inline
          context: ${{ inputs.context }}
          file: ${{ inputs.file }}
          labels: |
            org.opencontainers.image.revision=${{ github.sha }}
            org.opencontainers.image.source=github.com/${{ github.repository }}
            ${{ inputs.labels }}
          platforms: ${{ inputs.platforms }}
          push: true
          provenance: false
          secrets: ${{ secrets.build-secrets }}
          tags: |
            ${{ inputs.registry }}:${{ steps.get_commit.outputs.sha_short }}
            ${{ steps.ref_name.outputs.ref_name && format('{0}:{1}', inputs.registry, steps.ref_name.outputs.ref_name) || '' }}
            ${{ inputs.latest && format('{0}:{1}', inputs.registry, 'latest') || '' }}
          outputs: type=image,oci-mediatypes=true,compression=zstd,compression-level=3,force-compression=true,push=true
          target: ${{ inputs.target }}
As for the logs, here are the logs for the "Post Build Image" step, which is hanging:
I noticed it references the builder, so here is a snippet of the setup-buildx output:
After job cancellation, the builder pods in Kubernetes keep running. Their logs have a few of these (might be a red herring):
Also, this appears under the workflow summary:
@kkopachev Thanks!
OK, you're using the kubernetes driver, but I don't think there is a dial issue when exporting the build in "Post Build Image".
Looking at your logs, it seems to write the summary. Does it hang right after "Writing summary", or is that just printed after cancellation and it hangs before that?
"Writing summary" is where it hangs. I can see the artifact was uploaded successfully. I guess something in the summary breaks it. If you could dump the raw unformatted summary (JSON?) to a file and upload it as an artifact before writing the summary, I can rerun my workflow to see if there is anything suspicious. To me it looks like the summary contains something that breaks the worker.
Thanks, that's useful. I think the process exporting the build record remains open, which makes this post step hang. Probably around https://github.com/docker/actions-toolkit/blob/fe9937dd36b64d2090fbfec40e144944ae390a12/src/buildx/history.ts#L115. I will take a closer look.
@kkopachev I made some changes in docker/actions-toolkit#392. Can you try with:
uses: crazy-max/docker-build-push-action@test-process-term
And give me the logs of the post step, please? Thanks!
@crazy-max Nothing changed, I don't see new logs. I think you'd have to re-generate dist.js, as this commit happened before this commit. If it is relevant, the image build is all cached.
I have new logs on my side, like https://github.com/crazy-max/docker-build-push-action/actions/runs/9730285429/job/26853184981#step:6:20:
Oh, I didn't notice that. It still hangs.
OK, it looks like this process is the culprit. It seems https://github.com/docker/actions-toolkit/pull/392/files#diff-fcf6a886ba5fdffc4fb064e49c7776a41c5fa00be72b7aef6f56305b8632ad8aR175 has no effect and the process is not gracefully terminated. I will take a closer look, thanks for testing this out! Edit: I made new changes, if you can try again with:
uses: crazy-max/docker-build-push-action@test-process-term
Thanks!
You can test with the latest changes on the default branch:
uses: docker/build-push-action@master
@crazy-max It still hangs for me
OK, then it's not related to child processes, as I see both exit in the post step from your logs. Will take a closer look by using the
@crazy-max We are also seeing a very similar issue where the workflow just hangs with the build summary enabled. The log is actually an exact match with the above, no further activity after
@shunbhark-circle What kind of builder are you using?
My problems went away when I switched to the "Actions Runner Controller" and from my custom runner image to the official image as the base for my runner, like this:
Docker is already included in that image. Service containers now work as well. I also switched to the default docker driver for my cluster:
Instead of
@joh-klein
We've moved to ARC as well and it works fine there. For us, the reason to use the kubernetes driver is to be able to build ARM images faster. Now we run the whole workflow on ARC.
This is happening for us as well.
Log output:
Are you using
in your ARC setup?
@joh-klein No, we want to use our own runner, so we are using a custom template like this:
DinD (docker:27.3.1-dind) is running as a pod in our Kubernetes cluster and our action runner spec has DOCKER_HOST set to that pod's address. I can see that running in dind mode sets some more things in the spec (https://github.com/actions/actions-runner-controller/blob/96d1bbcf2fa961e7f64fad45ea8903b741cb3e16/charts/gha-runner-scale-set/values.yaml#L115). Maybe some of the other things are needed for this to work as well?
Yeah - to get the Docker stuff working 100% there are a couple of settings and additional containers.
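For readers comparing setups: the dind mode mentioned above is selected through the `containerMode` value of the gha-runner-scale-set Helm chart. The fragment below is a sketch of that one setting only, not anyone's actual configuration from this thread. What the chart generates from it depends on the chart version, so check the linked values.yaml.

```yaml
# values.yaml fragment for the gha-runner-scale-set chart (illustrative)
containerMode:
  type: "dind"   # the chart expands this into the extra containers, volumes and env for the runner pod
```

With a hand-written runner template, the equivalent pieces have to be added manually, which may be what is missing in the custom setup described above.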
We are counting on docker/buildx#2711 to solve this issue by eliminating the dependency on the Docker CLI. In the meantime you can disable the summary with:
- name: Build and push
  uses: docker/build-push-action@v6
  with:
    push: true
    tags: user/app:latest
  env:
    DOCKER_BUILD_SUMMARY: false
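If a workflow calls the action several times, the same variable can presumably be set once at workflow level. This mirrors the global form shown earlier in the thread for `DOCKER_BUILD_NO_SUMMARY`. In the sketch below, the workflow name and trigger are placeholders:

```yaml
name: ci            # placeholder workflow name
on:
  push:
    branches:
      - 'main'
env:
  DOCKER_BUILD_SUMMARY: false   # applies to every build-push-action step in this workflow
```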
Contributing guidelines
I've found a bug, and:
Description
I updated my workflow to use docker/build-push-action@v6 and the workflow doesn't finish anymore. It hangs at the Post-Build-Push step.
Expected behaviour
Workflow should finish
Actual behaviour
Workflow hangs at the Post "Build and push" step and times out
Repository URL
No response
Workflow run URL
No response
YAML workflow
Workflow logs
BuildKit logs
No response
Additional info
No response