Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[backend] Not able to retry failed KFPv1 pipelines from the UI #11248

Open
richardburakowski opened this issue Sep 25, 2024 · 0 comments
Open

Comments

@richardburakowski
Copy link

richardburakowski commented Sep 25, 2024

Environment

  • How did you deploy Kubeflow Pipelines (KFP)?
    Standalone kubernetes cluster v1.29.4
    kubeflow/manifests-v1.9.1-rc.1

also seeing the same issue with kubeflow/manifests-v1.9.0

  • KFP version:
    2.3.0

  • KFP SDK version:
    1.8.22

Steps to reproduce

When testing with a simple, single failing step pipeline, using the kubeflow dashboard, I am able to Retry the failed pipeline run

import kfp
from kfp import dsl

@dsl.component
def exit_fail_op():
    return dsl.ContainerOp(
        name="One Step Fail",
        image="library/bash:5.2-rc-alpine3.15",
        command=["sh", "-c", "exit 255"]
    )

@dsl.pipeline(name="one step fail")
def pipeline():
    exit_fail = exit_fail_op()

kfp.compiler.Compiler().compile(pipeline, 'onestepfail.yaml')

If I test with a more complex 2 step pipeline where the second step fails, Retry will leave the pipeline run in state Running indefinitely

import kfp
from kfp import dsl

@dsl.component
def exit_success_op():
    return dsl.ContainerOp(
        name="Two Step Success",
        image="library/bash:5.2-rc-alpine3.15",
        command=["sh", "-c", "exit 0"]
    )

@dsl.component
def exit_fail_op ():
    return dsl.ContainerOp (
        name="Two Step Fail",
        image="library/bash:5.2-rc-alpine3.15",
        command=["sh", "-c", "exit 255"]
    )

@dsl.pipeline(name="two step fail")
def pipeline():
    exit_success = exit_success_op()
    exit_fail    = exit_fail_op().after(exit_success)

kfp.compiler.Compiler().compile(pipeline, 'twostepfail.yaml')

workflow status is updated

NAME                    STATUS    AGE   MESSAGE
one-step-fail-2-m48zp   Failed    30m   
two-step-fail-2-z7vxn   Running   25m

Expected result

Pipeline re run

Materials and Reference


Impacted by this bug? Give it a 👍.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant