
Update e2e-wine UAT for Kubeflow 1.9 #85

Closed

misohu opened this issue Jul 12, 2024 · 2 comments
Labels
enhancement (New feature or request)

Comments

@misohu
Member

misohu commented Jul 12, 2024

Context

For Kubeflow 1.9 we must update the e2e-wine UAT because it currently uses Seldon to deploy the trained model, and with Kubeflow 1.9 Seldon won't be part of the bundle.

While we are working on this we should also use the kfp v2 SDK in the test rather than the older v1.

What needs to get done

  1. Use KServe to serve the model in the e2e-wine UAT for Kubeflow 1.9 (see the sketch after this list)
  2. Use the kfp v2 SDK in the e2e-wine UAT for Kubeflow 1.9
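
For item 1, a minimal sketch of what the KServe deployment could look like via the KServe Python SDK, assuming an sklearn-format wine model already uploaded to object storage; the resource name, namespace, and storage URI below are placeholders, not the final UAT code:

from kubernetes.client import V1ObjectMeta
from kserve import (KServeClient, V1beta1InferenceService,
                    V1beta1InferenceServiceSpec, V1beta1PredictorSpec,
                    V1beta1SKLearnSpec, constants)

# Placeholder name, namespace, and storage URI for the trained wine model
isvc = V1beta1InferenceService(
    api_version=constants.KSERVE_GROUP + "/v1beta1",
    kind="InferenceService",
    metadata=V1ObjectMeta(name="wine-model", namespace="kubeflow-user"),
    spec=V1beta1InferenceServiceSpec(
        predictor=V1beta1PredictorSpec(
            sklearn=V1beta1SKLearnSpec(storage_uri="s3://mlpipeline/wine-model")
        )
    ),
)

# Submit the InferenceService to the cluster
KServeClient().create(isvc)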

Definition of Done

  1. UAT is updated
  2. Tests are passing (manual, AKS, EKS)
misohu added the enhancement label Jul 12, 2024

Thank you for reporting your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5995.

This message was autogenerated

@misohu
Member Author

misohu commented Aug 23, 2024

Closed with #104

Key things I have learned:

  • KFP v2 introduces new syntax for writing pipelines. This includes changes in how components and data passing are handled.
  • Each pipeline step defaults to Python 3.7, which should always be overridden to Python 3.11 for consistency.

Example:

from kfp.dsl import component, OutputPath

@component(
    base_image="python:3.11",  # Use Python 3.11 base image
    packages_to_install=["requests==2.32.3", "pandas==2.2.2"]
)
def download_dataset(url: str, dataset_path: OutputPath('Dataset')) -> None:
    # Import necessary packages within the function
    import io
    import requests
    import pandas as pd

    # Download the dataset from the provided URL
    response = requests.get(url)
    response.raise_for_status()

    # Parse the response (assumes a CSV dataset) and write it to the output path
    df = pd.read_csv(io.StringIO(response.text))
    df.to_csv(dataset_path, index=False)
  • As shown above, you can install the necessary requirements directly for each pipeline step using packages_to_install.
  • Ensure that required packages are imported within the function body for each step.

Passing Data Between Steps:

  • To pass data between steps in the pipeline, KFP v2 now uses InputPath and OutputPath annotations. Please use the correct annotation type within the brackets; otherwise, data may be incorrectly passed between steps. Some useful annotations include Dataset for passing dataset files through S3 and String for passing strings as arguments in the Argo workflow outside of S3.

Example:

from kfp.dsl import pipeline

@pipeline(name='download-preprocess-train-deploy-pipeline')
def download_preprocess_train_deploy_pipeline(url: str):
    # Step 1: Download the dataset from the URL
    download_task = download_dataset(url=url)

    # Step 2: Preprocess the downloaded dataset
    preprocess_task = preprocess_dataset(
        dataset=download_task.outputs['dataset_path']
    )
  • OutputPath typed arguments can be accessed through the outputs attribute, as shown above.
  • Unfortunately, there is no simple way to pass secrets to KFP v2 steps as we did in v1. To work around this, I am setting environment variables directly on the step with the same values I use in my notebook (see the sketch below).
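
Example (a minimal sketch combining both points, assuming a CSV dataset and MinIO-style credentials; preprocess_dataset matches the name used in the pipeline above, but its body and the variable names are illustrative, not the actual UAT code):

from kfp.dsl import component, InputPath, OutputPath

@component(base_image="python:3.11", packages_to_install=["pandas==2.2.2"])
def preprocess_dataset(dataset: InputPath('Dataset'),
                       preprocessed_path: OutputPath('Dataset')) -> None:
    import os
    import pandas as pd

    # Credentials arrive as plain environment variables set on the task
    # (hypothetical names, mirroring the values used in the notebook)
    access_key = os.environ.get("MINIO_ACCESS_KEY")
    assert access_key, "MINIO_ACCESS_KEY must be set on the task"

    # Illustrative preprocessing only: drop rows with missing values
    df = pd.read_csv(dataset)
    df = df.dropna()
    df.to_csv(preprocessed_path, index=False)

# Inside the pipeline definition, attach the variables to the task;
# set_env_variable is available on PipelineTask in the kfp v2 SDK:
#     preprocess_task.set_env_variable(name="MINIO_ACCESS_KEY", value="minio")

Note that values set this way end up in the compiled pipeline spec in plain text, so this workaround is only suitable for test credentials.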

Note: Most online resources focus on KFP v1, so I highly recommend consulting the official documentation for KFP v2.5.0, which is the version compatible with our kserve 0.13.1 SDK given the conflicting protobuf requirements. You can find the relevant documentation here.
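
For reference, the resulting pins (a sketch based on the versions named above, not the repo's actual requirements):

# kfp pinned to 2.5.0 to stay compatible with kserve 0.13.1's protobuf requirement
pip install kfp==2.5.0 kserve==0.13.1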
