Vertex Pipelines Deployer

Deploy Vertex Pipelines within minutes

This tool is a wrapper around kfp and google-cloud-aiplatform that allows you to check, compile, upload, run, and schedule Vertex Pipelines in a standardized manner.

📚 Table of Contents

Why this tool?
Prerequisites
Installation

From PyPI
From git repo

Usage

Setup
Folder Structure
CLI: Deploying a Pipeline with `deploy`
CLI: Checking Pipelines are valid with `check`
CLI: Other commands

`config`
`create`
`init`
`list`

CLI: Options
Configuration

Full CLI documentation

❓ Why this tool?

Three use cases:

CI: Check pipeline validity.
Dev mode: Quickly iterate over your pipelines by compiling and running them in multiple environments (test, dev, staging, etc.) without duplicating code or searching for the right kfp/aiplatform snippet.
CD: Deploy your pipelines to Vertex Pipelines in a standardized manner in your CD with Cloud Build or GitHub Actions.

Two main commands:

check: Check your pipelines (imports, compile, check configs validity against pipeline definition).
deploy: Compile, upload to Artifact Registry, run, and schedule your pipelines.

📋 Prerequisites

Unix-like environment (Linux, macOS, WSL, etc.)
Python 3.8 to 3.10
Google Cloud SDK
A GCP project with Vertex Pipelines enabled

📦 Installation

From PyPI

pip install vertex-deployer

From git repo

Stable version:

pip install git+https://github.com/artefactory/vertex-pipelines-deployer.git@main

Develop version:

pip install git+https://github.com/artefactory/vertex-pipelines-deployer.git@develop

If you want to test this package on examples from this repo:

git clone git@github.com:artefactory/vertex-pipelines-deployer.git
cd vertex-pipelines-deployer
poetry install
poetry shell  # if you want to activate the virtual environment
cd example

🚀 Usage

🛠️ Setup

Setup your GCP environment:

export PROJECT_ID=<gcp_project_id>
gcloud config set project $PROJECT_ID
gcloud auth login
gcloud auth application-default login

You need the following APIs to be enabled:

Cloud Build API
Artifact Registry API
Cloud Storage API
Vertex AI API

gcloud services enable \
    cloudbuild.googleapis.com \
    artifactregistry.googleapis.com \
    storage.googleapis.com \
    aiplatform.googleapis.com

Create an artifact registry repository for your base images (Docker format):

export GAR_DOCKER_REPO_ID=<your_gar_repo_id_for_images>
export GAR_LOCATION=<your_gar_location>
gcloud artifacts repositories create ${GAR_DOCKER_REPO_ID} \
    --location=${GAR_LOCATION} \
    --repository-format=docker

Build and upload your base images to the repository. To do so, please follow the Google Cloud Build documentation.

Create an artifact registry repository for your pipelines (KFP format):

export GAR_PIPELINES_REPO_ID=<your_gar_repo_id_for_pipelines>
gcloud artifacts repositories create ${GAR_PIPELINES_REPO_ID} \
    --location=${GAR_LOCATION} \
    --repository-format=kfp

Create a GCS bucket for Vertex Pipelines staging:

export GCP_REGION=<your_gcp_region>
export VERTEX_STAGING_BUCKET_NAME=<your_bucket_name>
gcloud storage buckets create gs://${VERTEX_STAGING_BUCKET_NAME} --location=${GCP_REGION}

Create a service account for Vertex Pipelines:

export VERTEX_SERVICE_ACCOUNT_NAME=foobar
export VERTEX_SERVICE_ACCOUNT="${VERTEX_SERVICE_ACCOUNT_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

gcloud iam service-accounts create ${VERTEX_SERVICE_ACCOUNT_NAME}

gcloud projects add-iam-policy-binding ${PROJECT_ID} \
    --member="serviceAccount:${VERTEX_SERVICE_ACCOUNT}" \
    --role="roles/aiplatform.user"

gcloud storage buckets add-iam-policy-binding gs://${VERTEX_STAGING_BUCKET_NAME} \
    --member="serviceAccount:${VERTEX_SERVICE_ACCOUNT}" \
    --role="roles/storage.objectUser"

gcloud artifacts repositories add-iam-policy-binding ${GAR_PIPELINES_REPO_ID} \
   --location=${GAR_LOCATION} \
   --member="serviceAccount:${VERTEX_SERVICE_ACCOUNT}" \
   --role="roles/artifactregistry.admin"

You can use the deployer CLI (see example below) or import VertexPipelineDeployer in your code (try it yourself).

📁 Folder Structure

You must respect the following folder structure. If you already follow the Vertex Pipelines Starter Kit folder structure, using this tool should be straightforward:

vertex
├─ configs/
│  └─ {pipeline_name}
│     └─ {config_name}.json
└─ pipelines/
   └─ {pipeline_name}.py

!!! tip "About folder structure"
    You must have at least these files. If you need to share some config elements between pipelines, you can have a shared folder in configs and import them in your pipeline configs.

If you're following a different folder structure, you can change the default paths in the `pyproject.toml` file.
See [Configuration](#configuration) section for more information.
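
To make the expected layout concrete, here is a minimal sketch that walks a vertex folder and maps each pipeline to its config files. The helper name `list_pipelines` and the `CONFIG_SUFFIXES` set are illustrative assumptions and are not part of the deployer's API.

```python
from pathlib import Path

# Config formats mentioned in this README (assumed suffix spellings).
CONFIG_SUFFIXES = {".py", ".json", ".toml", ".yaml"}


def list_pipelines(vertex_root):
    """Map each pipeline name to the sorted list of its config file names."""
    root = Path(vertex_root)
    pipelines_dir = root / "pipelines"
    if not pipelines_dir.is_dir():
        return {}

    result = {}
    for pipeline_file in sorted(pipelines_dir.glob("*.py")):
        name = pipeline_file.stem
        if name.startswith("_"):
            continue  # skip __init__.py and other private modules
        config_dir = root / "configs" / name
        configs = []
        if config_dir.is_dir():
            configs = sorted(
                p.name for p in config_dir.iterdir() if p.suffix in CONFIG_SUFFIXES
            )
        result[name] = configs
    return result


if __name__ == "__main__":
    # Prints an empty dict when no "vertex" folder exists in the current directory.
    print(list_pipelines("vertex"))
```

A pipeline that maps to an empty list has no config yet, which is usually a sign that the configs/{pipeline_name} folder is missing or misnamed.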

Pipelines

Your file {pipeline_name}.py must contain a function called {pipeline_name} decorated with kfp.dsl.pipeline. In previous versions, this function had to be called pipeline; it was renamed to {pipeline_name} to avoid confusion with the kfp.dsl.pipeline decorator.

# vertex/pipelines/dummy_pipeline.py
import kfp.dsl

# New name to avoid confusion with the kfp.dsl.pipeline decorator
@kfp.dsl.pipeline()
def dummy_pipeline():
    ...

# Old name
@kfp.dsl.pipeline()
def pipeline():
    ...
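
For illustration only, a loader that accepts both naming conventions could look like the sketch below. The function `find_pipeline` is hypothetical and does not describe the deployer's actual import logic; the fake module stands in for an imported vertex/pipelines/dummy_pipeline.py.

```python
import types


def find_pipeline(module, pipeline_name):
    """Return the pipeline object, preferring the new name over the old one."""
    for attr in (pipeline_name, "pipeline"):
        obj = getattr(module, attr, None)
        if obj is not None:
            return obj
    raise AttributeError(
        f"module {module.__name__!r} defines neither {pipeline_name!r} nor 'pipeline'"
    )


# Stand-in for a real pipeline module (assumption for the example).
fake_module = types.ModuleType("dummy_pipeline")
fake_module.dummy_pipeline = lambda: "compiled from the new name"

pipeline_func = find_pipeline(fake_module, "dummy_pipeline")
```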

Configs

Config files can be in .py, .json, .toml, or .yaml format. They must be located in the configs/{pipeline_name} folder.

Why multiple formats?

.py files are useful to define complex configs (e.g. a list of dicts), while .json / .toml / .yaml files are useful to define simple configs (e.g. a string). Supporting several formats also lets you adopt the deployer with almost no migration cost.
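
As a rough illustration of the "complex config" case, a .py config might look like the sketch below. The file name and every parameter name and value are made up for the example; only the parameter_values variable name comes from this README. The optional input_artifacts variable is left out here; see the Vertex documentation for its expected content.

```python
# vertex/configs/dummy_pipeline/config_complex.py (hypothetical file name)

# Values passed as arguments to the pipeline function.
# All keys below are illustrative; they must match your pipeline's signature.
parameter_values = {
    "model_name": "my-model",
    "layers": [
        {"units": 64, "activation": "relu"},
        {"units": 1, "activation": "linear"},
    ],
}
```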

How to format them?

.py files must be valid Python files with two important elements:
- parameter_values to pass arguments to your pipeline
- input_artifacts if you want to retrieve and create input artifacts to your pipeline. See Vertex Documentation for more information.

.json files must be valid JSON files containing only one dict of key: value representing parameter values.

.toml files must follow the same rule. Please note that TOML sections will be flattened, except for inline tables. Section names will be joined using the "_" separator, which is not configurable at the moment. Example:

TOML file:

[modeling]
model_name = "my-model"
params = { lambda = 0.1 }

Resulting parameter values:

{
    "modeling_model_name": "my-model",
    "modeling_params": { "lambda": 0.1 }
}

.yaml files must be valid YAML files containing only one dict of key: value representing parameter values.
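
The TOML flattening rule can be sketched as follows. This is a simplified illustration, not the deployer's code: it assumes the TOML parser marks inline tables with a distinct type (tomlkit, for instance, exposes an InlineTable item), and a hypothetical `InlineTable` dict subclass stands in for that marker here.

```python
class InlineTable(dict):
    """Hypothetical marker type for values written as TOML inline tables."""


def flatten_sections(data, sep="_", prefix=""):
    """Flatten regular sections into prefixed keys; keep inline tables as dicts."""
    flat = {}
    for key, value in data.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict) and not isinstance(value, InlineTable):
            # A regular [section]: recurse and join names with the separator.
            flat.update(flatten_sections(value, sep=sep, prefix=name))
        else:
            # A scalar, list, or inline table: keep the value as-is.
            flat[name] = value
    return flat


# Parsed form of the TOML example above.
parsed = {
    "modeling": {
        "model_name": "my-model",
        "params": InlineTable({"lambda": 0.1}),
    }
}
flat = flatten_sections(parsed)
```

With the example above, `flat` contains the keys modeling_model_name and modeling_params, and modeling_params is still a dict.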

??? question "Why are sections flattened when using TOML config files?"
    Vertex Pipelines parameter validation and parameter logging to Vertex Experiments are based on the parameter name. If you do not flatten your sections, you can only validate the section names and check that their values are of type dict, which is not very useful.

??? question "Why aren't input_artifacts supported in TOML / JSON config files?"
    Because it's low on the priority list. Feel free to open a PR if you want to add it.

How to name them?

{config_name}.py, {config_name}.json, {config_name}.toml, or {config_name}.yaml. config_name is free but must be unique for a given pipeline.

Settings

You will also need the following ENV variables, either exported or in a .env file (see example in example.env):

PROJECT_ID=YOUR_PROJECT_ID  # GCP Project ID
GCP_REGION=europe-west1  # GCP Region

GAR_LOCATION=europe-west1  # Google Artifact Registry Location
GAR_PIPELINES_REPO_ID=YOUR_GAR_KFP_REPO_ID  # Google Artifact Registry Repo ID (KFP format)

VERTEX_STAGING_BUCKET_NAME=YOUR_VERTEX_STAGING_BUCKET_NAME  # GCS Bucket for Vertex Pipelines staging
VERTEX_SERVICE_ACCOUNT=YOUR_VERTEX_SERVICE_ACCOUNT  # Vertex Pipelines Service Account

!!! note "About env files"
    We're using env files and dotenv to load the environment variables. No default value for the --env-file argument is provided, to ensure that you don't accidentally deploy to the wrong project. An example.env file is provided in this repo. This also allows you to work with multiple environments thanks to env files (test.env, dev.env, prod.env, etc.).
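
Before deploying, it can help to check that an env file defines every variable listed above. The sketch below is a standard-library stand-in for dotenv, written for this example only: it handles plain KEY=VALUE lines with optional trailing comments and does not support quoting or "#" characters inside values. The function names are hypothetical.

```python
REQUIRED_VARS = (
    "PROJECT_ID",
    "GCP_REGION",
    "GAR_LOCATION",
    "GAR_PIPELINES_REPO_ID",
    "VERTEX_STAGING_BUCKET_NAME",
    "VERTEX_SERVICE_ACCOUNT",
)


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, ignoring blank lines and '#' comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_vars(values):
    """Return the required variable names that are absent or empty."""
    return [name for name in REQUIRED_VARS if not values.get(name)]


sample = "PROJECT_ID=my-project  # GCP Project ID\nGCP_REGION=europe-west1\n"
print(missing_vars(parse_env_text(sample)))
```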

🚀 CLI: Deploying a Pipeline with `deploy`

Let's say you defined a pipeline in dummy_pipeline.py and a config file named config_test.json. You can deploy your pipeline using the following command:

vertex-deployer deploy dummy_pipeline \
    --compile \
    --upload \
    --run \
    --env-file example.env \
    --tags my-tag \
    --config-filepath vertex/configs/dummy_pipeline/config_test.json \
    --experiment-name my-experiment \
    --enable-caching \
    --skip-validation
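
If you deploy the same pipeline to several environments, you may want to build this command programmatically. The helper below is a hypothetical sketch that only assembles the argument list, using flags shown in the command above; the env file names are examples.

```python
import shlex


def build_deploy_command(pipeline_name, env_file, config_filepath, tag="latest"):
    """Assemble a vertex-deployer deploy command as an argument list."""
    return [
        "vertex-deployer", "deploy", pipeline_name,
        "--compile",
        "--upload",
        "--run",
        "--env-file", env_file,
        "--tags", tag,
        "--config-filepath", config_filepath,
    ]


# One command per environment; nothing is executed here.
for env_file in ("test.env", "dev.env"):
    command = build_deploy_command(
        "dummy_pipeline",
        env_file,
        "vertex/configs/dummy_pipeline/config_test.json",
        tag="my-tag",
    )
    print(shlex.join(command))
```

To actually run a command, pass the list to `subprocess.run(command, check=True)`.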

✅ CLI: Checking Pipelines are valid with `check`

To check that your pipelines are valid, you can use the check command. It uses a pydantic model to:

check that your pipeline imports and definition are valid
check that your pipeline can be compiled
check that all configs related to the pipeline are respecting the pipeline definition (using a Pydantic model based on pipeline signature)

To validate one or multiple pipeline(s):

vertex-deployer check dummy_pipeline <other pipeline name>

To validate all pipelines in the vertex/pipelines folder:

vertex-deployer check --all
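
To give an intuition for the config check, here is a simplified sketch of validating parameter values against a function signature. It uses only `inspect` from the standard library as a stand-in for the Pydantic-based validation, works on a plain Python function rather than a KFP pipeline object, and the pipeline parameters are invented for the example.

```python
import inspect


def validate_config(pipeline_func, parameter_values):
    """Return a list of error messages; an empty list means the config fits."""
    signature = inspect.signature(pipeline_func)
    errors = []
    for name, param in signature.parameters.items():
        if name not in parameter_values:
            if param.default is inspect.Parameter.empty:
                errors.append(f"missing required parameter: {name}")
            continue
        expected = param.annotation
        value = parameter_values[name]
        # Only plain classes (str, int, float, ...) are checked in this sketch.
        if isinstance(expected, type) and not isinstance(value, expected):
            errors.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    for name in parameter_values:
        if name not in signature.parameters:
            errors.append(f"unexpected parameter: {name}")
    return errors


def dummy_pipeline(name: str, learning_rate: float = 0.1):
    """Plain function standing in for a pipeline (hypothetical parameters)."""


print(validate_config(dummy_pipeline, {"name": "demo"}))
```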

🛠️ CLI: Other commands

`config`

You can check your vertex-deployer configuration options using the config command. Fields set in pyproject.toml will overwrite default values and will be displayed differently:

vertex-deployer config --all

`create`

You can create all files needed for a pipeline using the create command:

vertex-deployer create my_new_pipeline --config-type py

This will create a my_new_pipeline.py file in the vertex/pipelines folder and a vertex/configs/my_new_pipeline/ folder with multiple config files in it.

`init`

To initialize the deployer with default settings and folder structure, use the init command:

vertex-deployer init

$ vertex-deployer init
Welcome to Vertex Deployer!
This command will help you getting fired up.
Do you want to configure the deployer? [y/n]: n
Do you want to build default folder structure [y/n]: n
Do you want to create a pipeline? [y/n]: n
All done ✨

`list`

You can list all pipelines in the vertex/pipelines folder using the list command:

vertex-deployer list --with-configs

🍭 CLI: Options

vertex-deployer --help

To see package version:

vertex-deployer --version

To adapt log level, use the --log-level option. Default is INFO.

vertex-deployer --log-level DEBUG deploy ...

Configuration

You can configure the deployer using the pyproject.toml file to better fit your needs. This will overwrite default values. It can be useful if you always use the same options, e.g. always the same --scheduler-timezone.

[tool.vertex_deployer]
vertex_folder_path = "my/path/to/vertex"
log_level = "INFO"

[tool.vertex_deployer.deploy]
scheduler_timezone = "Europe/Paris"

You can display all the configurable parameters with their default values by running:

$ vertex-deployer config --all
'*' means the value was set in config file

* vertex_folder_path=my/path/to/vertex
* log_level=INFO
deploy
  env_file=None
  compile=True
  upload=False
  run=False
  schedule=False
  cron=None
  delete_last_schedule=False
  * scheduler_timezone=Europe/Paris
  tags=['latest']
  config_filepath=None
  config_name=None
  enable_caching=False
  experiment_name=None
check
  all=False
  config_filepath=None
  raise_error=False
list
  with_configs=True
create
  config_type=json
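
The override behaviour shown above can be pictured as a merge of file settings over defaults, with a flag recording which keys came from the file. The sketch below is an illustration under stated assumptions, not the deployer's implementation: the function names are hypothetical, only a subset of options is included, and the default values for vertex_folder_path and scheduler_timezone are placeholders.

```python
def merge_settings(defaults, overrides):
    """Merge overrides into defaults; also report which keys were set in the file."""
    merged, is_set = {}, {}
    for key, default in defaults.items():
        if isinstance(default, dict):
            merged[key], is_set[key] = merge_settings(default, overrides.get(key, {}))
        else:
            merged[key] = overrides.get(key, default)
            is_set[key] = key in overrides
    return merged, is_set


def render(merged, is_set, indent=""):
    """Render settings as lines, prefixing file-defined values with '* '."""
    lines = []
    for key, value in merged.items():
        if isinstance(value, dict):
            lines.append(f"{indent}{key}")
            lines.extend(render(value, is_set[key], indent + "  "))
        else:
            marker = "* " if is_set[key] else ""
            lines.append(f"{indent}{marker}{key}={value}")
    return lines


# Placeholder defaults (assumptions), then the values from the pyproject.toml example.
defaults = {
    "vertex_folder_path": "vertex",
    "log_level": "INFO",
    "deploy": {"compile": True, "scheduler_timezone": None},
}
from_file = {
    "vertex_folder_path": "my/path/to/vertex",
    "log_level": "INFO",
    "deploy": {"scheduler_timezone": "Europe/Paris"},
}

merged, is_set = merge_settings(defaults, from_file)
lines = render(merged, is_set)
print("\n".join(lines))
```

Note that a key is flagged as soon as it appears in the file, even when its value equals the default, which mirrors the starred log_level=INFO line in the output above.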

Repository Structure

├─ .github
│  ├─ ISSUE_TEMPLATE/
│  ├─ workflows
│  │  ├─ ci.yaml
│  │  ├─ pr_agent.yaml
│  │  └─ release.yaml
│  ├─ CODEOWNERS
│  └─ PULL_REQUEST_TEMPLATE.md
├─ deployer                                     # Source code
│  ├─ __init__.py
│  ├─ cli.py
│  ├─ constants.py
│  ├─ pipeline_checks.py
│  ├─ pipeline_deployer.py
│  ├─ settings.py
│  └─ utils
│     ├─ config.py
│     ├─ console.py
│     ├─ exceptions.py
│     ├─ logging.py
│     ├─ models.py
│     └─ utils.py
├─ docs/                                        # Documentation folder (mkdocs)
├─ templates/                                   # Semantic Release templates
├─ tests/
├─ example                                      # Example folder with dummy pipeline and config
│   ├─ example.env
│   └─ vertex
│      ├─ components
│      │  └─ dummy.py
│      ├─ configs
│      │  ├─ broken_pipeline
│      │  │  └─ config_test.json
│      │  └─ dummy_pipeline
│      │     ├─ config_test.json
│      │     ├─ config.py
│      │     └─ config.toml
│      ├─ deployment
│      ├─ lib
│      └─ pipelines
│         ├─ broken_pipeline.py
│         └─ dummy_pipeline.py
├─ .gitignore
├─ .pre-commit-config.yaml
├─ catalog-info.yaml                            # Roadie integration configuration
├─ CHANGELOG.md
├─ CONTRIBUTING.md
├─ LICENSE
├─ Makefile
├─ mkdocs.yml                                   # Mkdocs configuration
├─ pyproject.toml
└─ README.md
