Rework template to better use MCP capabilities (#17)
* hide mlflow complexity and show MCP more

* hide mlflow complexity and show MCP more

* brush up

* black

* add comments

* fix renaming issue

* more MCP

* update docstring

* update readme

* get rid of deployer

* don't rely on MR versions anymore

* rename

* rename

* typo

* use step level configs in yaml

* move configs around

* typo

* add deployment pipeline

* black a bit

* update readme

* fix naming issue

* fix template

* fix template

* try fix tests for macos

* try fix tests for macos

* add cool pics

* add image optimizer here

* Optimised images with calibre/image-actions

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
avishniakov and github-actions[bot] authored Nov 10, 2023
1 parent 4e614bd commit 859fc4c
Showing 44 changed files with 679 additions and 466 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
@@ -27,6 +27,8 @@ jobs:
ZENML_DEBUG: true
ZENML_ANALYTICS_OPT_IN: false
ZENML_LOGGING_VERBOSITY: INFO
# fork fix for macos
OBJC_DISABLE_INITIALIZE_FORK_SAFETY: YES
steps:
- name: Check out repository code
uses: actions/checkout@v3
26 changes: 26 additions & 0 deletions .github/workflows/image-optimizer.yml
@@ -0,0 +1,26 @@
name: Compress Images
on:
pull_request:
# Run Image Actions when JPG, JPEG, PNG or WebP files are added or changed.
# See https://help.github.com/en/actions/automating-your-workflow-with-github-actions/workflow-syntax-for-github-actions#onpushpull_requestpaths for reference.
paths:
- '**.jpg'
- '**.jpeg'
- '**.png'
- '**.webp'
jobs:
build:
# Only run on non-draft PRs within the same repository.
if: github.event.pull_request.head.repo.full_name == github.repository && github.event.pull_request.draft == false
name: calibreapp/image-actions
runs-on: ubuntu-latest
steps:
- name: Checkout Repo
uses: actions/checkout@v3

- name: Compress Images
uses: calibreapp/image-actions@main
with:
# The `GITHUB_TOKEN` is automatically generated by GitHub and scoped only to the repository that is currently running the action. By default, the action can’t update Pull Requests initiated from forked repositories.
# See https://docs.github.com/en/actions/reference/authentication-in-a-workflow and https://help.github.com/en/articles/virtual-environments-for-github-actions#token-permissions
githubToken: ${{ secrets.GITHUB_TOKEN }}
113 changes: 91 additions & 22 deletions README.md

Large diffs are not rendered by default.

Binary file modified assets/01_etl.png
Binary file modified assets/02_hp.png
Binary file modified assets/03_train.png
Binary file modified assets/04_promotion.png
Binary file removed assets/05_batch_inference.png
Binary file added assets/05_deployment.png
Binary file added assets/06_batch_inference.png
Binary file modified template/.assets/00_pipelines_composition.png
48 changes: 27 additions & 21 deletions template/README.md
@@ -104,19 +104,21 @@ This template uses
to demonstrate how to perform major critical steps for Continuous Training (CT)
and Continuous Delivery (CD).

It consists of two pipelines with the following high-level setup:
It consists of three pipelines with the following high-level setup:
<p align="center">
<img height=300 src=".assets/00_pipelines_composition.png">
<img height=800 src=".assets/00_pipelines_composition.png">
</p>

Both pipelines are inside a shared Model Control Plane model context - training pipeline creates and promotes new Model Control Plane version and inference pipeline is reading from inference Model Control Plane version. This makes those pipelines closely connected, while ensuring that only quality-assured Model Control Plane versions are used to produce predictions delivered to stakeholders.
All pipelines leverage the Model Control Plane to bring the parts together: the training pipeline creates and promotes a new Model Control Plane version with a trained model object in it; the deployment pipeline uses the inference Model Control Plane version (the one promoted during training) to create a deployment service; and the inference pipeline uses the deployment service from the inference Model Control Plane version and stores a new set of predictions back as a versioned data artifact for future use. This keeps the pipelines closely connected while ensuring that only quality-assured Model Control Plane versions are used to produce predictions delivered to stakeholders.
* [CT] Training
* Load, split, and preprocess the training dataset
* Search for an optimal model object architecture and tune its hyperparameters
* Train the model object and evaluate its performance on the holdout set
* Compare the recently trained model object with the one promoted earlier
* If the recently trained model object performs better, stage it as the new inference model object in the model registry
* On success of the current model object, stage the newly created Model Control Plane version as the one used for inference
* [CD] Deployment
* Deploy a new prediction service based on the model object connected to the inference Model Control Plane version.
* [CD] Batch Inference
* Load the inference dataset and preprocess it, reusing the objects fitted during training
* Perform data drift analysis, reusing the training dataset of the inference Model Control Plane version as a reference
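The promotion gate in the [CT] pipeline can be reduced to a simple comparison. A minimal plain-Python sketch (no ZenML dependency; the function name mirrors the template's `promote_with_metric_compare` step, but the real step also talks to the Model Control Plane and model registry):

```python
# Simplified stand-in for the metric-compare promotion gate: the newly
# trained model object is staged for inference only if it beats the
# currently promoted one.
def promote_with_metric_compare(current_metric, previous_metric, higher_is_better=True):
    """Return "promote" if the new version should become the inference version."""
    if previous_metric is None:
        return "promote"  # nothing promoted yet, so promote unconditionally
    if higher_is_better:
        return "promote" if current_metric > previous_metric else "keep"
    return "promote" if current_metric < previous_metric else "keep"
```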
@@ -142,23 +144,27 @@ The project loosely follows [the recommended ZenML project structure](https://do

```
.
├── pipelines # `zenml.pipeline` implementations
│ ├── batch_inference.py # [CD] Batch Inference pipeline
│ └── training.py # [CT] Training Pipeline
├── steps # logically grouped `zenml.steps` implementations
│ ├── alerts # alert developer on pipeline status
│ ├── data_quality # quality gates built on top of drift report
│ ├── etl # ETL logic for dataset
│ ├── hp_tuning # tune hyperparameters and model architectures
│ ├── inference # inference on top of the model from the registry
│ ├── promotion # find if a newly trained model will be new inference
│ └── training # train and evaluate model
├── utils # helper functions
├── configs # pipelines configuration files
│ ├── deployer_config.yaml # the configuration of the deployment pipeline
│ ├── inference_config.yaml # the configuration of the batch inference pipeline
│ └── train_config.yaml # the configuration of the training pipeline
├── pipelines # `zenml.pipeline` implementations
│ ├── batch_inference.py # [CD] Batch Inference pipeline
│ ├── deployment.py # [CD] Deployment pipeline
│ └── training.py # [CT] Training Pipeline
├── steps # logically grouped `zenml.steps` implementations
│ ├── alerts # alert developer on pipeline status
│ ├── deployment # deploy trained model objects
│ ├── data_quality # quality gates built on top of drift report
│ ├── etl # ETL logic for dataset
│ ├── hp_tuning # tune hyperparameters and model architectures
│ ├── inference # inference on top of the model from the registry
│ ├── promotion # find if a newly trained model will be new inference
│ └── training # train and evaluate model
├── utils # helper functions
├── .dockerignore
├── inference_config.yaml # the configuration of the batch inference pipeline
├── Makefile # helper scripts for quick start with integrations
├── README.md # this file
├── requirements.txt # extra Python dependencies
├── run.py # CLI tool to run pipelines on ZenML Stack
└── train_config.yaml # the configuration of the training pipeline
├── Makefile # helper scripts for quick start with integrations
├── README.md # this file
├── requirements.txt # extra Python dependencies
└── run.py # CLI tool to run pipelines on ZenML Stack
```
31 changes: 31 additions & 0 deletions template/configs/deployer_config.yaml
@@ -0,0 +1,31 @@
# {% include 'template/license_header' %}

# environment configuration
settings:
docker:
required_integrations:
- aws
{%- if data_quality_checks %}
- evidently
{%- endif %}
- kubeflow
- kubernetes
- mlflow
- sklearn
- slack

# configuration of steps
steps:
notify_on_success:
parameters:
notify_on_success: False

# configuration of the Model Control Plane
model_config:
name: {{ product_name }}
version: {{ target_environment }}

# pipeline level extra configurations
extra:
notify_on_failure: True
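The `steps:` block above scopes parameters to individual steps instead of keeping everything in a flat `extra:` dict. A rough plain-Python illustration of that lookup (the `step_params` helper is a hypothetical illustration, not part of ZenML's API):

```python
# Sketch of how step-level YAML configs scope parameters: each step reads
# only its own `parameters` block, while pipeline-wide settings stay in `extra`.
config = {
    "steps": {
        "notify_on_success": {"parameters": {"notify_on_success": False}},
    },
    "extra": {"notify_on_failure": True},
}

def step_params(config, step_name):
    """Look up the parameters configured for a single step, if any."""
    return config.get("steps", {}).get(step_name, {}).get("parameters", {})
```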

@@ -1,5 +1,6 @@
# {% include 'template/license_header' %}

# environment configuration
settings:
docker:
required_integrations:
@@ -12,15 +13,19 @@ settings:
- mlflow
- sklearn
- slack
extra:
mlflow_model_name: {{ product_name }}
{%- if target_environment == 'production' %}
target_env: Production
{%- else %}
target_env: Staging
{%- endif %}
notify_on_success: False
notify_on_failure: True

# configuration of steps
steps:
notify_on_success:
parameters:
notify_on_success: False

# configuration of the Model Control Plane
model_config:
name: {{ product_name }}
version: {{ target_environment }}

# pipeline level extra configurations
extra:
notify_on_failure: True

74 changes: 46 additions & 28 deletions template/train_config.yaml → template/configs/train_config.yaml
@@ -1,5 +1,6 @@
# {% include 'template/license_header' %}

# environment configuration
settings:
docker:
required_integrations:
@@ -12,18 +13,53 @@ settings:
- mlflow
- sklearn
- slack
extra:
mlflow_model_name: {{ product_name }}
{%- if target_environment == 'production' %}
target_env: Production

# configuration of steps
steps:
model_trainer:
parameters:
name: {{ product_name }}
{%- if metric_compare_promotion %}
compute_performance_metrics_on_current_data:
parameters:
target_env: {{ target_environment }}
promote_with_metric_compare:
{%- else %}
target_env: Staging
promote_latest_version:
{%- endif %}
notify_on_success: False
parameters:
mlflow_model_name: {{ product_name }}
target_env: {{ target_environment }}
notify_on_success:
parameters:
notify_on_success: False

# configuration of the Model Control Plane
model_config:
name: {{ product_name }}
license: {{ open_source_license }}
description: {{ product_name }} E2E Batch Use Case
audience: All ZenML users
use_cases: |
The {{project_name}} project demonstrates how the most important steps of
the ML Production Lifecycle can be implemented in a reusable way remaining
agnostic to the underlying infrastructure, and shows how to integrate them together
into pipelines for Training and Batch Inference purposes.
ethics: No impact.
tags:
- e2e
- batch
- sklearn
- from template
- ZenML delivered
create_new_model_version: true

# pipeline level extra configurations
extra:
notify_on_failure: True
{%- if hyperparameters_tuning %}
# This set contains all the models that you want to evaluate
# during hyperparameter tuning stage.
# This set contains all the model configurations that you want
# to evaluate during the hyperparameter tuning stage.
model_search_space:
random_forest:
model_package: sklearn.ensemble
@@ -67,30 +103,12 @@ extra:
start: 1
end: 10
{%- else %}
# This model configuration will be used for the training stage.
# This model configuration will be used for the training stage.
model_configuration:
model_package: sklearn.tree
model_class: DecisionTreeClassifier
params:
criterion: gini
max_depth: 5
min_samples_leaf: 3
{%- endif %}
model_config:
name: {{ product_name }}
license: {{ open_source_license }}
description: {{ product_name }} E2E Batch Use Case
audience: All ZenML users
use_cases: |
The {{project_name}} project demonstrates how the most important steps of
the ML Production Lifecycle can be implemented in a reusable way remaining
agnostic to the underlying infrastructure, and shows how to integrate them together
into pipelines for Training and Batch Inference purposes.
ethics: No impact.
tags:
- e2e
- batch
- sklearn
- from template
- ZenML delivered
create_new_model_version: true
{%- endif %}
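The `model_search_space` block enumerates candidate model configurations for the tuning stage. A simplified sketch of the selection loop such a config drives (the search space mirrors the template's YAML; the scoring function is a dummy stand-in for real cross-validated evaluation):

```python
# Score every candidate configuration from the search space and keep the
# best one. Entries mirror the `model_package`/`model_class` layout of the
# template's YAML config.
search_space = {
    "random_forest": {
        "model_package": "sklearn.ensemble",
        "model_class": "RandomForestClassifier",
    },
    "decision_tree": {
        "model_package": "sklearn.tree",
        "model_class": "DecisionTreeClassifier",
    },
}

def pick_best(search_space, score_fn):
    """Return the (name, config) pair with the highest score."""
    best_name, best_cfg, best_score = None, None, float("-inf")
    for name, cfg in search_space.items():
        score = score_fn(name, cfg)
        if score > best_score:
            best_name, best_cfg, best_score = name, cfg, score
    return best_name, best_cfg
```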
1 change: 1 addition & 0 deletions template/pipelines/__init__.py
@@ -3,3 +3,4 @@

from .batch_inference import {{product_name}}_batch_inference
from .training import {{product_name}}_training
from .deployment import {{product_name}}_deployment
23 changes: 8 additions & 15 deletions template/pipelines/batch_inference.py
@@ -1,23 +1,18 @@
# {% include 'template/license_header' %}


from steps import (
data_loader,
{%- if data_quality_checks %}
drift_quality_gate,
{%- endif %}
inference_data_preprocessor,
inference_get_current_version,
inference_predict,
notify_on_failure,
notify_on_success,
)
from zenml import get_pipeline_context, pipeline
from zenml import pipeline
from zenml.integrations.evidently.metrics import EvidentlyMetricConfig
from zenml.integrations.evidently.steps import evidently_report_step
from zenml.integrations.mlflow.steps.mlflow_deployer import (
mlflow_model_registry_deployer_step,
)
from zenml.logger import get_logger
from zenml.artifacts.external_artifact import ExternalArtifact

@@ -36,7 +31,13 @@ def {{product_name}}_batch_inference():
# Link all the steps together by calling them and passing the output
# of one step as the input of the next step.
########## ETL stage ##########
df_inference, target = data_loader(is_inference=True)
df_inference, target, _ = data_loader(
random_state=ExternalArtifact(
model_artifact_pipeline_name="{{product_name}}_training",
model_artifact_name="random_state",
),
is_inference=True
)
df_inference = inference_data_preprocessor(
dataset_inf=df_inference,
preprocess_pipeline=ExternalArtifact(
@@ -60,15 +61,7 @@ def {{product_name}}_batch_inference():
drift_quality_gate(report)
{%- endif %}
########## Inference stage ##########
deployment_service = mlflow_model_registry_deployer_step(
registry_model_name=get_pipeline_context().extra["mlflow_model_name"],
registry_model_version=ExternalArtifact(
model_artifact_name="promoted_version",
),
replace_existing=True,
)
inference_predict(
deployment_service=deployment_service,
dataset_inf=df_inference,
{%- if data_quality_checks %}
after=["drift_quality_gate"],
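A note on the `random_state` ExternalArtifact added to `data_loader` above: by pulling the seed produced by the training pipeline, batch inference reproduces the exact same data handling. A minimal sketch of the idea, assuming a deterministic shuffle-and-split (plain Python, not the template's actual loader):

```python
# Reusing the training run's seed makes shuffling/splitting reproducible
# across pipelines, so preprocessing stays consistent between training
# and batch inference.
import random

def shuffled_split(data, random_state, train_fraction=0.8):
    """Deterministically shuffle and split `data` using a fixed seed."""
    rng = random.Random(random_state)
    shuffled = list(data)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```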
22 changes: 22 additions & 0 deletions template/pipelines/deployment.py
@@ -0,0 +1,22 @@
# {% include 'template/license_header' %}

from steps import deployment_deploy, notify_on_success, notify_on_failure

from zenml import pipeline


@pipeline(on_failure=notify_on_failure)
def {{product_name}}_deployment():
"""
Model deployment pipeline.
This pipeline deploys a trained model for future inference.
"""
### ADD YOUR OWN CODE HERE - THIS IS JUST AN EXAMPLE ###
# Link all the steps together by calling them and passing the output
# of one step as the input of the next step.
########## Deployment stage ##########
deployment_deploy()

notify_on_success(after=["deployment_deploy"])
### YOUR CODE ENDS HERE ###
