Commit

align with core changes (#34)
* align with core changes

* add proper branch
avishniakov authored Jan 31, 2024
1 parent c65da87 commit a2e9bc5
Showing 3 changed files with 11 additions and 10 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -56,5 +56,5 @@ jobs:
with:
stack-name: ${{ matrix.stack-name }}
python-version: ${{ matrix.python-version }}
-ref-zenml: ${{ inputs.ref-zenml || 'develop' }}
+ref-zenml: ${{ inputs.ref-zenml || 'feature/OSSK-357-deprecate-external-artifact-with-non-value-inputs' }}
ref-template: ${{ inputs.ref-template || github.ref }}
10 changes: 5 additions & 5 deletions README.md
@@ -274,18 +274,18 @@ model:

The process of loading data is similar to training, even the same step function is used, but with the `is_inference` flag.

-But inference flow has an important difference - there is no need to fit preprocessing sklearn `Pipeline`, rather we need to reuse one fitted during training on the train set, to ensure that the model object gets the expected input. To do so we will use [ExternalArtifact](https://docs.zenml.io/user-guide/advanced-guide/pipelining-features/configure-steps-pipelines#pass-any-kind-of-data-to-your-steps) with lookup by `model_artifact_name` only to get the preprocessing pipeline fitted during the quality-assured training run. This is possible since we configured the batch inference pipeline to run inside a Model Control Plane version context.
+But inference flow has an important difference - there is no need to fit preprocessing sklearn `Pipeline`, rather we need to reuse one fitted during training on the train set, to ensure that the model object gets the expected input. To do so we will use the [Model interface](https://docs.zenml.io/user-guide/starter-guide/track-ml-models#configuring-a-model-in-a-pipeline) with lookup by artifact name inside a model context to get the preprocessing pipeline fitted during the quality-assured training run. This is possible since we configured the batch inference pipeline to run inside a Model Control Plane version context.
<details>
<summary>Code snippet 💻</summary>

```python
+model = get_pipeline_context().model
########## ETL stage ##########
df_inference, target = data_loader(is_inference=True)
df_inference = inference_data_preprocessor(
dataset_inf=df_inference,
-preprocess_pipeline=ExternalArtifact(
-model_artifact_name="preprocess_pipeline",
-), # this fetches artifact using Model Control Plane
+# this fetches artifact using Model Control Plane
+preprocess_pipeline=model.get_artifact("preprocess_pipeline"),
target=target,
)
```
@@ -298,7 +298,7 @@ df_inference = inference_data_preprocessor(

In the drift reporting stage, we will use [standard step](https://docs.zenml.io/stacks-and-components/component-guide/data-validators/evidently#the-evidently-data-validator) `evidently_report_step` to build Evidently report to assess certain data quality metrics. `evidently_report_step` has a number of options, but for this example, we will build only `DataQualityPreset` metrics preset to get a number of NA values in reference and current datasets.

-We pass `dataset_trn` from the training pipeline as a `reference_dataset` here. To do so we will use [ExternalArtifact](https://docs.zenml.io/user-guide/advanced-guide/pipelining-features/configure-steps-pipelines#pass-any-kind-of-data-to-your-steps) with lookup by `model_artifact_name` only to get the training dataset used during quality-assured training run. This is possible since we configured the batch inference pipeline to run inside a Model Control Plane version context.
+We pass `dataset_trn` from the training pipeline as a `reference_dataset` here. To do so we will use the [Model interface](https://docs.zenml.io/user-guide/starter-guide/track-ml-models#configuring-a-model-in-a-pipeline) with lookup by artifact name inside a model context to get the training dataset used during quality-assured training run. This is possible since we configured the batch inference pipeline to run inside a Model Control Plane version context.

After the report is built we execute another quality gate using the `drift_quality_gate` step, which assesses if a significant drift in the NA count is observed. If so, execution is stopped with an exception.

9 changes: 5 additions & 4 deletions template/pipelines/batch_inference.py
@@ -10,7 +10,7 @@
notify_on_failure,
notify_on_success,
)
-from zenml import ExternalArtifact, pipeline
+from zenml import pipeline, get_pipeline_context
from zenml.integrations.evidently.metrics import EvidentlyMetricConfig
from zenml.integrations.evidently.steps import evidently_report_step
from zenml.logger import get_logger
@@ -29,21 +29,22 @@ def {{product_name}}_batch_inference():
### ADD YOUR OWN CODE HERE - THIS IS JUST AN EXAMPLE ###
# Link all the steps together by calling them and passing the output
# of one step as the input of the next step.
+model = get_pipeline_context().model
########## ETL stage ##########
df_inference, target, _ = data_loader(
-random_state=ExternalArtifact(name="random_state"),
+random_state=model.get_artifact("random_state"),
is_inference=True
)
df_inference = inference_data_preprocessor(
dataset_inf=df_inference,
-preprocess_pipeline=ExternalArtifact(name="preprocess_pipeline"),
+preprocess_pipeline=model.get_artifact("preprocess_pipeline"),
target=target,
)

{%- if data_quality_checks %}
########## DataQuality stage ##########
report, _ = evidently_report_step(
-reference_dataset=ExternalArtifact(name="dataset_trn"),
+reference_dataset=model.get_artifact("dataset_trn"),
comparison_dataset=df_inference,
ignored_cols=["target"],
metrics=[

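Note on the pattern this commit adopts: every `ExternalArtifact(...)` lookup is replaced by a name-based lookup on the model obtained from the pipeline context. The toy sketch below only illustrates that idea. `ToyModelVersion` and `toy_batch_inference` are hypothetical stand-ins invented for this note and are not ZenML classes; only the call shape `model.get_artifact("<name>")` and the three artifact names come from the diff, and the artifact values are placeholders.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ToyModelVersion:
    """Hypothetical stand-in for a model version context (NOT a ZenML class)."""

    name: str
    version: str
    artifacts: Dict[str, Any] = field(default_factory=dict)

    def get_artifact(self, name: str) -> Any:
        # Same call shape as in the diff: lookup by artifact name only,
        # scoped to this model version.
        if name not in self.artifacts:
            raise KeyError(
                f"artifact {name!r} is not linked to {self.name}:{self.version}"
            )
        return self.artifacts[name]


def toy_batch_inference(model: ToyModelVersion) -> Dict[str, Any]:
    """Hypothetical pipeline body: resolve each training-time input by name."""
    return {
        "random_state": model.get_artifact("random_state"),
        "preprocess_pipeline": model.get_artifact("preprocess_pipeline"),
        "reference_dataset": model.get_artifact("dataset_trn"),
    }


# Artifacts "produced" by a training run; the values are placeholders.
trained = ToyModelVersion(
    name="toy_model",
    version="1",
    artifacts={
        "random_state": 42,
        "preprocess_pipeline": "fitted-pipeline-placeholder",
        "dataset_trn": [[1.0, 2.0], [3.0, 4.0]],
    },
)

inputs = toy_batch_inference(trained)
```

In this sketch the inference code never names a specific training run: it asks the model version for an artifact by name, which appears to be the intent of the `model.get_artifact(...)` calls in `template/pipelines/batch_inference.py`.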