
Local development how-to section #3090

Merged
merged 13 commits into from
Oct 17, 2024
20 changes: 20 additions & 0 deletions docs/book/how-to/develop-locally/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
---
description: Learn how to develop your pipelines locally.
---

# Develop locally

This section contains best practices for developing your pipelines locally.
It's common to do at least some work locally, where you can iterate faster and
where running your pipeline doesn't cost much time or money. People often do
this with a smaller subset of their data, or with synthetic data.

ZenML supports this pattern, and the sections that follow guide you through
working locally and then (at certain moments) pushing and running your
pipelines on more powerful remote hardware.

<!-- For scarf -->
<figure><img alt="ZenML Scarf" referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" /></figure>


165 changes: 165 additions & 0 deletions docs/book/how-to/develop-locally/keep-your-dashboard-server-clean.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,165 @@
---
description: Learn how to keep your pipeline runs clean during development.
---

# Keep your dashboard and server clean

When developing pipelines, it's common to run and debug them multiple times. To
avoid cluttering the server with these development runs, ZenML provides several
options:

## Run locally

One of the easiest ways to avoid cluttering a shared server / dashboard is to
disconnect from the server and simply spin up a local server:

```bash
zenml disconnect
zenml up
```

Note that this approach has some limitations, particularly if you want to use
remote infrastructure, but for runs that don't need it, this can be a quick and
easy way to keep things clean. When you're ready to continue with your shared
runs, simply run `zenml connect ...` to reconnect to the server.

## Pipeline Runs

### Unlisted Runs

Pipeline runs can be created without being explicitly associated with a pipeline by passing the `unlisted` parameter when running a pipeline:

```python
pipeline_instance.run(unlisted=True)
```

Unlisted runs are not displayed on the pipeline's page in the dashboard (though
they *are* displayed in the pipeline run section), keeping the pipeline's
history clean and focused on the runs that matter most.

### Deleting Pipeline Runs

If you want to delete a specific pipeline run, you can use the following CLI command:

```bash
zenml pipeline runs delete <PIPELINE_RUN_NAME_OR_ID>
```

If you want to delete all pipeline runs in the last 24 hours, for example, you
could run a script like this:

```python
#!/usr/bin/env python3

import datetime

from zenml.client import Client


def delete_recent_pipeline_runs():
    # Initialize ZenML client
    zc = Client()

    # Calculate the timestamp for 24 hours ago
    twenty_four_hours_ago = datetime.datetime.utcnow() - datetime.timedelta(hours=24)

    # Format the timestamp as required by ZenML
    time_filter = twenty_four_hours_ago.strftime("%Y-%m-%d %H:%M:%S")

    # Get the list of pipeline runs created in the last 24 hours
    recent_runs = zc.list_pipeline_runs(created=f"gt:{time_filter}")

    # Delete each run
    for run in recent_runs:
        print(f"Deleting run: {run.id} (Created: {run.body.created})")
        zc.delete_pipeline_run(run.id)

    print(f"Deleted {len(recent_runs)} pipeline runs.")


if __name__ == "__main__":
    delete_recent_pipeline_runs()
```

For different time ranges, you can adjust the filter as appropriate.
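For instance, the cutoff calculation can be factored into a small helper so the same script works for any window. The helper below is an illustrative sketch, not part of the ZenML API; only the `gt:`-prefixed filter string format comes from the script above:

```python
import datetime


def time_filter_for_last(hours: int) -> str:
    """Build a `gt:`-style filter string for runs created in the last N hours."""
    cutoff = datetime.datetime.utcnow() - datetime.timedelta(hours=hours)
    return f"gt:{cutoff.strftime('%Y-%m-%d %H:%M:%S')}"
```

You could then call, for example, `zc.list_pipeline_runs(created=time_filter_for_last(48))` to target the last two days instead.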

## Pipelines

### Deleting Pipelines

Pipelines that are no longer needed can be deleted using the command:

```bash
zenml pipeline delete <PIPELINE_ID_OR_NAME>
```

This removes the pipeline along with all runs associated with it, letting you
start fresh. It's a slightly more drastic approach, but it can sometimes be
useful for keeping your development environment clean.

## Unique Pipeline Names

Each pipeline run can be given a unique name, which helps differentiate between multiple iterations of the same pipeline during development.

By default ZenML generates run names automatically based on the current date and
time, but you can pass a custom `run_name` using `with_options`:

```python
training_pipeline = training_pipeline.with_options(
    run_name="custom_pipeline_run_name"
)
training_pipeline()
```

Note that run names must be unique. For more information on this feature,
see the [documentation on naming pipeline runs](../build-pipelines/name-your-pipeline-and-runs.md).
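Because run names must be unique, a common pattern is to append a timestamp to a fixed prefix. This is plain Python, not a ZenML feature; the helper name is illustrative:

```python
from datetime import datetime


def unique_run_name(prefix: str = "dev") -> str:
    """Append a second-resolution timestamp so each run name is unique."""
    return f"{prefix}_run_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
```

You could then run, for example, `training_pipeline.with_options(run_name=unique_run_name("training"))()`.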

## Models

A pipeline run is only attached to a model if you explicitly register the model
or pass it in when defining your pipeline, so running a pipeline without a
model attached is straightforward: simply skip the steps described in our
[documentation on registering
models](../use-the-model-control-plane/register-a-model.md).

To delete a model or a specific model version, you can use the CLI or Python
SDK. For example, to delete all versions of a model:

```bash
zenml model delete <MODEL_NAME>
```

See the full documentation on [how to delete models](../use-the-model-control-plane/delete-a-model.md).
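During development you may not want to delete everything; a common pattern is to keep only the newest few versions. The helper below is a plain-Python sketch of that selection logic only (it is not a ZenML API; you would delete the returned versions yourself via the CLI or Python client):

```python
def versions_to_prune(version_numbers: list[int], keep_latest: int = 2) -> list[int]:
    """Given model version numbers, return those to delete, keeping the newest N."""
    ordered = sorted(version_numbers, reverse=True)
    return ordered[keep_latest:]
```

For example, `versions_to_prune([1, 2, 3, 4, 5], keep_latest=2)` keeps versions 5 and 4 and marks the rest for deletion.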

## Artifacts

### Pruning artifacts

If you want to delete artifacts that are no longer referenced by any pipeline
runs, you can use the following CLI command:

```bash
zenml artifact prune
```

By default, this command deletes the artifacts physically from the underlying artifact store AND removes their entries from the database. You can control this behavior with the `--only-artifact` and `--only-metadata` flags.

For more information, see the [documentation for this artifact pruning feature](../handle-data-artifacts/delete-an-artifact.md).

## Cleaning your environment

As a more drastic measure, the `zenml clean` command can be used to start from
scratch on your local machine. This will:

- delete all pipelines, pipeline runs and associated metadata
- delete all artifacts

There is also a `--local` flag that you can set if you want to delete local
files relating to the active stack. Note that `zenml clean` does not delete
artifacts and pipelines on the server; it only deletes the local data and metadata.

By using these options, you can maintain a clean and organized pipeline
dashboard, focusing on the runs that matter most for your project.
<!-- For scarf -->
<figure><img alt="ZenML Scarf" referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" /></figure>


138 changes: 138 additions & 0 deletions docs/book/how-to/develop-locally/local-prod-pipeline-variants.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,138 @@
---
description: Create different variants of your pipeline for local development and production.
---

# Create pipeline variants for local development and production

When developing ZenML pipelines, it's often beneficial to have different variants of your pipeline for local development and production environments. This approach allows you to iterate quickly during development while maintaining a full-scale setup for production. While configuration files are one way to achieve this, you can also implement this directly in your code.

There are several ways to create different variants of your pipeline:

1. Using configuration files
2. Implementing variants in code
3. Using environment variables

Let's explore each of these methods:

## Using configuration files

ZenML allows you to specify pipeline and step configurations using YAML files. Here's an example:

```yaml
enable_cache: False
parameters:
  dataset_name: "small_dataset"
steps:
  load_data:
    enable_cache: False
```

This config file sets up a development variant by using a smaller dataset and disabling caching.

To apply this configuration to your pipeline, use the `with_options(config_path=<PATH_TO_CONFIG>)` pattern:

```python
from zenml import step, pipeline

@step
def load_data(dataset_name: str) -> dict:
    ...

@pipeline
def ml_pipeline(dataset_name: str):
    load_data(dataset_name)

if __name__ == "__main__":
    ml_pipeline.with_options(config_path="path/to/config.yaml")()
```

You can create separate configuration files for development and production:

- `config_dev.yaml`: Configuration for local development
- `config_prod.yaml`: Configuration for production

## Implementing variants in code

You can also create pipeline variants directly in your code:

```python
import os
from zenml import step, pipeline

@step
def load_data(dataset_name: str) -> dict:
    # Load data based on the dataset name
    ...

@pipeline
def ml_pipeline(is_dev: bool = False):
    dataset = "small_dataset" if is_dev else "full_dataset"
    load_data(dataset)

if __name__ == "__main__":
    is_dev = os.environ.get("ZENML_ENVIRONMENT") == "dev"
    ml_pipeline(is_dev=is_dev)
```

This approach allows you to switch between development and production variants using a simple boolean flag.

## Using environment variables

You can use environment variables to determine which variant to run:

```python
import os

if os.environ.get("ZENML_ENVIRONMENT") == "dev":
    config_path = "config_dev.yaml"
else:
    config_path = "config_prod.yaml"

ml_pipeline.with_options(config_path=config_path)()
```

Run your pipeline with: `ZENML_ENVIRONMENT=dev python run.py` or `ZENML_ENVIRONMENT=prod python run.py`.
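The branching above can also be wrapped in a small helper with a safe default, so a missing variable falls back to production settings rather than failing. The helper name is illustrative; the `ZENML_ENVIRONMENT` variable and config file names follow the example above:

```python
import os


def resolve_config_path(default_env: str = "prod") -> str:
    """Map the ZENML_ENVIRONMENT variable to a config file, defaulting to production."""
    env = os.environ.get("ZENML_ENVIRONMENT", default_env)
    return "config_dev.yaml" if env == "dev" else "config_prod.yaml"
```

Defaulting to the production config is the conservative choice here: an unset or misspelled variable never silently runs with development shortcuts in production.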

## Development variant considerations

When creating a development variant of your pipeline, consider optimizing these
aspects for faster iteration and debugging:

- Use smaller datasets for quicker runs
- Specify a local stack for execution
- Reduce the number of training epochs
- Decrease batch size
- Use a smaller base model

For example, in a configuration file:

```yaml
parameters:
  dataset_path: "data/small_dataset.csv"
  epochs: 1
  batch_size: 16
stack: local_stack
```

Or in code:

```python
@pipeline
def ml_pipeline(is_dev: bool = False):
    dataset = "data/small_dataset.csv" if is_dev else "data/full_dataset.csv"
    epochs = 1 if is_dev else 100
    batch_size = 16 if is_dev else 64

    load_data(dataset)
    train_model(epochs=epochs, batch_size=batch_size)
```
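If the dev overrides are used by several steps, collecting them in one function keeps the two variants consistent. This is a plain-Python sketch; the parameter names and values follow the example above and are not a ZenML API:

```python
def pipeline_params(is_dev: bool) -> dict:
    """Return one consistent set of parameters for the dev or prod variant."""
    if is_dev:
        return {"dataset": "data/small_dataset.csv", "epochs": 1, "batch_size": 16}
    return {"dataset": "data/full_dataset.csv", "epochs": 100, "batch_size": 64}
```

Inside the pipeline you would then unpack one dictionary instead of repeating `if is_dev` conditionals per parameter.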

By creating different variants of your pipeline, you can quickly test and debug
your code locally with a lightweight setup, while maintaining a full-scale
configuration for production execution. This approach streamlines your
development workflow and allows for efficient iteration without compromising
your production pipeline.
<!-- For scarf -->
<figure><img alt="ZenML Scarf" referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" /></figure>


5 changes: 4 additions & 1 deletion docs/book/toc.md
Original file line number Diff line number Diff line change
Expand Up @@ -77,7 +77,7 @@
* [Schedule a pipeline](how-to/build-pipelines/schedule-a-pipeline.md)
* [Deleting a pipeline](how-to/build-pipelines/delete-a-pipeline.md)
* [Compose pipelines](how-to/build-pipelines/compose-pipelines.md)
* [Dynamically generate steps and artifacts](how-to/build-pipelines/dynamically-generate-steps-artifacts.md)
* [Dynamically assign artifact names](how-to/build-pipelines/dynamically-assign-artifact-names.md)
* [Automatically retry steps](how-to/build-pipelines/retry-steps.md)
* [Run pipelines asynchronously](how-to/build-pipelines/run-pipelines-asynchronously.md)
* [Control execution order of steps](how-to/build-pipelines/control-execution-order-of-steps.md)
Expand Down Expand Up @@ -111,6 +111,9 @@
* [📔 Run remote pipelines from notebooks](how-to/run-remote-steps-and-pipelines-from-notebooks/README.md)
* [Limitations of defining steps in notebook cells](how-to/run-remote-steps-and-pipelines-from-notebooks/limitations-of-defining-steps-in-notebook-cells.md)
* [Run a single step from a notebook](how-to/run-remote-steps-and-pipelines-from-notebooks/run-a-single-step-from-a-notebook.md)
* [📍 Develop locally](how-to/develop-locally/README.md)
* [Use config files to develop locally](how-to/develop-locally/local-prod-pipeline-variants.md)
* [Keep your pipelines and dashboard clean](how-to/develop-locally/keep-your-dashboard-server-clean.md)
* [⚒️ Manage stacks & components](how-to/stack-deployment/README.md)
* [Deploy a cloud stack with ZenML](how-to/stack-deployment/deploy-a-cloud-stack.md)
* [Deploy a cloud stack with Terraform](how-to/stack-deployment/deploy-a-cloud-stack-with-terraform.md)
Expand Down