
Local development how-to section #3090

Merged
merged 13 commits into from
Oct 17, 2024
20 changes: 20 additions & 0 deletions docs/book/how-to/develop-locally/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
---
description: Learn how to develop your pipelines locally.
---

# Develop locally

This section contains best practices for developing your pipelines locally.
It's common to do at least some work locally, where you can iterate faster and
where running your pipeline doesn't cost much time or money. People often do
this with a smaller subset of their data, or with synthetic data.

ZenML supports this pattern, and the sections that follow guide you through
working locally and then (at certain moments) pushing and running your
pipelines on more powerful remote hardware.

<!-- For scarf -->
<figure><img alt="ZenML Scarf" referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" /></figure>


165 changes: 165 additions & 0 deletions docs/book/how-to/develop-locally/keep-your-dashboard-server-clean.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,165 @@
---
description: Learn how to keep your pipeline runs clean during development.
---

# Keep your dashboard and server clean

When developing pipelines, it's common to run and debug them multiple times. To
avoid cluttering the server with these development runs, ZenML provides several
options:

## Run locally

One of the easiest ways to avoid cluttering a shared server / dashboard is to
disconnect from the server and simply spin up a local server:

```bash
zenml disconnect
zenml up
```

Note that this approach has some limitations, particularly if you want to use
remote infrastructure, but for runs that don't need it, this can be a quick and
easy way to keep things clean. When you're ready to continue with your shared
runs, simply run `zenml connect ...` to reconnect to the server.

## Pipeline Runs

### Unlisted Runs

Pipeline runs can be created without being explicitly associated with a pipeline by passing the `unlisted` parameter when running a pipeline:

```python
pipeline_instance.run(unlisted=True)
```

Unlisted runs are not displayed on the pipeline's page in the dashboard (though
they *are* displayed in the pipeline run section), keeping the pipeline's
history clean and focused on the runs that matter most.

### Deleting Pipeline Runs

If you want to delete a specific pipeline run, you can use the following CLI command:

```bash
zenml pipeline runs delete <PIPELINE_RUN_NAME_OR_ID>
```

If you want to delete all pipeline runs in the last 24 hours, for example, you
could run a script like this:

```python
#!/usr/bin/env python3

import datetime

from zenml.client import Client


def delete_recent_pipeline_runs():
    # Initialize ZenML client
    zc = Client()

    # Calculate the timestamp for 24 hours ago
    twenty_four_hours_ago = datetime.datetime.utcnow() - datetime.timedelta(hours=24)

    # Format the timestamp as required by ZenML
    time_filter = twenty_four_hours_ago.strftime("%Y-%m-%d %H:%M:%S")

    # Get the list of pipeline runs created in the last 24 hours
    recent_runs = zc.list_pipeline_runs(created=f"gt:{time_filter}")

    # Delete each run
    for run in recent_runs:
        print(f"Deleting run: {run.id} (Created: {run.body.created})")
        zc.delete_pipeline_run(run.id)

    print(f"Deleted {len(recent_runs)} pipeline runs.")


if __name__ == "__main__":
    delete_recent_pipeline_runs()
```

For different time ranges, you can adjust the filter as appropriate.
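For instance, the cutoff calculation can be factored into a small helper so the same script works for any window. The helper below is an illustrative sketch, not part of the ZenML API; only the `gt:`-prefixed filter string format comes from the script above:

```python
import datetime


def time_filter_for_last(hours: int) -> str:
    """Build a `gt:`-style filter string for runs created in the last N hours."""
    cutoff = datetime.datetime.utcnow() - datetime.timedelta(hours=hours)
    return f"gt:{cutoff.strftime('%Y-%m-%d %H:%M:%S')}"
```

You could then call, for example, `zc.list_pipeline_runs(created=time_filter_for_last(48))` to target the last two days instead.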

## Pipelines

### Deleting Pipelines

Pipelines that are no longer needed can be deleted using the command:

```bash
zenml pipeline delete <PIPELINE_ID_OR_NAME>
```

This removes the pipeline along with all runs associated with it, letting you
start fresh. It's a slightly more drastic approach, but it can sometimes be
useful for keeping your development environment clean.

## Unique Pipeline Names

Each pipeline run can be given a unique name, which helps differentiate between multiple iterations of the same pipeline during development.

By default ZenML generates run names automatically based on the current date and
time, but you can pass a custom `run_name` using `with_options`:

```python
training_pipeline = training_pipeline.with_options(
    run_name="custom_pipeline_run_name"
)
training_pipeline()
```

Note that run names must be unique. For more information on this feature,
see the [documentation on naming pipeline runs](../build-pipelines/name-your-pipeline-and-runs.md).
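Because run names must be unique, a common pattern is to append a timestamp to a fixed prefix. This is plain Python, not a ZenML feature; the helper name is illustrative:

```python
from datetime import datetime


def unique_run_name(prefix: str = "dev") -> str:
    """Append a second-resolution timestamp so each run name is unique."""
    return f"{prefix}_run_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
```

You could then run, for example, `training_pipeline.with_options(run_name=unique_run_name("training"))()`.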

## Models

A pipeline run is only attached to a model if you explicitly register the model
or pass it in when defining your pipeline, so running a pipeline without a
model attached is straightforward: simply skip the steps described in our
[documentation on registering
models](../use-the-model-control-plane/register-a-model.md).

To delete a model or a specific model version, you can use the CLI or Python
SDK. For example, to delete all versions of a model:

```bash
zenml model delete <MODEL_NAME>
```

See the full documentation on [how to delete models](../use-the-model-control-plane/delete-a-model.md).
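During development you may not want to delete everything; a common pattern is to keep only the newest few versions. The helper below is a plain-Python sketch of that selection logic only (it is not a ZenML API; you would delete the returned versions yourself via the CLI or Python client):

```python
def versions_to_prune(version_numbers: list[int], keep_latest: int = 2) -> list[int]:
    """Given model version numbers, return those to delete, keeping the newest N."""
    ordered = sorted(version_numbers, reverse=True)
    return ordered[keep_latest:]
```

For example, `versions_to_prune([1, 2, 3, 4, 5], keep_latest=2)` keeps versions 5 and 4 and marks the rest for deletion.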

## Artifacts

### Pruning artifacts

If you want to delete artifacts that are no longer referenced by any pipeline
runs, you can use the following CLI command:

```bash
zenml artifact prune
```

By default, this command deletes the artifacts physically from the underlying artifact store AND removes their entries from the database. You can control this behavior with the `--only-artifact` and `--only-metadata` flags.

For more information, see the [documentation for this artifact pruning feature](../handle-data-artifacts/delete-an-artifact.md).

## Cleaning your environment

As a more drastic measure, the `zenml clean` command can be used to start from
scratch on your local machine. This will:

- delete all pipelines, pipeline runs and associated metadata
- delete all artifacts

There is also a `--local` flag that you can set if you want to delete local
files relating to the active stack. Note that `zenml clean` does not delete
artifacts and pipelines on the server; it only deletes the local data and metadata.

By using these options, you can maintain a clean and organized pipeline
dashboard, focusing on the runs that matter most for your project.
<!-- For scarf -->
<figure><img alt="ZenML Scarf" referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" /></figure>


138 changes: 138 additions & 0 deletions docs/book/how-to/develop-locally/local-prod-pipeline-variants.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,138 @@
---
description: Create different variants of your pipeline for local development and production.
---

# Create pipeline variants for local development and production

When developing ZenML pipelines, it's often beneficial to have different variants of your pipeline for local development and production environments. This approach allows you to iterate quickly during development while maintaining a full-scale setup for production. While configuration files are one way to achieve this, you can also implement this directly in your code.

There are several ways to create different variants of your pipeline:

1. Using configuration files
2. Implementing variants in code
3. Using environment variables

Let's explore each of these methods:

## Using configuration files

ZenML allows you to specify pipeline and step configurations using YAML files. Here's an example:

```yaml
enable_cache: False
parameters:
  dataset_name: "small_dataset"
steps:
  load_data:
    enable_cache: False
```

This config file sets up a development variant by using a smaller dataset and disabling caching.

To apply this configuration to your pipeline, use the `with_options(config_path=<PATH_TO_CONFIG>)` pattern:

```python
from zenml import step, pipeline

@step
def load_data(dataset_name: str) -> dict:
    ...

@pipeline
def ml_pipeline(dataset_name: str):
    load_data(dataset_name)

if __name__ == "__main__":
    ml_pipeline.with_options(config_path="path/to/config.yaml")()
```

You can create separate configuration files for development and production:

- `config_dev.yaml`: Configuration for local development
- `config_prod.yaml`: Configuration for production

## Implementing variants in code

You can also create pipeline variants directly in your code:

```python
import os
from zenml import step, pipeline

@step
def load_data(dataset_name: str) -> dict:
    # Load data based on the dataset name
    ...

@pipeline
def ml_pipeline(is_dev: bool = False):
    dataset = "small_dataset" if is_dev else "full_dataset"
    load_data(dataset)

if __name__ == "__main__":
    is_dev = os.environ.get("ZENML_ENVIRONMENT") == "dev"
    ml_pipeline(is_dev=is_dev)
```

This approach allows you to switch between development and production variants using a simple boolean flag.

## Using environment variables

You can use environment variables to determine which variant to run:

```python
import os

if os.environ.get("ZENML_ENVIRONMENT") == "dev":
    config_path = "config_dev.yaml"
else:
    config_path = "config_prod.yaml"

ml_pipeline.with_options(config_path=config_path)()
```

Run your pipeline with: `ZENML_ENVIRONMENT=dev python run.py` or `ZENML_ENVIRONMENT=prod python run.py`.
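The branching above can also be wrapped in a small helper with a safe default, so a missing variable falls back to production settings rather than failing. The helper name is illustrative; the `ZENML_ENVIRONMENT` variable and config file names follow the example above:

```python
import os


def resolve_config_path(default_env: str = "prod") -> str:
    """Map the ZENML_ENVIRONMENT variable to a config file, defaulting to production."""
    env = os.environ.get("ZENML_ENVIRONMENT", default_env)
    return "config_dev.yaml" if env == "dev" else "config_prod.yaml"
```

Defaulting to the production config is the conservative choice here: an unset or misspelled variable never silently runs with development shortcuts in production.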

## Development variant considerations

When creating a development variant of your pipeline, consider optimizing these
aspects for faster iteration and debugging:

- Use smaller datasets for quicker runs
- Specify a local stack for execution
- Reduce the number of training epochs
- Decrease batch size
- Use a smaller base model

For example, in a configuration file:

```yaml
parameters:
  dataset_path: "data/small_dataset.csv"
  epochs: 1
  batch_size: 16
stack: local_stack
```

Or in code:

```python
@pipeline
def ml_pipeline(is_dev: bool = False):
    dataset = "data/small_dataset.csv" if is_dev else "data/full_dataset.csv"
    epochs = 1 if is_dev else 100
    batch_size = 16 if is_dev else 64

    load_data(dataset)
    train_model(epochs=epochs, batch_size=batch_size)
```
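If the dev overrides are used by several steps, collecting them in one function keeps the two variants consistent. This is a plain-Python sketch; the parameter names and values follow the example above and are not a ZenML API:

```python
def pipeline_params(is_dev: bool) -> dict:
    """Return one consistent set of parameters for the dev or prod variant."""
    if is_dev:
        return {"dataset": "data/small_dataset.csv", "epochs": 1, "batch_size": 16}
    return {"dataset": "data/full_dataset.csv", "epochs": 100, "batch_size": 64}
```

Inside the pipeline you would then unpack one dictionary instead of repeating `if is_dev` conditionals per parameter.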

By creating different variants of your pipeline, you can quickly test and debug
your code locally with a lightweight setup, while maintaining a full-scale
configuration for production execution. This approach streamlines your
development workflow and allows for efficient iteration without compromising
your production pipeline.
<!-- For scarf -->
<figure><img alt="ZenML Scarf" referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" /></figure>


5 changes: 4 additions & 1 deletion docs/book/toc.md
Original file line number Diff line number Diff line change
Expand Up @@ -77,7 +77,7 @@
* [Schedule a pipeline](how-to/build-pipelines/schedule-a-pipeline.md)
* [Deleting a pipeline](how-to/build-pipelines/delete-a-pipeline.md)
* [Compose pipelines](how-to/build-pipelines/compose-pipelines.md)
* [Dynamically generate steps and artifacts](how-to/build-pipelines/dynamically-generate-steps-artifacts.md)
* [Dynamically assign artifact names](how-to/build-pipelines/dynamically-assign-artifact-names.md)
* [Automatically retry steps](how-to/build-pipelines/retry-steps.md)
* [Run pipelines asynchronously](how-to/build-pipelines/run-pipelines-asynchronously.md)
* [Control execution order of steps](how-to/build-pipelines/control-execution-order-of-steps.md)
Expand Down Expand Up @@ -111,6 +111,9 @@
* [📔 Run remote pipelines from notebooks](how-to/run-remote-steps-and-pipelines-from-notebooks/README.md)
* [Limitations of defining steps in notebook cells](how-to/run-remote-steps-and-pipelines-from-notebooks/limitations-of-defining-steps-in-notebook-cells.md)
* [Run a single step from a notebook](how-to/run-remote-steps-and-pipelines-from-notebooks/run-a-single-step-from-a-notebook.md)
* [📍 Develop locally](how-to/develop-locally/README.md)
* [Use config files to develop locally](how-to/develop-locally/local-prod-pipeline-variants.md)
* [Keep your pipelines and dashboard clean](how-to/develop-locally/keep-your-dashboard-server-clean.md)
* [⚒️ Manage stacks & components](how-to/stack-deployment/README.md)
* [Deploy a cloud stack with ZenML](how-to/stack-deployment/deploy-a-cloud-stack.md)
* [Deploy a cloud stack with Terraform](how-to/stack-deployment/deploy-a-cloud-stack-with-terraform.md)
Expand Down