Local development how-to section #3090

Merged: 13 commits, Oct 17, 2024
File renamed without changes.
16 changes: 16 additions & 0 deletions docs/book/how-to/develop-locally/README.md
@@ -0,0 +1,16 @@
---
description: Learn how to develop your pipelines locally.
---

# Develop locally

This section contains information about best practices for developing your
pipelines locally. It's common to do at least some work locally, where you can
iterate faster and where it doesn't take much time or money to run your
pipeline. This is often done with a smaller subset of your data, or with
synthetic data.

ZenML supports this pattern, and the sections that follow guide you through
working locally and then, at certain moments, pushing and running your
pipelines on more powerful remote hardware.
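As a rough illustration of the "smaller subset of your data" idea, the hypothetical helper below draws a reproducible random sample of records for local runs. The name `local_subset` and the default seed are illustrative assumptions, not part of ZenML; you might call something like it inside a data-loading step when running locally.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def local_subset(records: Sequence[T], max_rows: int, seed: int = 42) -> List[T]:
    """Return at most `max_rows` records, sampled reproducibly (hypothetical helper).

    The same seed always yields the same subset, so repeated local runs
    see identical data.
    """
    if max_rows < 0:
        raise ValueError("max_rows must be non-negative")
    if len(records) <= max_rows:
        # Already small enough: return a copy of everything.
        return list(records)
    # A dedicated Random instance avoids touching the global random state.
    rng = random.Random(seed)
    return rng.sample(list(records), max_rows)


full_data = list(range(10_000))
sample = local_subset(full_data, max_rows=100)
print(len(sample))  # 100
```

Sampling with a fixed seed (rather than taking the first N rows) keeps the subset representative while still making local runs deterministic.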

@@ -0,0 +1,35 @@
---
description: Learn how to keep your pipeline runs clean during development.
---

# Keep your pipelines and dashboard clean

When developing pipelines, it's common to run and debug them multiple times. To avoid cluttering the server with these development runs, ZenML provides several options:

## Unlisted Runs

Pipeline runs can be created without being explicitly associated with a pipeline by passing the `unlisted` parameter when running a pipeline:

```python
pipeline_instance.run(unlisted=True)
```

Unlisted runs are not displayed on the pipeline's page in the dashboard, keeping the pipeline's history clean and focused on important runs.
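If you only want development runs to be unlisted, one option is to derive the flag from the environment. The sketch below is a hypothetical helper (`run_options`) that reads an assumed `ENVIRONMENT` variable; passing its result as `my_pipeline.with_options(**run_options())()` assumes your ZenML version accepts `unlisted` in `with_options`, so check the pipeline API reference for your release.

```python
import os
from typing import Any, Dict, Optional


def run_options(environment: Optional[str] = None) -> Dict[str, Any]:
    """Build keyword options for a pipeline run (hypothetical helper).

    Local/development runs are marked as unlisted so they stay off the
    pipeline's page in the dashboard; all other runs are listed as usual.
    """
    # Fall back to the (assumed) ENVIRONMENT variable, defaulting to "remote".
    env = environment if environment is not None else os.environ.get("ENVIRONMENT", "remote")
    return {"unlisted": env == "local"}


# Assumed usage (requires ZenML and a pipeline object named `my_pipeline`):
# my_pipeline.with_options(**run_options())()
```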

## Deleting Pipelines

You can delete a pipeline, and then recreate it from scratch, with the following command:

```bash
zenml pipeline delete <PIPELINE_ID_OR_NAME>
```

This allows you to start fresh with a new pipeline, removing all previous runs
associated with the deleted pipeline. This is a slightly more drastic approach,
but it can sometimes be useful to keep the development environment clean.

## Unique Pipeline Names

Pipelines can be given a unique name each time they are run, which helps differentiate between multiple iterations of the same pipeline during development.
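One way to generate such names is to append a timestamp and a short random suffix to a base name. This is a sketch: `unique_run_name` is a hypothetical helper, and passing its result via `with_options(run_name=...)` is an assumption about how you would wire it into ZenML.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


def unique_run_name(base: str, now: Optional[datetime] = None) -> str:
    """Return `<base>-<YYYYmmdd-HHMMSS>-<6 hex chars>` (hypothetical helper)."""
    # Use UTC so names from collaborators in different time zones stay comparable.
    now = now or datetime.now(timezone.utc)
    timestamp = now.strftime("%Y%m%d-%H%M%S")
    # The random suffix keeps names distinct even for runs started in the same second.
    suffix = uuid.uuid4().hex[:6]
    return f"{base}-{timestamp}-{suffix}"


print(unique_run_name("training_pipeline"))
```

Accepting an optional `now` argument makes the timestamp part deterministic in tests.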

By utilizing these options, you can maintain a clean and organized pipeline dashboard, focusing on the runs that matter most for your project.
@@ -0,0 +1,89 @@
---
description: Learn how to use configuration files to manage local development.
Contributor:

I think the moment I hear this, I think "oh, it's slow". Maybe the intent of the chapter is rather that you want to set up different "modes" of your pipeline that load smaller data in a local case? Config files are one way to do that, but you can also do it in code, right?

Contributor:

So reframe it to talk more about having different "variants" of your pipeline in dev and production... maybe some if-conditions that load a different dataset or something?

Contributor Author:

done

---

# Use configuration files for local development

Configuration files let you manage and customize your ZenML pipelines for different environments, such as local development versus remote execution. YAML config files separate the configuration from your code, so you can switch between setups without editing the pipeline itself.

## YAML config files for local and remote development

ZenML allows you to specify pipeline and step configurations using YAML files. Here's a simple example:

```yaml
enable_cache: False
parameters:
  dataset_name: "best_dataset"
steps:
  load_data:
    enable_cache: False
```

This config file disables caching for the pipeline and the `load_data` step, and sets the `dataset_name` parameter to `"best_dataset"`.

To apply this configuration to your pipeline, use the `with_options(config_path=<PATH_TO_CONFIG>)` pattern:

```python
from zenml import step, pipeline

@step
def load_data(dataset_name: str) -> dict:
    # Load and return your dataset here.
    ...

@pipeline
def simple_ml_pipeline(dataset_name: str):
    load_data(dataset_name)

if __name__ == "__main__":
    # `dataset_name` is supplied by the `parameters` section of the config file.
    simple_ml_pipeline.with_options(config_path="path/to/config.yaml")()
```

For more details on what can be configured in the YAML file, refer to the [full configuration documentation](https://docs.zenml.io/how-to/use-configuration-files/what-can-be-configured).

To manage configurations for local development vs remote execution, you can create two separate YAML files, one for each environment. For example:

- `config_local.yaml`: Configuration for local development
- `config_remote.yaml`: Configuration for remote execution

Then in your `run.py` script, you can choose which config file to use based on a flag or environment variable:

```python
import os

# Assumes `simple_ml_pipeline` is defined (or imported) earlier in run.py.
if os.environ.get("ENVIRONMENT") == "local":
    config_path = "config_local.yaml"
else:
    config_path = "config_remote.yaml"

simple_ml_pipeline.with_options(config_path=config_path)()
```

You can then run this with `ENVIRONMENT=local python run.py` or
`ENVIRONMENT=remote python run.py`. (Alternatively, you could use a CLI flag
with `argparse` or `click` instead of an environment variable.)
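The `argparse` alternative mentioned above could look roughly like this. The `--env` flag name and the `CONFIG_PATHS` mapping are illustrative assumptions; the file names match the two config files described in this section.

```python
import argparse
from typing import List, Optional

# Map each environment to its config file.
CONFIG_PATHS = {
    "local": "config_local.yaml",
    "remote": "config_remote.yaml",
}


def parse_config_path(argv: Optional[List[str]] = None) -> str:
    """Return the config file selected via a hypothetical `--env` flag."""
    parser = argparse.ArgumentParser(description="Run the ZenML pipeline.")
    parser.add_argument(
        "--env",
        choices=sorted(CONFIG_PATHS),
        default="remote",  # Same fallback as the environment-variable version.
        help="Which environment's config file to use.",
    )
    args = parser.parse_args(argv)
    return CONFIG_PATHS[args.env]
```

In `run.py` you would then call `simple_ml_pipeline.with_options(config_path=parse_config_path())()` and run `python run.py --env local`. Accepting `argv` explicitly keeps the function easy to test, and `choices` makes a typo in the environment name fail immediately instead of silently falling back to the remote config.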


## Development environment configuration

In your local development config file, you can customize various settings to optimize for faster iteration and debugging. Some things you might want to configure:

- Use smaller datasets for quicker runs
- Specify a local stack for execution
- Reduce the number of training epochs
- Decrease the batch size
- Use a smaller base model

For example:

```yaml
parameters:
  dataset_path: "data/small_dataset.csv"
  epochs: 1
  batch_size: 16
stack: local_stack
```

This allows you to quickly test and debug your code locally before running it with the full-scale configuration for remote execution.
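Config files are one way to express these development and production variants; you can also select them in code. The sketch below keeps per-environment parameters in a frozen dataclass. The class name, the lookup helper, and the remote values are illustrative assumptions (the local values mirror the local YAML example in this section), and it assumes your pipeline accepts `dataset_path`, `epochs`, and `batch_size` as parameters.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TrainingParams:
    """Parameters that typically differ between local and remote runs."""

    dataset_path: str
    epochs: int
    batch_size: int


# Hypothetical variants: a tiny, fast local run and a full-scale remote run.
PARAMS_BY_ENV = {
    "local": TrainingParams(dataset_path="data/small_dataset.csv", epochs=1, batch_size=16),
    "remote": TrainingParams(dataset_path="data/full_dataset.csv", epochs=20, batch_size=128),
}


def params_for(environment: str) -> TrainingParams:
    """Look up the parameter variant for `environment`, failing loudly on typos."""
    try:
        return PARAMS_BY_ENV[environment]
    except KeyError:
        raise ValueError(
            f"Unknown environment {environment!r}; expected one of {sorted(PARAMS_BY_ENV)}"
        ) from None


print(asdict(params_for("local")))
```

You could then call, for example, `my_pipeline(**asdict(params_for("local")))`, where `my_pipeline` is a hypothetical pipeline taking those three parameters. Keeping the variants in one typed structure makes it harder for the local and remote setups to drift apart unnoticed.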

By leveraging configuration files, you can easily manage different setups for
your ZenML pipelines and streamline your development workflow.
5 changes: 4 additions & 1 deletion docs/book/toc.md
@@ -77,7 +77,7 @@
* [Schedule a pipeline](how-to/build-pipelines/schedule-a-pipeline.md)
* [Deleting a pipeline](how-to/build-pipelines/delete-a-pipeline.md)
* [Compose pipelines](how-to/build-pipelines/compose-pipelines.md)
* [Dynamically generate steps and artifacts](how-to/build-pipelines/dynamically-generate-steps-artifacts.md)
* [Dynamically assign artifact names](how-to/build-pipelines/dynamically-assign-artifact-names.md)
* [Automatically retry steps](how-to/build-pipelines/retry-steps.md)
* [Run pipelines asynchronously](how-to/build-pipelines/run-pipelines-asynchronously.md)
* [Control execution order of steps](how-to/build-pipelines/control-execution-order-of-steps.md)
@@ -111,6 +111,9 @@
* [📔 Run remote pipelines from notebooks](how-to/run-remote-steps-and-pipelines-from-notebooks/README.md)
* [Limitations of defining steps in notebook cells](how-to/run-remote-steps-and-pipelines-from-notebooks/limitations-of-defining-steps-in-notebook-cells.md)
* [Run a single step from a notebook](how-to/run-remote-steps-and-pipelines-from-notebooks/run-a-single-step-from-a-notebook.md)
* [📍 Develop locally](how-to/develop-locally/README.md)
* [Use config files to develop locally](how-to/develop-locally/use-config-files-to-develop-locally.md)
* [Keep your pipelines and dashboard clean](how-to/develop-locally/keep-your-pipeline-dashboard-clean.md)
* [⚒️ Manage stacks & components](how-to/stack-deployment/README.md)
* [Deploy a cloud stack with ZenML](how-to/stack-deployment/deploy-a-cloud-stack.md)
* [Deploy a cloud stack with Terraform](how-to/stack-deployment/deploy-a-cloud-stack-with-terraform.md)