diff --git a/.gitbook.yaml b/.gitbook.yaml index 44f5ace4104..6cdb7e8ac5d 100644 --- a/.gitbook.yaml +++ b/.gitbook.yaml @@ -6,14 +6,14 @@ structure: redirects: how-to/customize-docker-builds/use-code-repositories-to-speed-up-docker-build-times: how-to/customize-docker-builds/how-to-reuse-builds.md - reference/migration-guide/README.md: how-to/manage-the-zenml-server/migration-guide/migration-guide.md - reference/migration-guide/migration-zero-twenty.md: how-to/manage-the-zenml-server/migration-guide/migration-zero-twenty.md - reference/migration-guide/migration-zero-thirty.md: how-to/manage-the-zenml-server/migration-guide/migration-zero-thirty.md - reference/migration-guide/migration-zero-forty.md: how-to/manage-the-zenml-server/migration-guide/migration-zero-forty.md - reference/migration-guide/migration-zero-sixty.md: how-to/manage-the-zenml-server/migration-guide/migration-zero-sixty.md + reference/migration-guide: how-to/manage-the-zenml-server/migration-guide/migration-guide.md + reference/migration-guide/migration-zero-twenty: how-to/manage-the-zenml-server/migration-guide/migration-zero-twenty.md + reference/migration-guide/migration-zero-thirty: how-to/manage-the-zenml-server/migration-guide/migration-zero-thirty.md + reference/migration-guide/migration-zero-forty: how-to/manage-the-zenml-server/migration-guide/migration-zero-forty.md + reference/migration-guide/migration-zero-sixty: how-to/manage-the-zenml-server/migration-guide/migration-zero-sixty.md - getting-started/deploying-zenml/manage-the-deployed-services/upgrade-the-version-of-the-zenml-server.md: how-to/manage-the-zenml-server/upgrade-zenml-server.md - getting-started/deploying-zenml/manage-the-deployed-services/troubleshoot-your-deployed-server.md: how-to/manage-the-zenml-server/troubleshoot-your-deployed-server.md - how-to/stack-deployment/implement-a-custom-integration.md: how-to/contribute-to-zenml/implement-a-custom-integration.md - - getting-started/zenml-pro/system-architectures: 
getting-started/system-architectures.md \ No newline at end of file + getting-started/deploying-zenml/manage-the-deployed-services/upgrade-the-version-of-the-zenml-server: how-to/manage-the-zenml-server/upgrade-zenml-server.md + getting-started/deploying-zenml/manage-the-deployed-services/troubleshoot-your-deployed-server: how-to/manage-the-zenml-server/troubleshoot-your-deployed-server.md + how-to/stack-deployment/implement-a-custom-integration: how-to/contribute-to-zenml/implement-a-custom-integration.md + how-to/setting-up-a-project-repository/best-practices: how-to/setting-up-a-project-repository/set-up-repository.md + getting-started/zenml-pro/system-architectures: getting-started/system-architectures.md diff --git a/docs/book/.gitbook/assets/argilla_annotator.png b/docs/book/.gitbook/assets/argilla_annotator.png index 62327909f6d..8b01f3744fd 100644 Binary files a/docs/book/.gitbook/assets/argilla_annotator.png and b/docs/book/.gitbook/assets/argilla_annotator.png differ diff --git a/docs/book/.gitbook/assets/data_scientist_connector_role.png b/docs/book/.gitbook/assets/data_scientist_connector_role.png new file mode 100644 index 00000000000..ac19b31fa3d Binary files /dev/null and b/docs/book/.gitbook/assets/data_scientist_connector_role.png differ diff --git a/docs/book/.gitbook/assets/model_pipeline_artifact.png b/docs/book/.gitbook/assets/model_pipeline_artifact.png new file mode 100644 index 00000000000..38296835027 Binary files /dev/null and b/docs/book/.gitbook/assets/model_pipeline_artifact.png differ diff --git a/docs/book/.gitbook/assets/platform_engineer_connector_role.png b/docs/book/.gitbook/assets/platform_engineer_connector_role.png new file mode 100644 index 00000000000..d403c22f52e Binary files /dev/null and b/docs/book/.gitbook/assets/platform_engineer_connector_role.png differ diff --git a/docs/book/how-to/customize-docker-builds/how-to-reuse-builds.md b/docs/book/how-to/customize-docker-builds/how-to-reuse-builds.md index 
bffce76a90e..9f278b28848 100644 --- a/docs/book/how-to/customize-docker-builds/how-to-reuse-builds.md +++ b/docs/book/how-to/customize-docker-builds/how-to-reuse-builds.md @@ -33,7 +33,7 @@ While reusing Docker builds is useful, it can be limited. This is because specif ## Use the artifact store to upload your code -You can also let ZenML use the artifact store to upload your code. This is the default behaviour if no code repository is detected and the `allow_download_from_artifact_store` flag is not set to `False` in your `DockerSettings`. +You can also let ZenML use the artifact store to upload your code. This is the default behavior if no code repository is detected and the `allow_download_from_artifact_store` flag is not set to `False` in your `DockerSettings`. ## Use code repositories to speed up Docker build times diff --git a/docs/book/how-to/customize-docker-builds/how-to-use-a-private-pypi-repository.md b/docs/book/how-to/customize-docker-builds/how-to-use-a-private-pypi-repository.md new file mode 100644 index 00000000000..344a5b5e891 --- /dev/null +++ b/docs/book/how-to/customize-docker-builds/how-to-use-a-private-pypi-repository.md @@ -0,0 +1,44 @@ +--- +description: How to use a private PyPI repository. +--- + +# How to use a private PyPI repository + +For packages that require authentication, you may need to take additional steps: + +1. Use environment variables to store credentials securely. +2. Configure pip or poetry to use these credentials when installing packages. +3. Consider using custom Docker images that have the necessary authentication setup. 
+ +Here's an example of how you might set up authentication using environment variables: + +```python +import os + +from my_simple_package import important_function +from zenml.config import DockerSettings +from zenml import step, pipeline + +docker_settings = DockerSettings( + requirements=["my-simple-package==0.1.0"], + environment={'PIP_EXTRA_INDEX_URL': f"https://{os.environ.get('PYPI_TOKEN', '')}@my-private-pypi-server.com/{os.environ.get('PYPI_USERNAME', '')}/"} +) + +@step +def my_step(): + return important_function() + +@pipeline(settings={"docker": docker_settings}) +def my_pipeline(): + my_step() + +if __name__ == "__main__": + my_pipeline() +``` + +Note: Be cautious with handling credentials. Always use secure methods to manage +and distribute authentication information within your team. + +
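As a sketch of how you might fail fast when those variables are missing — rather than building an image with an empty index URL — you could assemble and validate the URL up front. The variable names and server URL are the hypothetical ones from the example above:

```python
import os


def private_index_url() -> str:
    """Assemble the private index URL, raising early if credentials are absent."""
    token = os.environ.get("PYPI_TOKEN")
    username = os.environ.get("PYPI_USERNAME")
    if not token or not username:
        raise RuntimeError(
            "PYPI_TOKEN and PYPI_USERNAME must be set before building images "
            "that install packages from the private index."
        )
    # Same URL shape as in the DockerSettings example above
    return f"https://{token}@my-private-pypi-server.com/{username}/"


# Example usage with dummy credentials:
os.environ["PYPI_TOKEN"] = "dummy-token"
os.environ["PYPI_USERNAME"] = "dummy-user"
print(private_index_url())  # https://dummy-token@my-private-pypi-server.com/dummy-user/
```

This way a missing credential surfaces as a clear error at pipeline submission time instead of a confusing pip failure inside the Docker build.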
ZenML Scarf
+ + diff --git a/docs/book/how-to/manage-the-zenml-server/best-practices-upgrading-zenml.md b/docs/book/how-to/manage-the-zenml-server/best-practices-upgrading-zenml.md index dc6a397ab4f..c96ba3cf778 100644 --- a/docs/book/how-to/manage-the-zenml-server/best-practices-upgrading-zenml.md +++ b/docs/book/how-to/manage-the-zenml-server/best-practices-upgrading-zenml.md @@ -42,7 +42,7 @@ ZenML Pro comes with multi-tenancy which makes it easy for you to have multiple ## Upgrading your code -Sometimes, you might have to upgrade your code to work with a new version of ZenML. This is true especially when you are moving from a really old version to a new major version. The following tips might help, in addition to everything you've learnt in this document so far. +Sometimes, you might have to upgrade your code to work with a new version of ZenML. This is true especially when you are moving from a really old version to a new major version. The following tips might help, in addition to everything you've learned in this document so far. ### Testing and Compatibility diff --git a/docs/book/how-to/setting-up-a-project-repository/README.md b/docs/book/how-to/setting-up-a-project-repository/README.md index c0cb737edd7..eb203046c2a 100644 --- a/docs/book/how-to/setting-up-a-project-repository/README.md +++ b/docs/book/how-to/setting-up-a-project-repository/README.md @@ -1,13 +1,93 @@ --- -description: Setting your team up for success with a project repository. +description: Setting your team up for success with a well-architected ZenML project. --- -# 😸 Setting up a project repository +# 😸 Setting up a Well-Architected ZenML Project -ZenML code typically lives in a `git` repository. Setting this repository up correctly can make a huge impact on collaboration and -getting the maximum out of your ZenML deployment. This section walks users through some of the options available to create a project -repository with ZenML. 
+Welcome to the guide on setting up a well-architected ZenML project. This section will provide you with a comprehensive overview of best practices, strategies, and considerations for structuring your ZenML projects to ensure scalability, maintainability, and collaboration within your team. -

A visual representation of how the code repository fits into the general ZenML architecture.

+## The Importance of a Well-Architected Project + +A well-architected ZenML project is crucial for the success of your machine learning operations (MLOps). It provides a solid foundation for your team to develop, deploy, and maintain ML models efficiently. By following best practices and leveraging ZenML's features, you can create a robust and flexible MLOps pipeline that scales with your needs. + +## Key Components of a Well-Architected ZenML Project + +### Repository Structure + +A clean and organized repository structure is essential for any ZenML project. This includes: + +- Proper folder organization for pipelines, steps, and configurations +- Clear separation of concerns between different components +- Consistent naming conventions + +Learn more about setting up your repository in the [Set up repository guide](./set-up-repository.md). + +### Version Control and Collaboration + +Integrating your ZenML project with version control systems like Git is crucial for team collaboration and code management. This allows for: + +- Faster pipeline builds, as you can leverage the same image and [have ZenML download code from your repository](../../how-to/customize-docker-builds/how-to-reuse-builds.md#use-code-repositories-to-speed-up-docker-build-times) +- Easy tracking of changes +- Collaboration among team members + +Discover how to connect your Git repository in the [Set up a repository guide](./set-up-repository.md). + +### Stacks, Pipelines, Models, and Artifacts + +Understanding the relationship between stacks, pipelines, models, and artifacts is key to designing an efficient ZenML project: + +- Stacks: Define your infrastructure and tool configurations +- Models: Represent your machine learning models and their metadata +- Pipelines: Encapsulate your ML workflows +- Artifacts: Track your data and model outputs + +Learn about organizing these components in the [Organizing Stacks, Pipelines, Models, and Artifacts guide](./stacks-pipelines-models.md).
+ +### Access Management and Roles + +Proper access management ensures that team members have the right permissions and responsibilities: + +- Define roles such as data scientists, MLOps engineers, and infrastructure managers +- Set up [service connectors](../auth-management/README.md) and manage authorizations +- Establish processes for pipeline maintenance and server upgrades +- Leverage [Teams in ZenML Pro](../../getting-started/zenml-pro/teams.md) to assign roles and permissions to a group of users, to mimic your real-world team roles. + +Explore access management strategies in the [Access Management and Roles guide](./access-management.md). + +### Shared Components and Libraries + +Leverage shared components and libraries to promote code reuse and standardization across your team: + +- Custom flavors, steps, and materializers +- Shared private wheels for internal distribution +- Handling authentication for specific libraries + +Find out more about sharing code in the [Shared Libraries and Logic for Teams guide](./shared-components-for-teams.md). + +### Project Templates + +Utilize project templates to kickstart your ZenML projects and ensure consistency: + +- Use pre-made templates for common use cases +- Create custom templates tailored to your team's needs + +Learn about using and creating project templates in the [Project Templates guide](./project-templates.md). + +### Migration and Maintenance + +As your project evolves, you may need to migrate existing codebases or upgrade your ZenML server: + +- Strategies for migrating legacy code to newer ZenML versions +- Best practices for upgrading ZenML servers + +Discover migration strategies and maintenance best practices in the [Migration and Maintenance guide](../../how-to/manage-the-zenml-server/best-practices-upgrading-zenml.md#upgrading-your-code). + +## Getting Started + +To begin building your well-architected ZenML project, start by exploring the guides in this section.
Each guide provides in-depth information on specific aspects of project setup and management. + +Remember, a well-architected project is an ongoing process. Regularly review and refine your project structure, processes, and practices to ensure they continue to meet your team's evolving needs. + +By following these guidelines and leveraging ZenML's powerful features, you'll be well on your way to creating a robust, scalable, and collaborative MLOps environment.
ZenML Scarf
diff --git a/docs/book/how-to/setting-up-a-project-repository/access-management.md b/docs/book/how-to/setting-up-a-project-repository/access-management.md new file mode 100644 index 00000000000..8e12187a73c --- /dev/null +++ b/docs/book/how-to/setting-up-a-project-repository/access-management.md @@ -0,0 +1,93 @@ +--- +description: A guide on managing user roles and responsibilities in ZenML. +--- + +# Access Management and Roles in ZenML + +Effective access management is crucial for maintaining security and efficiency in your ZenML projects. This guide will help you understand the different roles within a ZenML server and how to manage access for your team members. + +## Typical Roles in an ML Project + +In an ML project, you will typically have the following roles: + +- Data Scientists: Primarily work on developing and running pipelines. +- MLOps Platform Engineers: Manage the infrastructure and stack components. +- Project Owners: Oversee the entire ZenML deployment and manage user access. + +This is only an approximation of the roles you might have in your team. The names might differ in your case, or there might be more roles, but you can loosely map the responsibilities we discuss in this document onto your own project. + +{% hint style="info" %} +You can create [Roles in ZenML Pro](../../getting-started/zenml-pro/roles.md) with a given set of permissions and assign them to either Users or Teams that represent your real-world team structure. Sign up for a free trial to try it yourself: https://cloud.zenml.io/ +{% endhint %} + +## Service Connectors: Gateways to External Services + +Service connectors are how different cloud services are integrated with ZenML. They are used to abstract away the credentials and other configurations needed to access these services. + +Ideally, only the MLOps Platform Engineers should have access to create and manage connectors.
This is because they are closest to your infrastructure and can make informed decisions about which authentication mechanisms to use, among other things. + +Other team members can use connectors to create stack components that talk to the external services, but they should not have to worry about setting them up and shouldn't have access to the credentials used to configure them. + +Let's look at an example of how this works in practice. +Imagine you have a `DataScientist` role in your ZenML server. This role should only be able to use the connectors to create stack components and run pipelines. They shouldn't have access to the credentials used to configure these connectors. Therefore, the permissions for this role could look like the following: + +![Data Scientist Permissions](../../.gitbook/assets/data_scientist_connector_role.png) + +Notice that the role doesn't grant the data scientist permissions to create, update, or delete connectors, or to read their secret values. + +On the other hand, the `MLOpsPlatformEngineer` role has the permissions to create, update, and delete connectors, as well as read their secret values. The permissions for this role could look like the following: + +![MLOps Platform Engineer Permissions](../../.gitbook/assets/platform_engineer_connector_role.png) + +{% hint style="info" %} +Note that you can only use the RBAC features in ZenML Pro. Learn more about roles in ZenML Pro [here](../../getting-started/zenml-pro/roles.md). +{% endhint %} + +Learn more about the best practices in managing credentials and recommended roles in our [Managing Stacks and Components guide](../stack-deployment/README.md). + + +## Who is responsible for upgrading the ZenML server? + +The decision to upgrade your ZenML server is usually made by your Project Owners after consulting with all the teams using the server.
This is because there might be teams with conflicting requirements, and moving to a new version of ZenML (which might come with upgrades to certain libraries) can break code for some users. + +{% hint style="info" %} +You can choose to have different servers for different teams, which can alleviate some of the pressure to upgrade if you have multiple teams using the same server. ZenML Pro offers [multi-tenancy](../../getting-started/zenml-pro/tenants.md) out of the box, for situations like these. Sign up for a free trial to try it yourself: https://cloud.zenml.io/ +{% endhint %} + +Performing the upgrade itself is a task that typically falls on the MLOps Platform Engineers. Among other things, they should: + +- ensure that all data is backed up before performing the upgrade +- ensure that no service disruption or downtime happens during the upgrade + +Read in detail about the best practices for upgrading your ZenML server in the [Best Practices for Upgrading ZenML Servers](../manage-the-zenml-server/best-practices-upgrading-zenml.md) guide. + + +## Who is responsible for migrating and maintaining pipelines? + +When you upgrade to a new version of ZenML, you might have to test if your code works as expected and if the syntax is up to date with what ZenML expects. Although we do our best to make new releases compatible with older versions, there might be some breaking changes that you might have to address. + +The pipeline code itself is typically owned by the Data Scientist, but the Platform Engineer is responsible for making sure that new changes can be tested in a safe environment without impacting existing workflows. This might involve strategies like setting up a new server and doing a staged upgrade. + +The Data Scientist should also check the release notes and, where applicable, the migration guide when upgrading the code.
Read more about the best practices for upgrading your ZenML server and your code in the [Best Practices for Upgrading ZenML Servers](../manage-the-zenml-server/best-practices-upgrading-zenml.md) guide. + + +## Best Practices for Access Management + +Apart from the role-specific tasks we discussed so far, there are some general best practices you should follow: + +- Regular Audits: Conduct periodic reviews of user access and permissions. +- Role-Based Access Control (RBAC): Implement RBAC to streamline permission management. +- Least Privilege: Grant minimal necessary permissions to each role. +- Documentation: Maintain clear documentation of roles, responsibilities, and access policies. + +{% hint style="info" %} +Role-Based Access Control (RBAC) and permission assignment are only available to ZenML Pro users. +{% endhint %} + +By following these guidelines, you can ensure a secure and well-managed ZenML environment that supports collaboration while maintaining proper access controls.
ZenML Scarf
+ + diff --git a/docs/book/how-to/setting-up-a-project-repository/connect-your-git-repository.md b/docs/book/how-to/setting-up-a-project-repository/connect-your-git-repository.md index f0eed7a4951..1c18178fc2d 100644 --- a/docs/book/how-to/setting-up-a-project-repository/connect-your-git-repository.md +++ b/docs/book/how-to/setting-up-a-project-repository/connect-your-git-repository.md @@ -8,6 +8,8 @@ description: >- A code repository in ZenML refers to a remote storage location for your code. Some commonly known code repository platforms include [GitHub](https://github.com/) and [GitLab](https://gitlab.com/). +

A visual representation of how the code repository fits into the general ZenML architecture.

+ Code repositories enable ZenML to keep track of the code version that you use for your pipeline runs. Additionally, running a pipeline that is tracked in a registered code repository can [speed up the Docker image building for containerized stack components](../customize-docker-builds/use-code-repositories-to-speed-up-docker-build-times.md) by eliminating the need to rebuild Docker images each time you change one of your source code files. Learn more about how code repositories benefit development [here](../customize-docker-builds/use-code-repositories-to-speed-up-docker-build-times.md). diff --git a/docs/book/how-to/setting-up-a-project-repository/create-your-own-template.md b/docs/book/how-to/setting-up-a-project-repository/create-your-own-template.md new file mode 100644 index 00000000000..710249bc21d --- /dev/null +++ b/docs/book/how-to/setting-up-a-project-repository/create-your-own-template.md @@ -0,0 +1,48 @@ +--- +description: How to create your own ZenML template. +--- + +# Create your own ZenML template + +Creating your own ZenML template is a great way to standardize and share your ML workflows across different projects or teams. ZenML uses [Copier](https://copier.readthedocs.io/en/stable/) to manage its project templates. Copier is a library that allows you to generate projects from templates. It's simple, versatile, and powerful. + +Here's a step-by-step guide on how to create your own ZenML template: + +1. **Create a new repository for your template.** This will be the place where you store all the code and configuration files for your template. +2. **Define your ML workflows as ZenML steps and pipelines.** You can start by copying the code from one of the existing ZenML templates (like the [starter template](https://github.com/zenml-io/template-starter)) and modifying it to fit your needs. +3. **Create a `copier.yml` file.** This file is used by Copier to define the template's parameters and their default values. 
You can learn more about this config file [in the copier docs](https://copier.readthedocs.io/en/stable/creating/). +4. **Test your template.** You can use the `copier` command-line tool to generate a new project from your template and check if everything works as expected: + +```bash +copier copy https://github.com/your-username/your-template.git your-project +``` + +Replace `https://github.com/your-username/your-template.git` with the URL of your template repository, and `your-project` with the name of the new project you want to create. + +5. **Use your template with ZenML.** Once your template is ready, you can use it with the `zenml init` command: + +```bash +zenml init --template https://github.com/your-username/your-template.git +``` + +Replace `https://github.com/your-username/your-template.git` with the URL of your template repository. + +If you want to use a specific version of your template, you can use the `--template-tag` option to specify the git tag of the version you want to use: + +```bash +zenml init --template https://github.com/your-username/your-template.git --template-tag v1.0.0 +``` + +Replace `v1.0.0` with the git tag of the version you want to use. + +That's it! Now you have your own ZenML project template that you can use to quickly set up new ML projects. Remember to keep your template up-to-date with the latest best practices and changes in your ML workflows. + +Our [Production Guide](../../user-guide/production-guide/README.md) documentation is built around the `E2E Batch` project template code. Most examples will be based on it, so we highly recommend installing the `e2e_batch` template with the `--template-with-defaults` flag before diving deeper into this documentation section, so you can follow along using your own local environment: + +```bash +mkdir e2e_batch +cd e2e_batch +zenml init --template e2e_batch --template-with-defaults +``` +
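Coming back to step 3 above: for illustration, a minimal `copier.yml` might look like the following. The parameter names here are made up — define whatever inputs your template actually needs, following the Copier docs linked above:

```yaml
# copier.yml -- template parameters and their defaults (hypothetical names)
project_name:
  type: str
  help: Name of the generated ZenML project
  default: my_zenml_project

use_remote_stack:
  type: bool
  help: Whether the generated pipeline code should target a remote stack
  default: false
```

Copier prompts for these values when the template is used and substitutes them into the generated files.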
ZenML Scarf
diff --git a/docs/book/how-to/setting-up-a-project-repository/best-practices.md b/docs/book/how-to/setting-up-a-project-repository/set-up-repository.md similarity index 76% rename from docs/book/how-to/setting-up-a-project-repository/best-practices.md rename to docs/book/how-to/setting-up-a-project-repository/set-up-repository.md index b24308c85e1..2cf4a8b952f 100644 --- a/docs/book/how-to/setting-up-a-project-repository/best-practices.md +++ b/docs/book/how-to/setting-up-a-project-repository/set-up-repository.md @@ -2,7 +2,7 @@ description: Recommended repository structure and best practices. --- -# Best practices +# Set up your repository While it doesn't matter how you structure your ZenML project, here is a recommended project structure the core team often uses: @@ -34,7 +34,20 @@ While it doesn't matter how you structure your ZenML project, here is a recommen └── run.py ``` -All ZenML [Project templates](using-project-templates.md#generating-project-from-a-project-template) are modeled around this basic structure. +All ZenML [Project +templates](using-project-templates.md#generating-project-from-a-project-template) +are modeled around this basic structure. The `steps` and `pipelines` folders +contain the steps and pipelines defined in your project. If your project is +simpler, you can also keep your steps at the top level of the `steps` folder +without the need to structure them in subfolders. + +{% hint style="info" %} +It might also make sense to register your repository as a code repository. Code repositories +enable ZenML to keep track of the code version that you use for your pipeline +runs. Additionally, running a pipeline that is tracked in [a registered code repository](./connect-your-git-repository.md) can speed up the Docker image building for containerized stack +components by eliminating the need to rebuild Docker images each time you change +one of your source code files.
Learn more about these in [connecting your Git repository](https://docs.zenml.io/how-to/setting-up-a-project-repository/connect-your-git-repository). +{% endhint %} #### Steps @@ -87,7 +100,9 @@ Collect all your notebooks in one place. By running `zenml init` at the root of your project, you define the project scope for ZenML. In ZenML terms, this will be called your "source's root". This will be used to resolve import paths and store configurations. -Although this is optional, it is recommended that you do this for all of your projects. +Although this is optional, it is recommended that you do this for all of your +projects. This is especially important if you are using Jupyter notebooks in +your project, as these require you to have initialized a `.zen` file. {% hint style="warning" %} All of your import paths should be relative to the source's root. diff --git a/docs/book/how-to/setting-up-a-project-repository/shared-components-for-teams.md b/docs/book/how-to/setting-up-a-project-repository/shared-components-for-teams.md new file mode 100644 index 00000000000..d19e1d41c7d --- /dev/null +++ b/docs/book/how-to/setting-up-a-project-repository/shared-components-for-teams.md @@ -0,0 +1,138 @@ +--- +description: Sharing code and libraries within teams. +--- + +# Shared Libraries and Logic for Teams + +Teams often need to collaborate on projects, share versioned logic, and implement cross-cutting functionality that benefits the entire organization. Sharing code libraries allows for incremental improvements, increased robustness, and standardization across projects. + +This guide will cover two main aspects of sharing code within teams using ZenML: + +1. What can be shared +2. How to distribute shared components + +## What Can Be Shared + +ZenML offers several types of custom components that can be shared between teams: + +### Custom Flavors + +Custom flavors are special integrations that don't come built-in with ZenML.
These can be implemented and shared as follows: + +1. Create the custom flavor in a shared repository. +2. Implement the custom stack component as described in the [ZenML documentation](../stack-deployment/implement-a-custom-stack-component.md#implementing-a-custom-stack-component-flavor). +3. Register the component using the ZenML CLI, for example in the case of a custom artifact store flavor: + +```bash +zenml artifact-store flavor register <path.to.MyArtifactStoreFlavor> +``` + +Replace `<path.to.MyArtifactStoreFlavor>` with the import path of your custom flavor class. + +### Custom Steps + +Custom steps can be created and shared via a separate repository. Team members can reference these components as they would normally reference Python modules. + +### Custom Materializers + +Custom materializers are common components that teams often need to share. To implement and share a custom materializer: + +1. Create the materializer in a shared repository. +2. Implement the custom materializer as described in the [ZenML documentation](https://docs.zenml.io/how-to/handle-data-artifacts/handle-custom-data-types). +3. Team members can import and use the shared materializer in their projects. + +## How to Distribute Shared Components + +There are several methods to distribute and use shared components within a team: + +### Shared Private Wheels + +Using shared private wheels is an effective approach to sharing code within a team. This method packages Python code for internal distribution without making it publicly available. + +#### Benefits of Using Shared Private Wheels + +- Packaged format: Easy to install using pip +- Version management: Simplifies managing different code versions +- Dependency management: Automatically installs specified dependencies +- Privacy: Can be hosted on internal PyPI servers +- Smooth integration: Imported like any other Python package + +#### Setting Up Shared Private Wheels + +1. Create a private PyPI server or use a service like [AWS CodeArtifact](https://aws.amazon.com/codeartifact/). +2. 
[Build your code](https://packaging.python.org/en/latest/tutorials/packaging-projects/) [into wheel format](https://opensource.com/article/23/1/packaging-python-modules-wheels). +3. Upload the wheel to your private PyPI server. +4. Configure pip to use the private PyPI server in addition to the public one. +5. Install the private packages using pip, just like public packages. + +### Using Shared Libraries with `DockerSettings` + +When running pipelines with remote orchestrators, ZenML generates a `Dockerfile` at runtime. You can use the `DockerSettings` class to specify how to include your shared libraries in this Docker image. + +#### Installing Shared Libraries + +There are multiple ways to include shared libraries using `DockerSettings`. You can specify a list of requirements directly: + +```python +import os +from zenml.config import DockerSettings +from zenml import pipeline + +docker_settings = DockerSettings( + requirements=["my-simple-package==0.1.0"], + environment={'PIP_EXTRA_INDEX_URL': f"https://{os.environ.get('PYPI_TOKEN', '')}@my-private-pypi-server.com/{os.environ.get('PYPI_USERNAME', '')}/"} +) + +@pipeline(settings={"docker": docker_settings}) +def my_pipeline(...): + ... +``` + +Alternatively, you can use a requirements file: + +```python +docker_settings = DockerSettings(requirements="/path/to/requirements.txt") + +@pipeline(settings={"docker": docker_settings}) +def my_pipeline(...): + ... +``` + +The `requirements.txt` file would then specify the private index URL, for example: + +``` +--extra-index-url https://YOURTOKEN@my-private-pypi-server.com/YOURUSERNAME/ +my-simple-package==0.1.0 +``` + +For information on using private PyPI repositories to share your code, see our [documentation on how to use a private PyPI repository](../customize-docker-builds/how-to-use-a-private-pypi-repository.md).
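As a small illustration of keeping those placeholder credentials out of committed files, the `requirements.txt` body could be rendered at build time from environment variables. The server URL, variable names, and package pin below are the placeholder values used above — adapt them to your setup:

```python
import os


def render_requirements(packages: list) -> str:
    """Render a requirements.txt body that pulls from the private index.

    Credentials are read from the environment, falling back to the
    placeholder values from the example above, so no real token ever
    needs to be committed to the repository.
    """
    token = os.environ.get("PYPI_TOKEN", "YOURTOKEN")
    username = os.environ.get("PYPI_USERNAME", "YOURUSERNAME")
    index_line = (
        f"--extra-index-url https://{token}@my-private-pypi-server.com/{username}/"
    )
    # One line per package pin, after the index configuration line
    return "\n".join([index_line, *packages]) + "\n"


print(render_requirements(["my-simple-package==0.1.0"]))
```

The rendered text can then be written to a temporary file and passed to `DockerSettings(requirements=...)` as shown above.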
+ +## Best Practices + +Regardless of what you're sharing or how you're distributing it, consider these best practices: + +- Use version control for shared code repositories. + +Version control systems like Git allow teams to collaborate on code effectively. They provide a central repository where all team members can access the latest version of the shared components and libraries. + +- Implement proper access controls for private PyPI servers or shared repositories. + +To ensure the security of proprietary code and libraries, it's crucial to set up appropriate access controls. This may involve using authentication mechanisms, managing user permissions, and regularly auditing access logs. + +- Maintain clear documentation for shared components and libraries. + +Comprehensive and up-to-date documentation is essential for the smooth usage and maintenance of shared code. It should cover installation instructions, API references, usage examples, and any specific guidelines or best practices. + +- Regularly update shared libraries and communicate changes to the team. + +As the project evolves, it's important to keep shared libraries updated with the latest bug fixes, performance improvements, and feature enhancements. Establish a process for regularly updating and communicating these changes to the team. + +- Consider setting up continuous integration for shared libraries to ensure quality and compatibility. + +Continuous integration (CI) helps maintain the stability and reliability of shared components. By automatically running tests and checks on each code change, CI can catch potential issues early and ensure compatibility across different environments and dependencies. + +By leveraging these methods for sharing code and libraries, teams can +collaborate more effectively, maintain consistency across projects, and +accelerate development processes within the ZenML framework. + + +
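The continuous integration suggestion above can be sketched as a minimal workflow. This assumes the shared library lives in its own repository on GitHub and defines a `dev` extra with its test dependencies; all file paths and job names here are illustrative:

```yaml
# .github/workflows/ci.yml (illustrative)
name: shared-library-ci

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      # Install the library with its test dependencies, then run the test suite
      - run: pip install -e ".[dev]"
      - run: pytest
```

A similar job could also build the wheel and publish it to the private index on tagged releases.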
ZenML Scarf
diff --git a/docs/book/how-to/setting-up-a-project-repository/stacks-pipelines-models.md b/docs/book/how-to/setting-up-a-project-repository/stacks-pipelines-models.md new file mode 100644 index 00000000000..1c5e278906e --- /dev/null +++ b/docs/book/how-to/setting-up-a-project-repository/stacks-pipelines-models.md @@ -0,0 +1,105 @@ +--- +description: A guide on how to organize stacks, pipelines, models, and artifacts in ZenML. +--- + +# Organizing Stacks, Pipelines, Models, and Artifacts + +In ZenML, pipelines, stacks, and models form a crucial part of your project's +architecture, and how you choose to use them dictates how well organized your +code and workflow are. This section will give you an overview of how to think +about these concepts and how to best utilize them. + +Before we begin, here is a quick overview of the concepts we will be discussing: + +- **Stacks**: [Stacks](../../user-guide/production-guide/understand-stacks.md) represent the configuration of tools and infrastructure that your pipelines can run on. A stack is built of multiple stack components like an orchestrator, a container registry, an artifact store, etc. Each of these components deals with one part of your workflow, and they work together to run your pipeline. +- **Pipelines**: [Pipelines](../../user-guide/starter-guide/create-an-ml-pipeline.md) are a series of steps that each represent a specific task in your ML workflow and are executed in a sequence that ZenML determines from your pipeline definition. Pipelines help you automate many tasks, standardize your executions, and add visibility into what your code is doing. +- **Models**: [Models](../../how-to/use-the-model-control-plane/README.md) are entities that group pipelines, artifacts, metadata, and other crucial business data together. You may think of a ZenML Model as a "project" or a "workspace" that spans multiple pipelines. 
+- **Artifacts**: [Artifacts](../../user-guide/starter-guide/manage-artifacts.md) are the output of a pipeline step that you want to track and reuse across multiple pipelines. + +Understanding the relationships between stacks, pipelines, models, and artifacts is crucial for effective MLOps with ZenML. + +## How many Stacks do I need? + +A stack provides the infrastructure and tools for running pipelines. Think of a stack as a representation of the execution environment in which your pipelines run, comprising both the hardware (such as the orchestration environment) and any MLOps tools you use in your workflow. This way, stacks allow you to transition seamlessly between different environments (e.g., local, staging, production) while keeping your pipeline code consistent. + +You can learn more about organizing and managing stacks in the [Managing Stacks and Components](../../how-to/stack-deployment/README.md) guide. + +You don't need a separate stack for each pipeline; instead, you can run multiple pipelines on the same stack. A stack is meant to be created once and then reused across multiple users and pipelines. This helps in the following ways: + +- reduces the overhead of configuring your infrastructure every time you run a pipeline. +- provides a consistent environment for your pipelines to run in, promoting reproducibility. +- reduces the risk of errors caused by mismatched hardware and tool configurations. + +## How do I organize my Pipelines, Models, and Artifacts? + +Pipelines, Models, and Artifacts form the core of your ML workflow in ZenML. All of your project logic is organized around these concepts, and as such, it helps to understand how they interact with each other and how to structure your code to make the most out of them. + +### Pipelines + +A pipeline typically encompasses the entire ML workflow, including data +preparation, model training, and evaluation. 
It's a good practice to have a +separate pipeline for different tasks like training and inference. This makes +your pipelines more modular and easier to manage. Here are some of the benefits: + +- Separation of pipelines by the nature of the task allows you to [run them independently as needed](../develop-locally/local-prod-pipeline-variants.md). For example, you might train a model in a training pipeline only once a week but run inference on new data every day. +- It becomes easier to manage and update your code as your project grows more complex. +- Different people can work on the code for the pipelines without interfering with each other. +- It helps you organize your runs better. + +### Models + +Models are what tie related pipelines together. A Model in ZenML is a collection of data artifacts, model artifacts, pipelines, and metadata that can all be tied to a specific project. +As such, it is good practice to use a Model to move data between pipelines. + +Continuing with the example of a training and an inference pipeline, you can use a ZenML Model to hand over the trained model from the training pipeline to the inference pipeline. The Model Control Plane allows you to set stages for specific model versions, which can help with this. + +### Artifacts + +Artifacts are the output of a pipeline step that you want to track and reuse across multiple pipelines. They can be anything from a dataset to a trained model. It is a good practice to name your artifacts appropriately to make them easy to identify and reuse. Every pipeline run that results in a unique execution of a pipeline step produces a new version of your artifact. This ensures that there's a clear history and traceability of your data and model artifacts. + +Artifacts can be tied to a Model for better organization and visibility across pipelines. You can choose to log metadata about your artifacts, which will then show up in the Model Control Plane. + +## So how do I put this all together? 
+ +![Diagram showing how Models bring together Pipelines and Artifacts](../../.gitbook/assets/model_pipeline_artifact.png) + +Let's go through a real-world example to see how we can use Stacks, Pipelines, Models, and Artifacts together. Imagine there are two people in your team working on a classification model, Bob and Alice. + +Here's how the workflow would look with ZenML: +- They create three pipelines: one for feature engineering, one for training the model, and one for producing predictions. +- They set up a [repository for their project](../setting-up-a-project-repository/README.md) and start building their pipelines collaboratively. Let's assume Bob builds the feature engineering and training pipelines and Alice builds the inference pipeline. +- To test their pipelines locally, they both have a `default` stack with a local orchestrator and a local artifact store. This allows them to quickly iterate on their code without deploying any infrastructure or incurring any costs. +- While building the inference pipeline, Alice needs to make sure that the preprocessing step in her pipeline is the same as the one used while training. It might even involve the use of libraries that are not publicly available, so she follows the [Shared Libraries and Logic for Teams](./shared-components-for-teams.md) guide to help with this. +- Bob's training pipeline produces a model artifact, which Alice's inference pipeline requires as input. It also produces other artifacts, such as metrics and a model checkpoint, that are logged in the pipeline run. +- To allow easy access to model and data artifacts, they [use a ZenML Model](../../how-to/use-the-model-control-plane/associate-a-pipeline-with-a-model.md), which ties the pipelines, models, and artifacts together. 
Now Alice can just [reference the right model name and find the model artifact she needs](../../how-to/use-the-model-control-plane/load-artifacts-from-model.md). +- It is also critical that the right model version from the training pipeline is used in the inference pipeline. The [Model Control Plane](../../how-to/use-the-model-control-plane/README.md) helps Bob keep track of the different versions and compare them easily. Bob can then [promote the best-performing model version to the `production` stage](../../how-to/use-the-model-control-plane/promote-a-model.md), which Alice's pipeline can consume. +- Alice's inference pipeline produces a new artifact, in this case a new dataset containing the predictions of the model. Results can also be added as metadata to the model version, allowing easy comparisons. + +This is a very simple example, but it shows how you can use ZenML to structure your ML workflow. You can use the same principles for more complex workflows. + +## Rules of Thumb + +Here are some general guidelines to help you organize your ZenML projects effectively: + +### Models +- Create one Model per distinct machine learning use case or business problem +- Use Models to group related pipelines, artifacts, and metadata together +- Leverage the Model Control Plane to manage model versions and stages (e.g., staging, production) + +### Stacks +- Maintain separate stacks for different environments (development, staging, production) +- Share production and staging stacks across teams to ensure consistency +- Keep local development stacks simple for quick iterations + +### Naming and Organization +- Use consistent naming conventions for pipelines, artifacts, and models +- Leverage tags to organize and filter resources (e.g., `environment:production`, `team:fraud-detection`) +- Document stack configurations and pipeline dependencies +- Keep pipeline code modular and reusable across different environments + +Following these guidelines will help maintain a 
clean and scalable MLOps workflow as your project grows. + + +
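As a sketch of how the pieces above fit together in configuration, Bob could attach his training runs to the shared Model through a pipeline run configuration file. The model name, tags, and parameters below are illustrative, not part of any real project:

```yaml
# config.yaml (illustrative), e.g. applied with
# train_pipeline.with_options(config_path="config.yaml")()
model:
  name: classification_model
  tags: ["team:fraud-detection"]

parameters:
  epochs: 10
```

Every run launched with this configuration is then grouped under the `classification_model` Model, and its artifacts become visible in the Model Control Plane.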
ZenML Scarf
+ + diff --git a/docs/book/how-to/setting-up-a-project-repository/using-project-templates.md b/docs/book/how-to/setting-up-a-project-repository/using-project-templates.md index 7db70c608cd..4b5e7a0e525 100644 --- a/docs/book/how-to/setting-up-a-project-repository/using-project-templates.md +++ b/docs/book/how-to/setting-up-a-project-repository/using-project-templates.md @@ -14,7 +14,7 @@ What would you need to get a quick understanding of the ZenML framework and star Do you have a personal project powered by ZenML that you would like to see here? At ZenML, we are looking for design partnerships and collaboration to help us better understand the real-world scenarios in which MLOps is being used and to build the best possible experience for our users. If you are interested in sharing all or parts of your project with us in the form of a ZenML project template, please [join our Slack](https://zenml.io/slack/) and leave us a message! {% endhint %} -## Generating project from a project template +## Using a project template First, to use the templates, you need to have ZenML and its `templates` extras installed: @@ -22,6 +22,13 @@ First, to use the templates, you need to have ZenML and its `templates` extras i pip install zenml[templates] ``` +{% hint style="warning" %} +Note that these templates are not the same thing as the templates used for +triggering a pipeline (from the dashboard or via the Python SDK). Those are +known as 'Run Templates' and you can read more about them here. +{% endhint %} + Now, you can generate a project from one of the existing templates by using the `--template` flag with the `zenml init` command: ```bash @@ -36,47 +43,5 @@ zenml init --template --template-with-defaults # example: zenml init --template e2e_batch --template-with-defaults ``` -## Creating your own ZenML template - -Creating your own ZenML template is a great way to standardize and share your ML workflows across different projects or teams. 
ZenML uses [Copier](https://copier.readthedocs.io/en/stable/) to manage its project templates. Copier is a library that allows you to generate projects from templates. It's simple, versatile, and powerful. - -Here's a step-by-step guide on how to create your own ZenML template: - -1. **Create a new repository for your template.** This will be the place where you store all the code and configuration files for your template. -2. **Define your ML workflows as ZenML steps and pipelines.** You can start by copying the code from one of the existing ZenML templates (like the [starter template](https://github.com/zenml-io/template-starter)) and modifying it to fit your needs. -3. **Create a `copier.yml` file.** This file is used by Copier to define the template's parameters and their default values. You can learn more about this config file [in the copier docs](https://copier.readthedocs.io/en/stable/creating/). -4. **Test your template.** You can use the `copier` command-line tool to generate a new project from your template and check if everything works as expected: - -```bash -copier copy https://github.com/your-username/your-template.git your-project -``` - -Replace `https://github.com/your-username/your-template.git` with the URL of your template repository, and `your-project` with the name of the new project you want to create. - -5. **Use your template with ZenML.** Once your template is ready, you can use it with the `zenml init` command: - -```bash -zenml init --template https://github.com/your-username/your-template.git -``` - -Replace `https://github.com/your-username/your-template.git` with the URL of your template repository. - -If you want to use a specific version of your template, you can use the `--template-tag` option to specify the git tag of the version you want to use: - -```bash -zenml init --template https://github.com/your-username/your-template.git --template-tag v1.0.0 -``` - -Replace `v1.0.0` with the git tag of the version you want to use. 
- -That's it! Now you have your own ZenML project template that you can use to quickly set up new ML projects. Remember to keep your template up-to-date with the latest best practices and changes in your ML workflows. - -Our [Production Guide](../../user-guide/production-guide/README.md) documentation is built around the `E2E Batch` project template codes. Most examples will be based on it, so we highly recommend you to install the `e2e_batch` template with `--template-with-defaults` flag before diving deeper into this documentation section, so you can follow this guide along using your own local environment. - -```bash -mkdir e2e_batch -cd e2e_batch -zenml init --template e2e_batch --template-with-defaults -``` - -
ZenML Scarf
+ +
ZenML Scarf
diff --git a/docs/book/how-to/training-with-gpus/accelerate-distributed-training.md b/docs/book/how-to/training-with-gpus/accelerate-distributed-training.md index 3177b5cdc18..4047e781413 100644 --- a/docs/book/how-to/training-with-gpus/accelerate-distributed-training.md +++ b/docs/book/how-to/training-with-gpus/accelerate-distributed-training.md @@ -49,7 +49,7 @@ The `run_with_accelerate` decorator accepts various arguments to configure your 3. If `run_with_accelerate` is misused, it will raise a `RuntimeError` with a helpful message explaining the correct usage. {% hint style="info" %} -To see a full example where Accelerate is used within a ZenML pipeline, check out our llm-lora-finetuning project which leverages the distributed training functionalities while finetuning an LLM. +To see a full example where Accelerate is used within a ZenML pipeline, check out our [llm-lora-finetuning](https://github.com/zenml-io/zenml-projects/blob/main/llm-lora-finetuning/README.md) project which leverages the distributed training functionalities while finetuning an LLM. {% endhint %} ## Ensure your container is Accelerate-ready @@ -111,4 +111,4 @@ If you're new to distributed training or encountering issues, please [connect wi By leveraging the Accelerate integration in ZenML, you can easily scale your training processes and make the most of your available hardware resources, all while maintaining the structure and benefits of your ZenML pipelines. -
ZenML Scarf
\ No newline at end of file +
ZenML Scarf
diff --git a/docs/book/toc.md b/docs/book/toc.md index 1b1d7ffa8f8..5652df9e0de 100644 --- a/docs/book/toc.md +++ b/docs/book/toc.md @@ -63,10 +63,14 @@ ## How-To -* [😸 Set up a project repository](how-to/setting-up-a-project-repository/README.md) +* [😸 Set up a ZenML project](how-to/setting-up-a-project-repository/README.md) + * [Set up a repository](how-to/setting-up-a-project-repository/set-up-repository.md) * [Connect your git repository](how-to/setting-up-a-project-repository/connect-your-git-repository.md) * [Project templates](how-to/setting-up-a-project-repository/using-project-templates.md) - * [Best practices](how-to/setting-up-a-project-repository/best-practices.md) + * [Create your own template](how-to/setting-up-a-project-repository/create-your-own-template.md) + * [Shared components for teams](how-to/setting-up-a-project-repository/shared-components-for-teams.md) + * [Stacks, pipelines and models](how-to/setting-up-a-project-repository/stacks-pipelines-models.md) + * [Access management](how-to/setting-up-a-project-repository/access-management.md) * [⛓️ Build a pipeline](how-to/build-pipelines/README.md) * [Use pipeline/step parameters](how-to/build-pipelines/use-pipeline-step-parameters.md) * [Configuring a pipeline at runtime](how-to/build-pipelines/configuring-a-pipeline-at-runtime.md) @@ -104,6 +108,7 @@ * [Docker settings on a step](how-to/customize-docker-builds/docker-settings-on-a-step.md) * [Use a prebuilt image for pipeline execution](how-to/customize-docker-builds/use-a-prebuilt-image.md) * [Specify pip dependencies and apt packages](how-to/customize-docker-builds/specify-pip-dependencies-and-apt-packages.md) + * [How to use a private PyPI repository](how-to/customize-docker-builds/how-to-use-a-private-pypi-repository.md) * [Use your own Dockerfiles](how-to/customize-docker-builds/use-your-own-docker-files.md) * [Which files are built into the image](how-to/customize-docker-builds/which-files-are-built-into-the-image.md) * [How to reuse 
builds](how-to/customize-docker-builds/how-to-reuse-builds.md) diff --git a/docs/book/user-guide/llmops-guide/rag/storing-embeddings-in-a-vector-database.md b/docs/book/user-guide/llmops-guide/rag/storing-embeddings-in-a-vector-database.md index bdeea29f4d5..2b169636080 100644 --- a/docs/book/user-guide/llmops-guide/rag/storing-embeddings-in-a-vector-database.md +++ b/docs/book/user-guide/llmops-guide/rag/storing-embeddings-in-a-vector-database.md @@ -22,7 +22,7 @@ options. {% hint style="info" %} For more information on how to set up a PostgreSQL database to follow along with -this guide, please see the instructions in the repository which show how to set +this guide, please [see the instructions in the repository](https://github.com/zenml-io/zenml-projects/tree/main/llm-complete-guide) which show how to set up a PostgreSQL database using Supabase. {% endhint %}